Random forests (RFs) are a versatile choice for many machine learning applications. Despite their efficiency and simplicity, RFs are seldom used in collaborative scenarios such as federated learning (FL). In FL, training data is scattered among a federation of clients, and a central server aggregates inputs from all clients to train a federated model. Because RFs are non-parametric models, coordinating the training phase and aggregating a global model are non-trivial. Design choices regarding the evaluation of candidate splits and the aggregation of decision trees prove to be context-specific. In this work, we identify aggregation techniques proposed in the extant literature and categorize them along dimensions such as training coordination, inference process, and privacy. We find an important distinction between synchronous and asynchronous techniques, and we evaluate the practical suitability of the identified aggregation techniques by comparing their advantages and drawbacks with respect to prediction robustness and technical feasibility. Our results facilitate design choices for future federated RFs.


Track 15: Student Track