We model the cloud as a network of servers holding large XML trees, where communication costs between servers are high and local computation costs are low. We propose a general method, StatsReduce, which combines statistical information about the data on each server and constructs global statistics for the combination of the trees. We show how to use these global statistics to approximately answer Analytics queries on the global data, in the case of a composition of trees. The value of new services on the cloud depends on the efficient estimation of Analytics queries with methods such as StatsReduce.
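
The general pattern of computing small statistics locally and combining them centrally can be illustrated with a minimal sketch. It is not the StatsReduce algorithm itself: the kind of statistics (value counts under a tag), the merge rule (summing counts), and the names `local_stats`, `reduce_stats`, and `relative_frequencies` are all assumptions made for illustration, and the combination of trees is assumed to be a simple union of documents. Exact counts are used here for simplicity, whereas the method described above is approximate.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def local_stats(xml_text, tag):
    """Runs on one server (hypothetical): count the text values under `tag`."""
    root = ET.fromstring(xml_text)
    return Counter(el.text for el in root.iter(tag))


def reduce_stats(per_server):
    """Runs on a coordinator (hypothetical): merge summaries by summing counts."""
    total = Counter()
    for stats in per_server:
        total.update(stats)
    return total


def relative_frequencies(stats):
    """Answer a distribution-style query from the merged statistics."""
    n = sum(stats.values())
    if n == 0:
        return {}
    return {value: count / n for value, count in stats.items()}


# Two toy XML trees standing in for the data held by two servers.
server_a = "<db><paper><year>2010</year></paper><paper><year>2011</year></paper></db>"
server_b = "<db><paper><year>2011</year></paper><paper><year>2011</year></paper></db>"

merged = reduce_stats([local_stats(server_a, "year"), local_stats(server_b, "year")])
answer = relative_frequencies(merged)
```

Only the per-server counters cross the network in this sketch, not the trees themselves, which reflects the cost model above: local computation is cheap and communication is expensive.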