Communication Cost in Parallel Query Processing
De Dan Suciu
We consider the following problem: what is the amount of communication required to compute a query in parallel on p servers, over a large database instance? We define the Massively Parallel Communication (MPC) model, where the computation proceeds in rounds consisting of local computations followed by a global reshuffling of the data. Servers have unlimited computational power and are allowed to exchange any data, the only cost parameters are the number of rounds and the maximum amount of communication per server. I will describe tight bounds on the amount of communication for the case of single round algorithms on non-skewed data, and discuss some partial results for multiple rounds and for skewed data.
 
     
	
                 
                 
	
                 
	
                 
	
               
	
               
	
               
	
               
	
               
	
               
	
               
	
         
	
           
                       
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
	
           
      
    