Fair task allocation for large data sets analysis

Fair task allocation for large data sets analysis

Quentin Baert Anne-Cécile Caron Maxime Morge Jean-Christophe Routier 

Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France

Corresponding Author Email: 
prenom.nom@univ-lille1.fr
Page: 
401-426
|
DOI: 
https://doi.org/10.3166/RIA.31.401-426
Received: 
|
Accepted: 
|
Published: 
31 August 2017
| Citation

OPEN ACCESS

Abstract: 

Many companies are using MapReduce applications to process very large amountsof data. In order to optimize the task allocation, several systems collect data from  previous runs and predict the performance doing job profiling. However they are not effective during the learning phase, or when a new kind of job or data set appears. In this paper, we present an adaptive multiagent system for large data sets analysis with MapReduce. We do not preprocess data but we adopt a dynamic approach, where the reducer agents interact during the job. In order to decrease the workload of the most loaded reducer - and so the running time - we propose a task re-allocation based on negotiation. We prove that the negotiation process terminates and leads to a better task allocation. Our experimentations over real-world data confirm the added-value of negotiation

Keywords: 

multiagent system, distributed problem solving, negotiation, big data, MapReduce

1. Introduction
2. MapReduce
3. Travaux connexes
4. Proposition
5. Expérimentations
6. Conclusion
Remerciements

Ce travail s’inscrit dans le défi CNRS MASTODONS 2017. Nous remercions le comité de programme des JFSMA ainsi que les relecteurs RIA qui, par leurs remarques, nous ont permis d’améliorer cet article.

  References

Baert Q., Caron A.-C., Morge M., Routier J.-C. (2016). Allocation équitable de tâches pour l’analyse de données massives. In Actes des journées francophones sur les systèmes multiagents (JFSMA), p. 55–64. Saint-Martin-du-Vivier (Rouen), France, Cépaudès.

Baert Q., Caron A.-C., Morge M., Routier J.-C. (2017). Stratégie de découpe de tâche pour le traitement de données massives. In Actes des journées francophones sur les systèmes multi-agents (JFSMA). Caen, France, Cépaudès. (À paraître)

Brandt F., Conitzer V., Endriss U., Lang J., Procaccia A. D. (2016). Handbook of computational social choice. Cambridge University Press.

Chen Q., Zhang D., Guo M., Deng Q., Guo S. (2010). SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In Proc. of 10th international conference on computer and information technology (CIT), p. 2736–2743.

Clinger W. D. (1981). Foundations of actor semantics. Thèse de doctorat non publiée, Massachusetts Institute of Technology.

Damamme A., Beynier A., Chevaleyre Y., Maudet N. (2015, May). The power of swap deals in distributed resource allocation. In Proc. of the 14th international conference on autonomous agents and multiagent systems (AAMAS), p. 625-633. Istanbul, Turkey.

Dean J., Ghemawat S. (2004). MapReduce: Simplified data processing on large clusters. In Proc. of the 6th symposium on operating system design and implementation, p. 137-150.

DeCandia G., Hastorun D., Jampani M., Kakulapati G., Lakshman A., Pilchin A. et al. (2007). Dynamo: Amazon’s highly available key-value store. In Proc. of the 21st ACM SIGOPS symposium on operating systems principles, p. 205-220.

Graham R. L. (1969). Bounds on multiprocessing timing anomalies. SIAM journal on Applied Mathematics, vol. 17, no 2, p. 416–429.

Hewitt C., Bishop P., Steiger R. (1973). A universal modular ACTOR formalism for artificial intelligence. In Proc. of the 3rd international joint conference on artificial intelligence, p. 235-245.

Kwon Y., Balazinska M., Howe B., Rolia J. (2012). SkewTune: Mitigating skew in MapReduce applications. In Proc. of the ACM SIGMOD international conference on management of data, p. 25-36.

Lama P., Zhou X. (2012). AROMA: Automated resource allocation and configuration of Map-Reduce environment in the cloud. In Proc. of the 9th internatinal conference on autonomic computing, p. 63-72.

Liroz-Gistau M., Akbarinia R., Valduriez P. (2015). FP-Hadoop: efficient execution of parallel jobs over skewed data. Proc. of the VLDB Endowment, vol. 8, no 12, p. 1856–1859.

Morge M., Stathis K., Vercouter L. (2008). Argumentation sur les motivations propres dans l’architecture V3A pour des agents auto-adaptatifs (présentation courte). In Actes des journées francophones sur les systèmes multi-agents systèmes multi-agents, p. 149-158. Brest, France, Cépaduès.

Smith R. (1980). The contract net protocol: Highlevel communication and control in a distributed problem solver. IEEE Trans. on Computers, C, vol. 29, p. 12.

Verma A., Cherkasova L., Campbell R. H. (2011). Aria: Automatic resource inference and allocation for MapReduce environments. In Proc. of the 8th internatinal onference on autonomic computing, p. 235-244.

Vernica R., Balmin A., Beyer K. S., Ercegovac V. (2012). Adaptive MapReduce using situationaware mappers. In Proc. of the 15th international conference on extending database technology, p. 420-431.

White T. (2015). Hadoop: The definitive guide, 4th edition. O’Reilly Media. Zaharia M., Chowdhury M., Das T., Dave A., Ma J., McCauley M. et al. (2012). Resilient

distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of the 9th usenix conference on networked systems design and implementation, p. 15-28.