Synthesis of Big Data project management methods and of feedback collected from industrial pilots

Christophe Ponsard, Mounir Touzani, Annick Majchrowski

CETIC - Centre de recherche, Gosselies, Belgique

Doctor of Computer Science, Toulouse, France

Corresponding Author Email: mtouzani64@gmail.com
Page: 9-33
DOI: https://doi.org/10.3166/ISI.23.1.9-33
Abstract: 

Companies increasingly face the challenge of handling growing, even massive, volumes of digital data. Although Big Data technical solutions are available, many companies struggle to deploy them because they lack maturity in managing such projects. This paper aims to improve guidance in this area, building on a series of methodological building blocks documented in the literature, from early data mining projects to the present day. It is complemented by lessons learned from pilots conducted in four key sectors (IT, health, space, food industry). These lessons give a concrete view of how to carry out the requirements gathering and data understanding steps, with a focus on identifying value, defining a relevant strategy, and applying an agile follow-up that also supports the growth in maturity.

Keywords: 

project management, adoption process, agile methods, Big Data, case study.

1. Introduction
2. Typology of massive data analysis methods
3. Review of existing methods and processes
4. Overall process followed to develop and validate the method
5. Lessons learned and recommendations
6. Related work and discussion
7. Conclusion and perspectives
Acknowledgements

This work was funded in part by the PIT Big Data project of the Walloon Region (no. 7481). We thank our partners for sharing their case studies.
