Extraction of Association Rules Using Big Data Technologies

Carlos Fernandez-Basso, M. Dolores Ruiz, Maria J. Martin-Bautista

Universidad de Granada, CITIC-UGR

Pages: 178-185

DOI: https://doi.org/10.2495/DNE-V11-N3-178-185

OPEN ACCESS

Abstract: 

The large amount of information stored by companies, together with the rise of social networks and the Internet of Things, is driving exponential growth in the volume of data produced. Data analysis techniques must therefore be improved so that all this information can be processed. One of the most commonly used techniques for extracting information in the data mining field is association rule mining; association rules accurately represent the frequent co-occurrence of items in a dataset. Although several methods have been proposed for mining association rules, they do not perform well on very large databases because of their high computational cost and memory requirements.

In this article, we address these problems by studying current technologies for processing Big Data and proposing a parallelization of the association rule mining process built on them. The proposal implements an efficient algorithm that can handle massive amounts of data. This new algorithm is then compared with traditional association rule mining algorithms.
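As a rough illustration of the idea behind parallel association rule mining, the sketch below counts itemsets independently in each data partition (a map step) and then merges the partial counts (a reduce step). It is not the algorithm proposed in the article: the function names, the toy transactions, and the use of plain Python instead of a Big Data framework are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two partitions, each holding a few transactions.
PARTITIONS = [
    [["bread", "milk"], ["bread", "butter"]],
    [["bread", "milk", "butter"], ["milk"]],
]


def count_itemsets(partition, size):
    """Map step: count every itemset of the given size inside one partition."""
    counts = Counter()
    for transaction in partition:
        # Sort the distinct items so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, size, min_support):
    """Reduce step: merge partial counts and keep itemsets meeting min_support."""
    total = Counter()
    n_transactions = 0
    for partition in partitions:
        total.update(count_itemsets(partition, size))
        n_transactions += len(partition)
    return {
        itemset: count / n_transactions
        for itemset, count in total.items()
        if count / n_transactions >= min_support
    }


def rule_confidence(partitions, antecedent, consequent):
    """Confidence of antecedent -> consequent: supp(A and B) / supp(A)."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = n_both = 0
    for partition in partitions:
        for transaction in partition:
            items = set(transaction)
            if antecedent <= items:
                n_antecedent += 1
                if both <= items:
                    n_both += 1
    return n_both / n_antecedent if n_antecedent else 0.0


if __name__ == "__main__":
    print(frequent_itemsets(PARTITIONS, size=2, min_support=0.5))
    print(rule_confidence(PARTITIONS, {"bread"}, {"milk"}))
```

Because each partition is counted without reference to the others, the map step could in principle run on separate workers, with only the small count tables exchanged during the merge.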

Keywords: 

Apriori, association rules, big data algorithms, data mining
