Data mining techniques for data streams mining

V. Sidda Reddy, T.V. Rao, Govardhan A.

Professor in CSE, Sai Tirumala NVR Engineering College

Professor in CSE, K.L.University, Guntur

Professor in CSE and Principal, JNTUHCE, Hyderabad

Corresponding Author Email: siddareddy.v@gmail.com
Pages: 31-35
DOI: https://doi.org/10.18280/rces.040106

OPEN ACCESS

Abstract: 

In recent years, data stream mining has come to play an important role in real-time applications that generate enormous volumes of data and require intelligent data processing and online data analysis. Sources of high-speed data streams include video surveillance systems, stock markets, internet traffic, and tweets. Traditional data mining techniques are not feasible for data stream mining because of the unique characteristics of data streams: they are high-dimensional, continuous, high-speed, and fast-changing. This necessitates building new data mining techniques or modifying existing ones to mine data streams. The main challenges are handling the data distribution and concept drift. This paper analyzes the challenges involved in designing data mining techniques for data streams and evaluates various existing techniques and their preprocessing methods. The evaluation results reveal which methods are feasible in real-time data streaming applications and which are not.

Keywords: 

Data Mining, OLAP, Concept Drifting, Data Streams, Data Stream Mining.

1. Introduction
2. Mining Data Streams
3. Challenges
4. Data Stream Mining Techniques
5. Evaluation
6. Conclusion
Acknowledgement