Classification of Tweets with a Mixed Method Based on Pragmatic Content and Meta-Information


M. Esteve, F. Miró & A. Rabasa

University Miguel Hernández of Elche, Spain




The sharp rise of social networks as a venue for opinion in every field has made content analysis increasingly important. Because Twitter limits posts to 140 characters, the texts published there are concise, which makes this network the most suitable for analysing and classifying opinions according to different criteria. Consequently, there are many tweet analysis tools that take a semantic perspective and try to classify content characteristics such as sentiment and polarity.

In this paper, the authors present a new approach to classification based on a different perspective. The proposed approach is a complex mixed model with two components: a pragmatic analysis, in which a panel of experts assesses each opinion in the context of its issuer, and a classification of the type of discourse based on the tweet's meta-information.
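The two components of the mixed model can be pictured as producing one training record per tweet: a label coming from the expert panel and a feature vector coming from the tweet's meta-information. The sketch below is illustrative only. The abstract does not say how expert opinions are combined or which metadata fields are used, so the majority-vote rule, the "undecided" tie label and the chosen fields (`retweet_count`, `favorite_count`, `user.followers_count`, `entities.*`, `in_reply_to_status_id`, as they appear in Twitter-style JSON) are all assumptions, and `label_tweet` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class LabelledTweet:
    tweet_id: str
    features: dict[str, float]  # meta-information features (hypothetical selection)
    label: str                  # consensus label from the expert panel


def expert_consensus(votes: list[str]) -> str:
    """Majority vote over the experts' labels; a tie yields 'undecided' (assumption)."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "undecided"
    return counts[0][0]


def metadata_features(tweet: dict) -> dict[str, float]:
    """Extract illustrative meta-information features from a Twitter-style JSON object."""
    entities = tweet.get("entities", {})
    return {
        "retweet_count": float(tweet.get("retweet_count", 0)),
        "favorite_count": float(tweet.get("favorite_count", 0)),
        "followers_count": float(tweet.get("user", {}).get("followers_count", 0)),
        "n_hashtags": float(len(entities.get("hashtags", []))),
        "n_urls": float(len(entities.get("urls", []))),
        "n_mentions": float(len(entities.get("user_mentions", []))),
        "is_reply": float(tweet.get("in_reply_to_status_id") is not None),
    }


def label_tweet(tweet: dict, votes: list[str]) -> LabelledTweet:
    """Combine the pragmatic (expert) label with the metadata feature vector."""
    return LabelledTweet(str(tweet["id"]), metadata_features(tweet), expert_consensus(votes))
```

For example, a tweet judged "violent" by two of three experts yields a record whose label is "violent" and whose features describe only its metadata, not its text; this separation is what lets a metadata-based model be compared against the experts' pragmatic judgement.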

Following this approach, the paper presents a complete Big Data analysis process that covers all the characteristic phases of the data life cycle: capture, storage, preprocessing and analysis of a tweet database. The aim is to classify tweets that refer to terrorist acts as violent or non-violent.
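The first three life-cycle phases can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' pipeline: capture is simulated by reading tweets saved as JSON lines (live capture from Twitter's API is not shown), storage uses an SQLite table keyed by tweet id so duplicates are discarded, and preprocessing drops tweets with empty text and flattens a hypothetical selection of metadata fields. The function names `capture`, `store` and `preprocess` are illustrative.

```python
import json
import sqlite3


def capture(lines):
    """Capture phase: parse tweets delivered as JSON lines (e.g. a saved stream dump)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def store(conn, tweets):
    """Storage phase: persist raw JSON keyed by tweet id; repeated ids are ignored."""
    conn.execute("CREATE TABLE IF NOT EXISTS tweets (id TEXT PRIMARY KEY, raw TEXT NOT NULL)")
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, raw) VALUES (?, ?)",
        ((str(t["id"]), json.dumps(t)) for t in tweets),
    )
    conn.commit()


def preprocess(conn):
    """Preprocessing phase: drop tweets with empty text and flatten selected metadata."""
    rows = []
    for (raw,) in conn.execute("SELECT raw FROM tweets ORDER BY id"):
        t = json.loads(raw)
        if not t.get("text", "").strip():
            continue
        rows.append({
            "id": str(t["id"]),
            "retweet_count": t.get("retweet_count", 0),
            "followers_count": t.get("user", {}).get("followers_count", 0),
            "n_hashtags": len(t.get("entities", {}).get("hashtags", [])),
            "is_retweet": "retweeted_status" in t,
        })
    return rows
```

Usage: `conn = sqlite3.connect(":memory:")`, then `store(conn, capture(lines))` and `rows = preprocess(conn)`. Keeping the raw JSON in storage and flattening only during preprocessing means the feature selection can be changed later without recapturing data.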

If the classification models based on tweet metadata reach acceptable levels of accuracy, this methodology will offer a reliable, semi-automatic alternative for tweet classification.
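Whether a metadata-based model is "acceptable" comes down to comparing its predictions with the expert labels on tweets it was not trained on. The sketch below uses a decision stump (a one-level decision tree) purely as the simplest stand-in classifier; the abstract does not name the model the authors used, and `fit_stump` and `evaluate` are hypothetical helpers. Accuracy is computed as correct predictions divided by total predictions, with a confusion count of (true label, predicted label) pairs; rows are assumed to be dicts of numeric metadata features.

```python
from collections import Counter

LABELS = ("violent", "non-violent")


def _other(label):
    """Return the opposite class label."""
    return LABELS[1] if label == LABELS[0] else LABELS[0]


def fit_stump(rows, labels, features):
    """Fit a one-level decision tree: choose the rule
    'predict `above` if row[f] > thr, else the other label'
    with the highest training accuracy over all features and observed thresholds."""
    best_acc, best_rule = -1.0, None
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            for above in LABELS:
                preds = [above if r[f] > thr else _other(above) for r in rows]
                acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
                if acc > best_acc:
                    best_acc, best_rule = acc, (f, thr, above)
    f, thr, above = best_rule

    def predict(row):
        return above if row[f] > thr else _other(above)

    return predict


def evaluate(predict, rows, labels):
    """Accuracy = correct / total, plus a Counter of (true, predicted) pairs."""
    preds = [predict(r) for r in rows]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    confusion = Counter(zip(labels, preds))
    return accuracy, confusion
```

The stump keeps the resulting rule human-readable (one feature, one threshold), which suits a semi-automatic workflow in which experts can inspect why a tweet was flagged; any richer classifier could be dropped into the same `evaluate` step.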


Keywords: analysis, big data, classification, social networks

1. Introduction and Objectives
2. State of the Art
3. Input Data Set
4. Complex System Methodology
5. Conclusions and Future Research

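This work was carried out within the framework of research project DER2014-53449-R, entitled “Incitación a la violencia y discurso del odio en Internet. Alcance real del fenómeno, tipologías, factores ambientales y límites de la intervención jurídica frente al mismo” (Incitement to violence and hate speech on the Internet: real scope of the phenomenon, typologies, environmental factors and limits of legal intervention against it), funded by the Spanish Ministry of Economy and Competitiveness (MINECO).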
