© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Despite the plethora of data generated on Arabic social media, research dedicated to this language remains comparatively scarce. Sentiment analysis, an extensively studied field in various languages, has seen limited development in Arabic. Existing approaches to Arabic sentiment analysis primarily employ machine learning, wherein word vector representations serve as features for model training. A significant challenge encountered in this approach is the substantial volume and sparsity of the matrix representation, attributable to the extensive vocabulary of the Arabic language. This paper proposes a novel word embedding that amalgamates the Bag of Roots (BoR) technique with Global Vector distributional representations (GloVe). This innovation is inspired by the characteristic of the Arabic language, where it is rare to find two or more words sharing the same root but conveying different sentiments. The impact of this innovative word embedding technique is highlighted through an evaluation using sentiment analysis. This involves the implementation of conventional classifiers, specifically Support Vector Machines (SVM) and Logistic Regression (LR). The results obtained demonstrate promising precision, recall, and F1-score metrics. Additionally, a significant reduction in processing time is observed when compared to other approaches referenced in literature. Thus, this paper contributes to the advancement of Arabic sentiment analysis, offering a potential pathway to overcoming the challenges associated with the large vocabulary and complex structure of the Arabic language.
sentiment analysis, roots, BoR, GloVe, word vector representation, machine learning, Arabic social media
Sentiment Analysis of Social Media (SASM) has emerged as a vibrant research area over the past decade, driven by the surge in data generated across various platforms [1]. Millions of words, covering a diverse range of topics, are generated within seconds. This colossal data set has piqued the interest of researchers and corporations alike, prompting a deeper exploration of its content. At its core, sentiment analysis seeks to predict and classify sentiments—be they positive, negative, or neutral—derived from comments or tweets [2].
From the existing state-of-the-art, it is observed that three primary approaches—lexicon-based [3], machine learning [4], and hybrid model [5]—have been employed to tackle sentiment analysis. Lexicon-based approaches rely on the polarity of the text, determined by the polarities of its composite words. Conversely, machine learning techniques utilize training and test datasets; a classifier learns differentiating text features from the training set, and the test dataset is used to assess the classifier's performance. Some researchers suggest that combining machine learning with lexicon-based approaches enhances sentiment classification performance.
Machine learning has demonstrated promising results in several languages. However, its effectiveness in Arabic remains limited due to the language's complex morphology and extensive lexicon, which comprises more than 12 million words [6]. Consequently, researchers have implemented various techniques to mitigate this vast term volume. Recent literature highlights distributional representations—GloVe [7], Word2Vec [8], and FastText [9]—that employ context to create vectors.
Despite the advancements brought about by word embedding, the Arabic language continues to present efficiency challenges and limitations. Thus, the present work seeks to enhance the GloVe architecture through the addition of a new processing layer, specifically designed to alter the text input into the GloVe algorithm via a root extraction module. The selection of word roots is driven by the rarity of Arabic words with the same root but opposite polarities—a characteristic not shared by languages like English and French, which have numerous words with identical roots and divergent polarities (e.g., "like and dislike", "worth and worthless"). The primary objective is to bolster the accuracy of the SA system, while simultaneously reducing processing time by compacting the vector space representations, which serve as model-building features.
The novel aspect of this approach lies in the enhancement of the GloVe architecture through the addition of a new layer tasked with vocabulary reduction. The overarching concept involves adding a new layer that rewrites a new corpus from the original by transforming all words into roots, subsequently feeding this into the GloVe approach. The impact of this new representation is demonstrated through an evaluation for a sentiment analysis task, using not deep learning or the new transformer-based architecture, but two conventional ML classifiers: Support Vector Machines (SVM) [10] and Logistic Regression (LR). These are tested using metrics relevant to F1-score and recall [11]. The new approach delivers superior results, outperforming the GloVe baseline representation and the state-of-the-art.
The remainder of this paper is structured as follows: Section 2 reviews existing literature related to the study. Section 3 elaborates on the GloVe distributional representation model. Section 4 details the proposed approach and methodology for extracting polarities from tweets. Section 5 presents the experiments and discusses the results of the system evaluation using two datasets and several metrics. The paper concludes with final observations and future perspectives.
In recent years, Arabic sentiment analysis has become an attractive research area, and several approaches have been proposed. The main approaches found in the literature are lexicon-based, machine learning, and hybrid models.
Lexicon-based approaches use the polarity of words to determine the sentiment of a text. Bhamare and Prabhu [12] and Alotaibi et al. [13] use SentiWordNet and SentiStrength, respectively. Both are English word lists, so they require machine translation from Arabic to English; they did not give good results because Arabic words are difficult to translate and often carry multiple meanings. Furthermore, Hamdi et al. [14] established a standard Arabic lexicon (Ar-SenL), a large set of Arabic word embeddings. No fully satisfactory Arabic sentiment lexicon is publicly accessible. Hence, several researchers have built their own lexicons for a specific domain, such as the work in [15], which focuses on the Egyptian dialect.
The machine learning approach relies mostly on supervised learning. Its main idea is to split the dataset into a training set and a test set, learn a model from the pre-labeled training subset, and then evaluate that model on the test subset. In the literature, most ML techniques have been applied to English, while work on Arabic is still lacking. Sethy et al. [16] used different classifiers, such as SVM and RBF, to classify the polarity of tweets collected from different domains, whereas Babu and Rao [11] used Decision Tree and SVM algorithms to predict sentiments about COVID-19 vaccination campaigns.
The hybrid approach combines machine learning and lexicon-based techniques. Few works exist for the Arabic language. Houari and Guerti [17] proposed a hybrid approach to sentiment analysis of Arabic tweets: in their method, a lexicon-based classifier labels the training data, and its output is used to train an SVM classifier.
Mahmoudi and Salem [18] used a hybrid approach for sentiment analysis of Arabic tweets based on a deep learning model with feature weighting.
As we conducted this comprehensive review of the literature on sentiment analysis in Arabic, we found that existing lexicon-based techniques suffer from sparsity and dimensionality issues due to the vast number of words in the Arabic language. Moreover, baseline word embeddings for machine learning, even in a hybrid approach, did not provide satisfactory results due to the complex morphology of the Arabic language, which often results in words with similar meanings having different surface forms. Therefore, to overcome these limitations, we proposed a new approach that integrates root extraction module (REM) and GloVe techniques to improve the embedding representation of Arabic words and enhance sentiment analysis on Arabic social media.
In the literature, several models are available for distributional word representations. These models are based on the linguistic hypothesis that "words that occur in the same context tend to have the same meaning." They are unsupervised techniques that use statistics and probabilities of word occurrence in huge corpora.
Global vector representation (GloVe) [19, 20] is one of the best-known vector space representations. It was developed by Pennington et al. in 2014 and aims to learn low-dimensional vector representations of terms. The central idea of GloVe is to use ratios of co-occurrence probabilities rather than the raw probabilities themselves. The GloVe formulation is given in Eq. (1), where $w_i$, $w_j$, and $w_k$ are three words and $P_{ik}$ is the probability that $w_k$ appears in the context of $w_i$. If the ratio $P_{ik}/P_{jk}$ is close to 1, $w_k$ does not discriminate between $w_i$ and $w_j$ (it is related to both or to neither); if the ratio is far from 1, $w_k$ is related to one of the two words much more strongly than to the other.
$\mathrm{F}\left(\left(w_i-w_j\right)^T w_k\right)=\frac{P_{i k}}{P_{j k}}$ (1)
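To make the ratio on the right-hand side of Eq. (1) concrete, the following minimal sketch computes $P_{ik}/P_{jk}$ from a small co-occurrence table. The words and counts are invented for illustration only (they follow the spirit of the ice/steam example used to motivate GloVe and are not taken from the corpus of this study), and the function names are hypothetical.

```python
# Toy co-occurrence counts X[w][c]: how often context word c appears near w.
# All counts are invented for illustration only.
counts = {
    "ice":   {"solid": 8, "gas": 1, "water": 30, "fashion": 1},
    "steam": {"solid": 1, "gas": 8, "water": 30, "fashion": 1},
}

def cooccurrence_probability(word, context):
    """P(context | word) = X[word][context] / sum over c of X[word][c]."""
    row = counts[word]
    return row[context] / sum(row.values())

def probability_ratio(w_i, w_j, w_k):
    """Ratio P_ik / P_jk, the right-hand side of Eq. (1)."""
    return cooccurrence_probability(w_i, w_k) / cooccurrence_probability(w_j, w_k)

for probe in ("solid", "gas", "water", "fashion"):
    print(probe, probability_ratio("ice", "steam", probe))
```

With these toy counts, the probe words "water" and "fashion" give a ratio of 1 because they co-occur equally often with both target words, while "solid" and "gas" give ratios far from 1.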
The use of the GloVe model in sentiment analysis systems has improved accuracy and reduced the dimensionality of the matrix representations. The basic workflow of GloVe for sentiment analysis comprises three phases, as shown in Figure 1. In the first phase, word vector representations are learned from a large corpus, such as Wikipedia, using the GloVe model. In the second phase, the tweets or comments collected from the benchmark datasets are preprocessed and a list of words is created. In the third phase, these words are matched against the vectors created in the first phase to obtain their representations, which are used as features for machine learning classifiers in order to create models.
Although the GloVe representation gives better results in sentiment analysis than classical models such as bag of words (BoW) [21] in several languages, such as English, French, and Spanish, its results in Arabic are still poor because of the difficult structure and morphology of Arabic texts and a vocabulary that exceeds thirteen million words. To improve the results and benefit more from the GloVe representation in Arabic, we added a new module to the basic sentiment analysis scheme, called the root extraction module (REM), which is explained in the next section.
Figure 1. The GloVe model for sentiment analysis scheme
The new approach involves combining two processing techniques, REM and GloVe, to create features that are more efficient and reduced in size. These features can then be used in machine learning classifiers to improve their accuracy and effectiveness. The goal of this approach is to enhance the quality of word embeddings and make them more useful in Arabic sentiment analysis tasks. The system is illustrated in Figure 2.
Figure 2. The GloVe-based BoR model for sentiment analysis scheme
As in the basic GloVe sentiment analysis scheme, our system contains three phases. The novelty of this scheme is the incorporation of the root extraction module into the GloVe representation, which is used in phases one and two. The details of the scheme are explained in the following subsections.
Phase I: Root vector representations
Data collection
It is necessary to work with a large amount of Arabic data, comparable to the amount available on Wikipedia. Learning a distributional vector representation requires analyzing patterns and relationships in a large dataset of Arabic text in order to capture how words are used and related in the language.
Root extraction module
This is the preprocessing step. The process is shown in Figure 3. The first step is data cleaning, which removes symbols, URLs, punctuation, and non-Arabic characters. Then we remove the stop words, i.e., words that carry little meaning on their own (such as the Arabic equivalents of "the", "in", "or"). After that, we apply an efficient Arabic root extraction approach, the Khoja stemmer [22]. The result is a corpus of Arabic Wikipedia roots.
Figure 3. Phase I. Roots extraction scheme
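The cleaning, stop-word removal, and root extraction steps described above can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and the lookup table `TOY_ROOTS` merely stands in for the Khoja root extractor, which is rule-based and is not reproduced here.

```python
import re

# Hypothetical stand-ins: a tiny stop-word list and a lookup table that plays
# the role of the Khoja root extractor (the real stemmer is rule-based).
STOP_WORDS = {"في", "من", "على", "أو", "و"}
TOY_ROOTS = {"كتاب": "كتب", "كاتب": "كتب", "مكتبة": "كتب",
             "مدرسة": "درس", "دروس": "درس"}

URL_RE = re.compile(r"https?://\S+")
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")  # keep Arabic letters and whitespace

def clean(text):
    """Remove URLs, then symbols, punctuation, digits and non-Arabic characters."""
    text = URL_RE.sub(" ", text)
    return NON_ARABIC_RE.sub(" ", text)

def to_roots(text, extract_root=lambda w: TOY_ROOTS.get(w, w)):
    """Clean the text, drop stop words, and map each remaining word to its root."""
    tokens = clean(text).split()
    return [extract_root(t) for t in tokens if t not in STOP_WORDS]

print(to_roots("كتاب في مكتبة! http://example.com مدرسة 2018"))
```

Passing the root extractor as a parameter keeps the sketch independent of any particular stemmer implementation; a real Khoja implementation could be plugged in through `extract_root`.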
GloVe
In this step, we train the GloVe model on the corpus obtained from the root extraction module. This yields a new vector representation for each root in the corpus based on its co-occurrence with other roots, which will be used in the downstream task (i.e., sentiment analysis).
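GloVe is trained on co-occurrence statistics, so the first thing computed from the root corpus is a co-occurrence table. The sketch below, with hypothetical function and variable names, counts symmetric co-occurrences within a fixed window and weights each pair by the inverse of its distance, as described in the original GloVe paper. The window of 3 and the toy root sentences are assumptions chosen for illustration.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=3):
    """Accumulate symmetric co-occurrence weights within a fixed window.

    A pair of tokens that are d positions apart contributes 1/d to the count.
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, center in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                counts[(center, tokens[j])] += 1.0 / d
                counts[(tokens[j], center)] += 1.0 / d
    return counts

# Toy corpus already rewritten as roots (invented example).
root_corpus = [["كتب", "درس", "علم"], ["درس", "علم"]]
X = cooccurrence_counts(root_corpus, window=3)
print(X[("درس", "علم")], X[("كتب", "علم")])
```

Because many surface forms collapse onto the same root, the table built from the root corpus has fewer rows and columns than one built from the original words.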
Phase II: List of Twitter dataset roots
Dataset acquisition
To create machine learning classifiers for this task, it is important to have access to a suitable dataset of Arabic text that has already been labeled with sentiment annotations. This dataset should contain a sufficient number of examples that are representative of the task and the domain in which the classifiers will be applied.
Root extraction module
As in Phase I, a preprocessing step is needed to clean noise from the tweets. Then we remove the stop words. After that, a tokenization step splits the text into tokens separated by a comma or whitespace. The resulting word list is processed by the Khoja root extractor [22] to create a new BoR (see Figure 4).
Figure 4. Phase II. List of roots extraction scheme
Phase III: Sentiment analysis models
Vector assignments
In this phase, the list of roots produced in Phase II is matched against the root representations from Phase I, and every root is assigned its vector.
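A minimal sketch of this lookup is given below. How the per-root vectors are combined into one feature vector per tweet is not specified here, so averaging is an assumption of the sketch, as is skipping roots that have no vector. The function name and the three-dimensional vectors are hypothetical.

```python
def tweet_features(roots, root_vectors, dim):
    """Average the vectors of the roots found in the embedding table.

    Averaging is an assumption of this sketch; roots without a vector are skipped.
    """
    found = [root_vectors[r] for r in roots if r in root_vectors]
    if not found:
        return [0.0] * dim
    return [sum(v[k] for v in found) / len(found) for k in range(dim)]

# Invented three-dimensional vectors standing in for the learned root embeddings.
root_vectors = {"كتب": [1.0, 0.0, 2.0], "درس": [3.0, 2.0, 0.0]}
features = tweet_features(["كتب", "درس", "علم"], root_vectors, dim=3)
print(features)
```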
Model creation
The resulting vectors are used as features for machine learning. Two efficient classifiers, support vector machine (SVM) and logistic regression (LR), are used to create models for Arabic sentiment analysis. The dataset is divided into 70% for training and 30% for testing.
In this section, we describe the datasets, the two experiments carried out, and their results concerning the accuracy of our Arabic sentiment analysis system. All code is implemented in Python. We also used the glove-python package, which is available online, to implement the GloVe model.
As mentioned in Section 4, we used three data collections. The first, collected from Arabic Wikipedia, is the huge corpus needed to apply GloVe, which uses context to create word vector representations and to compact the size of the matrix representations.
Datasets settings
Three well-known datasets are employed. The first is the Arabic Wikidata Dump 2018, which comprises all Arabic Wikipedia articles in Wikipedia format from the January 20, 2018 data dump. It is available for download and can be accessed through the Wikidata Query Service or through third-party tools that support RDF data. It provides a valuable resource for those interested in exploring the Arabic language and developing new applications that can leverage structured data to enhance their performance. The content is mostly in Modern Standard Arabic. It has a total of 75,000,000 tokens [23]. The second is a corpus of 40K Egyptian tweets in Arabic [24]. It has 40,000 tweets, 20,000 of which are positive and 20,000 of which are negative, and the tweets cover a wide range of topics commonly addressed on Twitter. The third dataset, Large-Scale Arabic Book Reviews (LABR), was obtained via the Internet [25]. It has 16,448 rows of book reviews labeled as positive (1) or negative (0) (see Table 1).
Table 1. Datasets
| Dataset | Number of reviews |
|---|---|
| 40K Egyptian Tweet | 40,000 tweets |
| BOOK REVIEW | 16,448 reviews |
Experimental settings and parameters
To prepare for model creation, we first trained the GloVe model using the Tensorflow tools. For our experiments, we employed a window size of 3 and 300 embedding dimensions. We also set a minimum frequency of 50 for each word, a learning rate of 0.05, Adam optimizer, and trained the model for 20 epochs on the entire text. These settings allowed us to capture a rich and informative representation of the words in the text data, which we then used to develop our models for the subsequent tasks.
In our study, we employed two highly effective machine learning classifiers: Support Vector Machine (SVM) and Logistic Regression (LR). SVM is a well-established approach widely used in various natural language processing tasks due to its high performance and effectiveness in text classification. LR, on the other hand, belongs to the log-linear family of classifiers and is a probabilistic and discriminative algorithm used for binary classification. Both classifiers are known for their robustness and ability to handle high-dimensional data, making them well suited to the task of sentiment analysis.
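To illustrate what "probabilistic and discriminative" means for LR, the sketch below fits a binary logistic regression by plain batch gradient descent on a toy two-dimensional dataset. It is an illustration only: the training procedure, learning rate, number of epochs, and data are assumptions of the sketch and do not describe the implementation used in the experiments.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - yi
            for k in range(dim):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (positive) when the estimated probability is at least 0.5."""
    return int(sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) >= 0.5)

# Invented, linearly separable toy features: label 1 = positive, 0 = negative.
X_train = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -0.5],
           [2.0, 1.0], [1.0, 2.0], [1.5, 0.5]]
y_train = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X_train, y_train)
print([predict(w, b, x) for x in X_train])
```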
In our experimental setup, we utilized the standard parameters for both Support Vector Machine (SVM) and Logistic Regression (LR) classifiers. To ensure the robustness of the results, we employed cross-validation with the classifiers. Additionally, the datasets were randomly partitioned into two subsets, with 80% of the data used for training and the remaining 20% used for testing. This division allowed us to evaluate the performance of the models on unseen data and estimate their ability to generalize to new instances.
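The random 80/20 partition and the cross-validation folds can be sketched in a few lines. The seed, the number of folds (k = 5), and the function names are assumptions of this sketch, since the text does not specify them.

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val
        start += size

# Indices standing in for the 40,000 tweets of the EGYPTIAN TWEET dataset.
train, test = train_test_split(range(40000))
print(len(train), len(test))
```

Cross-validation would then be run on the training part only, so that the held-out 20% remains unseen until the final evaluation.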
Evaluation
We used the confusion matrix as shown in Table 2 and three metrics to evaluate the approaches (Precision, Recall and F1-score).
Table 2. Confusion matrix
| | P (Predicted) | N (Predicted) |
|---|---|---|
| P (Actual) | TP | FN |
| N (Actual) | FP | TN |
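The three metrics follow directly from the confusion-matrix counts: precision is TP / (TP + FP), recall is TP / (TP + FN), and the F1-score is their harmonic mean. The short sketch below applies these standard definitions; the counts in the usage line are invented for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts: 80 true positives, 20 false positives, 20 false negatives.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(p, r, f1)
```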
Results of GloVe and Bag of Words approaches
In the first experiment, labeled the baseline approach, the vector representations created with the GloVe model are used by the two classifiers, SVM and LR, on the two datasets EGYPTIAN TWEET and BOOK REVIEW. The results are presented in Table 3 and Table 4, respectively. The last two columns of these tables give the results of the Bag of Words (BoW) technique applied to the same datasets and classifiers.
Table 3 and Table 4 show that the model based on the GloVe representation improves on the BoW technique: Precision rises to 88% on the EGYPTIAN TWEET dataset and to 83% on the BOOK REVIEW dataset.
Table 3. The Performance of GloVe baseline using EGYPTIAN TWEET Dataset compared with BoW approach
| | GloVe SVM | GloVe LR | BoW SVM | BoW LR |
|---|---|---|---|---|
| Precision | 0.86 | 0.88 | 0.78 | 0.77 |
| Recall | 0.82 | 0.83 | 0.76 | 0.74 |
| F1-score | 0.84 | 0.85 | 0.77 | 0.75 |
Table 4. The performance of GloVe baseline using BOOK REVIEW dataset compared with BoW approach
| | GloVe SVM | GloVe LR | BoW SVM | BoW LR |
|---|---|---|---|---|
| Precision | 0.82 | 0.83 | 0.72 | 0.74 |
| Recall | 0.80 | 0.79 | 0.70 | 0.71 |
| F1-score | 0.81 | 0.81 | 0.71 | 0.72 |
Results of the proposed approach
In the second experiment, we tried to enhance the former approach by introducing the Root module. Vectors generated by our new approach are employed to create models by the same classifiers and datasets used in the first experiment. The results are illustrated in Table 5 and Table 6.
Table 5 and Table 6 show that the proposed approach achieves the best results. It improves on GloVe by 8 percentage points in precision on the EGYPTIAN TWEET dataset with the SVM classifier and by 7 points with the LR classifier. For the BOOK REVIEW dataset, the new approach improves on GloVe by 9 points in precision, 1 point in recall, and 5 points in F1-score using the SVM classifier.
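The gains quoted above can be checked with simple arithmetic on the reported scores. The sketch below copies the values from Tables 3 to 6 as integer percentages (to avoid floating-point rounding) and prints the difference for each dataset and classifier; the dictionary layout and names are illustrative only.

```python
# Scores copied from Tables 3-6 as integer percentages: (precision, recall, F1).
glove = {("tweets", "SVM"): (86, 82, 84), ("tweets", "LR"): (88, 83, 85),
         ("books", "SVM"): (82, 80, 81), ("books", "LR"): (83, 79, 81)}
proposed = {("tweets", "SVM"): (94, 82, 87), ("tweets", "LR"): (95, 83, 88),
            ("books", "SVM"): (91, 81, 86), ("books", "LR"): (90, 81, 85)}

def gains(key):
    """Return (precision, recall, F1) gains in percentage points."""
    return tuple(new - old for new, old in zip(proposed[key], glove[key]))

for key in glove:
    print(key, gains(key))
```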
Table 5. The performance of the proposed approach using EGYPTIAN TWEET dataset
| | Approach using SVM | Approach using LR |
|---|---|---|
| Precision | 0.94 | 0.95 |
| Recall | 0.82 | 0.83 |
| F1-score | 0.87 | 0.88 |
Table 6. The performance of the proposed approach using BOOK REVIEW dataset
| | Approach using SVM | Approach using LR |
|---|---|---|
| Precision | 0.91 | 0.90 |
| Recall | 0.81 | 0.81 |
| F1-score | 0.86 | 0.85 |
Figure 5. Comparison of the performance of each approach applied on EGYPTIAN TWEET dataset
Figure 6. Comparison of the performance of each approach applied on BOOK REVIEW dataset
Indeed, Figure 5 and Figure 6 indicate that the proposed approach, tested on the EGYPTIAN TWEET dataset, reaches the highest precision, 95%, which is a good result compared with the BoW and baseline approaches.
One reason for the performance observed in the two experiments is the compaction of the matrix representation, which reduces sparsity. We notice that the GloVe approach reduces the number of tokens by up to 50%. In addition, the combination of BoR and GloVe improves this reduction by 30% on the EGYPTIAN TWEET dataset, and the new approach reduces the tokens of the BOOK REVIEW dataset to almost 50%. As a result, accuracy increases and processing time improves.
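The mechanism behind this reduction is that several surface forms collapse onto a single root. The sketch below measures that effect on a toy word list; the lookup table is a hypothetical stand-in for the root extractor, and the resulting ratio describes only this invented example, not the datasets of the study.

```python
def vocabulary_reduction(tokens, extract_root):
    """Return (surface vocabulary size, root vocabulary size, reduction ratio)."""
    surface_vocab = set(tokens)
    root_vocab = {extract_root(t) for t in tokens}
    reduction = 1.0 - len(root_vocab) / len(surface_vocab)
    return len(surface_vocab), len(root_vocab), reduction

# Hypothetical lookup table standing in for the root extractor.
toy_roots = {"كتاب": "كتب", "كاتب": "كتب", "مكتبة": "كتب", "مكتوب": "كتب",
             "مدرسة": "درس", "دروس": "درس", "عالم": "علم", "علوم": "علم"}
tokens = list(toy_roots) + ["كتاب", "علوم"]
n_surface, n_roots, reduction = vocabulary_reduction(
    tokens, lambda w: toy_roots.get(w, w))
print(n_surface, n_roots, reduction)
```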
Table 7. The performance of the baseline word embedding on EGYPTIAN TWEET dataset
| | FastText using SVM | FastText using LR | Word2vec using SVM | Word2vec using LR |
|---|---|---|---|---|
| Precision | 0.80 | 0.79 | 0.80 | 0.81 |
| Recall | 0.78 | 0.81 | 0.79 | 0.80 |
| F1-score | 0.79 | 0.80 | 0.79 | 0.80 |
Table 8. The performance of the baseline word embedding on BOOK REVIEW dataset
| | FastText using SVM | FastText using LR | Word2vec using SVM | Word2vec using LR |
|---|---|---|---|---|
| Precision | 0.83 | 0.82 | 0.79 | 0.80 |
| Recall | 0.80 | 0.80 | 0.80 | 0.79 |
| F1-score | 0.81 | 0.81 | 0.79 | 0.79 |
In addition to comparing our proposed technique with BoW and GloVe representations, we conducted a comprehensive comparison with several Arabic baseline word embedding approaches, such as Ar-FastText and word2vec. The comparison results are presented in Table 7 and Table 8.
Our experimental results demonstrate that the new technique we proposed outperforms all the baseline embedding approaches. Despite being widely used, these baselines were unable to reach the 90% precision that our new architecture achieved. These findings confirm the effectiveness of the new approach and validate the importance of the processing techniques applied to the dataset fed to the word embedding approach. Specifically, the root extraction module was found to be crucial in improving the performance of the word embedding approach. This provides further support for the idea that pre-processing the dataset before feeding it to the model can have a significant impact on the overall performance of the system.
In this work, we proposed a new approach to improve the embedding representation of Arabic words for sentiment analysis on Arabic social media. The main idea is to combine GloVe and BoR in order to reduce the vocabulary and increase the density of the matrix representation. The models obtained with this approach significantly improved the results and enhanced sentiment analysis of Arabic social media. The approach may have limitations when applied to dialects. In the future, we plan to add more texts to enlarge the word embedding corpus and to focus on colloquial Arabic.
[1] Zheng, X., Chen, W., Zhou, H., Li, Z., Zhang, T., Yuan, Q. (2022). Emoji-integrated polyseme probabilistic analysis model: Sentiment analysis of short review texts on library service quality. Traitement du Signal, 39(1): 313-322. https://doi.org/10.18280/ts.390133
[2] Alotaibi, A., Rahman, A.U., Alhaza, R., Alkhalifa, W., Alhajjaj, N., Alharthi, A., Abushoumi, D., Alqahtani, M., Alkhulaifi, D. (2022). Spam and sentiment detection in arabic tweets using marbert model. Mathematical Modelling of Engineering Problems, 9(6): 1574-1582. https://doi.org/10.18280/mmep.090617
[3] Yadu, R., Shukla, R. (2022). A hybrid model integrating adaboost approach for sentimental analysis of airline tweets. Revue d'Intelligence Artificielle, 36(4): 519-528. https://doi.org/10.18280/ria.360402
[4] Karsi, R., Zaim, M., El Alami, J. (2021). Leveraging pre-trained contextualized word embeddings to enhance sentiment classification of drug reviews. Revue d'Intelligence Artificielle, 35(4): 307-314. https://doi.org/10.18280/ria.350405
[5] Alharbi, L.M., Qamar, A.M. (2022). Arabic sentiment analysis of eateries' reviews using deep learning. Ingénierie des Systèmes d'Information, 27(3): 503-508. https://doi.org/10.18280/isi.270318
[6] Al-Jarrah, M.A., Al-Jarrah, A., Jarrah, A., AlShurbaji, M., Magableh, S.K., Al-Tamimi, A.K., Bzoor, N., Al-Shamali, M.O. (2022). Accurate reader identification for the arabic holy quran recitations based on an enhanced vq algorithm. Revue d'Intelligence Artificielle, 36(6): 815-823. http://dx.doi.org/10.18280/ria.360601
[7] Rajasekar, D., Robert, L. (2022). Unsupervised word embedding with ensemble deep learning for twitter rumor identification. Revue d'Intelligence Artificielle, 36(5): 769-776. https://doi.org/10.18280/ria.360515
[8] Kumari, G., Sowjanya, A.M. (2022). An integrated single framework for text, image and voice for sentiment mining of social media posts. Revue d'Intelligence Artificielle, 36(3): 381-386. https://doi.org/10.18280/ria.360305
[9] Bousmaha, K.Z., Hamadouche, K., Gourara, I., Hadrich, L.B. (2022). DZ-OPINION: Algerian dialect opinion analysis model with deep learning techniques. Revue d'Intelligence Artificielle, 36(6): 897-903. https://doi.org/10.18280/ria.360610
[10] Mahmoudi, L., Salem, M. (2023). Improving multi-class text classification using balancing techniques. Artificial Intelligence: Theories and Applications. ICAITA 2022. Communications in Computer and Information Science, vol 1769. Springer, Cham. https://doi.org/10.1007/978-3-031-28540-0_21
[11] Babu, K.S., Rao, Y.N. (2023). A study on imbalanced data classification for various applications. Revue d'Intelligence Artificielle, 37(2): 517-524. https://doi.org/10.18280/ria.370229
[12] Bhamare, B.R., Prabhu, J. (2021). A multilabel classifier for text classification and enhanced bert system. Revue d'Intelligence Artificielle, 35(2): 167-176. https://doi.org/10.18280/ria.350209
[13] Alotaibi, A., Rahman, A.U., Alhaza, R., Alkhalifa, W., Alhajjaj, N., Alharthi, A., Abushoumi, D., Alqahtani, M., Alkhulaifi, D. (2022). Spam and sentiment detection in arabic tweets using marbert model. Mathematical Modelling of Engineering Problems, 9(6): 1574-1582. https://doi.org/10.18280/mmep.090617
[14] Hamdi, A., Shaban, K., Zainal, A. (2018). Clasenti: A class-specific sentiment analysis framework. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17(4): 1-28. https://doi.org/10.1145/3209885
[15] Ali, B.A.B., Mihi, S., El Bazi, I., Laachfoubi, N. (2020). A recent survey of arabic named entity recognition on social media. Revue d'Intelligence Artificielle, 34(2): 125-135. https://doi.org/10.18280/ria.340202
[16] Sethy, A., Patra, P.K., Nayak, S.R. (2022). A hybrid system for handwritten character recognition with high robustness. Traitement du Signal, 39(2): 567-576. https://doi.org/10.18280/ts.390218
[17] Houari, H., Guerti, M. (2020). Study the influence of gender and age in recognition of emotions from algerian dialect speech. Traitement du Signal, 37(3): 413-423. https://doi.org/10.18280/ts.370308
[18] Mahmoudi, L., Salem, M. (2023). BalBERT: A new approach to improving dataset balancing for text classification. Revue d'Intelligence Artificielle, 37(2): 425-431. https://doi.org/10.18280/ria.370219
[19] Killi, C.B.R., Balakrishnan, N., Rao, C.S. (2022). Classification of fake news using deep learning-based GloVE-LSTM model. International Journal of Safety and Security Engineering, 12(5): 631-637. https://doi.org/10.18280/ijsse.120512
[20] Yildirim, M. (2022). Detection of covid-19 fake news in online social networks with the developed cnn-lstm based hybrid model. Review of Computer Engineering Studies, 9(2): 41-48. https://doi.org/10.18280/rces.090201
[21] Bouziane, A., Bouchiha, D., Rebhi, R., Lorenzini, G., Doumi, N., Menni, Y., Ahmad, H. (2021). ARALD: Arabic annotation using linked data. Ingénierie des Systèmes d'Information, 26(2): 143-149. https://doi.org/10.18280/isi.260201
[22] Kanan, T., Sadaqa, O., Almhirat, A., Kanan, E. (2019). Arabic light stemming: A comparative study between p-stemmer, khoja stemmer, and light10 stemmer. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). IEEE, pp. 511-515. https://doi.org/10.1109/SNAMS.2019.8931842
[23] Wikimedia Projects Editors. (2018). Wikimedia database dump of the Arabic Wikipedia on Mar. 01, 2018. Retrieved from https://archive.org/details/arwiki-20180301
[24] Rania, K., Ammar, M. (2019). Corpus on arabic egyptian tweets, Harvard Dataverse. https://doi.org/10.7910/DVN/LBXV9O
[25] Altowayan, A.A., Tao, L. (2016). Word embeddings for Arabic sentiment analysis. In 2016 IEEE International Conference on Big Data, pp. 3820-3825. https://doi.org/10.1109/BigData.2016.7841054