© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
Misleading reviews posted on shopping websites and other media platforms sway the opinions and decisions of customers. At the same time, dishonest reviewers make an effort to mimic the writing style of legitimate reviews, so approaches based on text features alone cannot reliably detect them. In addition, the imbalanced category distribution that is likely in practice limits detection performance. This paper proposes a fraudulent review detection system that uses ensemble feature selection and multidimensional feature creation to overcome these limitations. Our approach builds three dimensions of features: text, reviewer behaviour, and misleading scores. Furthermore, a data resampling approach combining random under-sampling with oversampling mitigates the effects of an imbalanced category distribution. In addition, we combine the outcomes of several feature selection methods based on information gain, XGBoost feature importance, and the Chi-square test. Experimental findings on various text datasets, covering feature selection, resampling, classification, etc., show that the proposed technique delivers exemplary performance in fraudulent review identification. Our technique outperforms existing sophisticated methods when faced with low-quality text or an imbalanced dataset.
social networks, feature selection, multidimensional, classification, evaluation metrics
Reviews on social media have risen in tandem with the popularity of online shopping and other social platforms, driven by the constant improvement of network technology. However, because socializing and trading networks place relatively lax limits on customer reviews, some users submit inadequate and misleading evaluations to achieve their own goals, while others post inflated personal ratings. These reviews shape the opinions and understanding of other users, and future buyers' decisions and actions are influenced mainly by evaluations and feedback on goods. In addition to severely violating users' rights and interests, misleading evaluations impede the positive and sustainable growth of social media and e-commerce [1]. Detecting fraudulent reviews therefore remains a worthwhile and challenging endeavour.
The impact of misleading evaluations on customers' decisions and companies' reputations was first highlighted in the study [2]. Because detecting misleading reviews requires contextual connections, it is tough for humans to identify them. As it is a classification problem, most approaches handle it using supervised learning techniques; the review content was analyzed using linguistic characteristics [3]. However, the value of a single-feature dataset is limited in today's complicated online shopping market, where dishonest reviewers intentionally mimic or even replicate honest reviews, making deceitful texts look similar to authentic ones. The study [4] proposed that review behaviour traits together with text elements play a significant role in determining review authenticity and can help identify fraudulent reviews. Moreover, training and prediction become much more challenging when the quantities of honest and dishonest data are imbalanced.
Using feature generation and selection, this paper presents a fraudulent review detection approach to tackle the aforementioned issues. It handles the realistic scenario of low-quality text data with an imbalanced category distribution in the fraudulent review identification task. The significant contributions of this work are mentioned below:
(1) We evaluate text features and reviewer behaviour features from the dataset, using resampling approaches for unbalanced data, with which existing work has faced difficulty in model training and prediction.
(2) We compare the proposed approach with five other sophisticated techniques to highlight its benefits.
(3) We employ cross-validation procedures to guarantee the precision of the findings.
The outline of this paper is summarized as follows. Section 2 explains different methodologies and existing experimental analyses by different researchers. Section 3 describes the approaches that assist in making a framework or model of the proposed work. Section 4 demonstrates the method's efficacy by employing various experimental comparisons, and Section 5 summarizes the results.
False review detection was considered a two-class challenge [5]. Deep learning or machine learning is often used to solve the two-class deception detection issue. Methods for detecting false reviews may be categorized as content-based, behaviour-feature-based, or hybrid.
2.1 Deceptive review detection based on text features
Review detection utilizing stylistic (lexical and syntactic) linguistic characteristics was developed in [6]. Based on their analysis, Yuan et al. [7] found that combining lexical and syntactic characteristics yielded better results than using either feature alone. Zhang et al. [8] employed contextual information disparity in a deep learning approach for text feature identification. Jain et al. [9] obtained pretrained word-embedding vectors by running Word2vec on the Google News corpus and then presented several deep neural network (DNN) based methods for detecting misleading information. The machine learning strategy of [10] suggested using distinct features and sentiment orientation. Combining the conventional bag of words with word context and customer sentiment, a deep learning model was suggested in [11] for detecting misleading reviews.
2.2 False review analysis
The behaviour characteristic was analyzed in [12] with various performance methods. An approach based on unexpected variation [13] recommended taking usual behaviour as the baseline when looking for suspicious reviewer behaviour. Lim et al. [14] suggested a behavioural approach and a system to monitor reviewers' actions and identify those who attempt to post misleading reviews. To identify misleading reviews, a relative topic model relies on individual group merchants, picks up elements of individual behaviour, and combines them with features of evaluator group behaviour.
2.3 Machine learning-based feature analysis
Different feature analyses have been done for classification using traditional or hybrid machine learning methods. Feature data analysis with various data sets is considered in studies [15, 16]. Privacy-preserving data analysis has been developed using machine learning methods as in studies [10, 17].
2.4 Aggregated content-based approaches
Noekhah et al. [15] suggested a multi-iterative network-based methodology to identify fraudulent reviews. Yang et al. [16] utilized a CNN approach to identify fraudulent reviews after extracting elements related to both the content and the reviewer's behaviour, combining content-based characteristics with behaviour-based ones. Javed et al. [17] demonstrated the efficacy of behaviour features before discussing several CNN-based methods of combining text and non-text characteristics.
2.5 Methods for datasets with an unbalanced category distribution
Most researchers resampled their data to achieve a fairer distribution of categories [17]. A false review identification approach based on comment time characteristics was also proposed [18]. The support vector machine (SVM) model for identifying misleading reviews was suggested by Zhu et al. [19]. Bhuyan et al. [20-22] have used machine and deep learning methods for feature selection and classification performance.
Building on these earlier ideas for detecting misleading reviews, this paper integrates the text feature with the behaviour feature. Anomaly detection, data resampling, and a misleading scoring feature are implemented to address the effects of imbalance, so the method can better categorize misleading reviews even when the dataset has an imbalanced distribution of categories.
To identify fraudulent reviews, this paper employs ensemble feature selection and multidimensional feature building. Simple data processing with headword probability is applied at certain data levels, as shown in Figure 1. Figure 2 depicts the procedures of the proposed framework based on multidimensional feature construction. This work extends the study [23] with additional methodological analysis and experiments on further datasets, and we compare our evaluation with existing assessment methods.
3.1 Building features
In this part, we consider the preprocessing of text data and create structured text features from the text dataset as follows:
(a) Preprocessing Text: Preprocessing data is essential for accurate and successful feature extraction from the text. Standard text preprocessing steps include changing case, removing punctuation and stop words, correcting spelling and grammar mistakes, and fine-grained text segmentation. This paper uses Python's Natural Language Toolkit (NLTK) for preprocessing tasks. A text corpus is created for use in subsequent experiments after text preparation. The findings of the food dataset's preprocessing of reviews are displayed in Table 1.
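As an illustration, the following is a minimal preprocessing sketch with NLTK under the steps above; the helper name is illustrative, and the resource names downloaded may vary slightly across NLTK versions.

```python
# A minimal preprocessing sketch with NLTK: lowercase, strip punctuation,
# tokenize, and drop stop words. Resource names may differ by NLTK version.
import string
import nltk
from nltk.corpus import stopwords

nltk.download("punkt", quiet=True)       # tokenizer models
nltk.download("stopwords", quiet=True)   # stop-word lists

STOP_WORDS = set(stopwords.words("english"))

def preprocess(review):
    """Return the filtered token list for one review."""
    text = review.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = nltk.word_tokenize(text)
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("I like Biryani in dinner!"))
# e.g. -> ['like', 'biryani', 'dinner']
```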
(b) Text Feature Creation: Since computers cannot process words directly, we must employ feature creation methods to convert English words into word vectors that computers can understand. The first step is to extract semantic information from word pairings using the Bigram algorithm.
The N-gram [5] model (a small unit containing n words) records the details of front and back word pairs; when n = 2, this form of the N-gram is known as a Bigram. With the review “I like Biryani in dinner” as an example, following word segmentation, stop word filtering, and Bigram processing, the sentence is transformed into {I, (I, like), like, (like, Biryani), Biryani, (Biryani, dinner)}, which separates the formerly indistinguishable “I like Biryani in dinner” from “in dinner I like Biryani”.
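A small sketch of this Bigram step follows; the token list and helper name are illustrative.

```python
# Augment a filtered token list with its (front, back) word pairs,
# so word order is preserved alongside the unigrams.
def add_bigrams(tokens):
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            out.append((tok, tokens[i + 1]))
    return out

print(add_bigrams(["I", "like", "Biryani", "dinner"]))
# -> ['I', ('I', 'like'), 'like', ('like', 'Biryani'),
#     'Biryani', ('Biryani', 'dinner'), 'dinner']
```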
Figure 1. Processing of dataset
Figure 2. Deceptive review detection model based on the multidimensional feature selection approach
Table 1. Preprocessing text
Original Text: Bush used to have a white chili bean, which made this recipe super simple. I have written to them and asked them to please, bring them back
Preprocessed Text: used to have a white chili bean and made this recipe super simple. have written to them and asked them to please, bring them back
To train word vectors, we use the Word2vec method. Word2vec [24] is a neural network model with three main layers: input, hidden, and output. A fixed-dimensional dense word vector is trained from the high-dimensional one-hot word vector used as input. Word2vec offers both the bag-of-words (BOW) and skip-gram architectures; the skip-gram model is utilized in this work.
Figures 3 and 4 show that a pair-wise set is considered for each sliding window. With "like" as the central word in Figure 4, we consider the probability of the terms generated before and after it:
$P=P\left(I,(I, \text{like}),(\text{like}, \text{Biryani}), \text{Biryani} \mid \text{like}\right)$ (1)
Figure 3. Skip-gram model for complete sentences
Figure 4. Skip-gram model for partial sentence
Assuming that the context items or words are generated independently, we consider: (a) the head (center) word $w_c$ with one-hot vector $v_c$, and (b) a front or back word $w_b$ with vector $u_b$. The conditional probability of a context word given the head word is then determined as follows:
$P\left(w_b \mid w_c\right)=\frac{\exp \left(u_b^T v_c\right)}{\sum_{i=0}^n \exp \left(u_i^T v_c\right)}$ (2)
Let T be the length of the review text sequence. Next, with sliding window j as input, we minimize the negative log-likelihood loss to update the model's parameters during training. The gradient of the headword vector $v_c$ is then obtained by differentiation as follows.
$\frac{\partial \log P\left(w_b \mid w_c\right)}{\partial v_c}=u_b-\sum_{j=0}^n P\left(w_j \mid w_c\right) u_j$ (3)
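The following numpy sketch illustrates Eqs. (2) and (3); the matrix U of context ("output") vectors and the center vector v_c are illustrative random values, not trained parameters.

```python
# Softmax probability of a context word given the center word (Eq. (2))
# and the gradient of its log w.r.t. the center-word vector (Eq. (3)).
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 6, 4
U = rng.normal(size=(vocab, dim))   # context vectors u_i (one row per word)
v_c = rng.normal(size=dim)          # center ("head") word vector

logits = U @ v_c
probs = np.exp(logits - logits.max())   # numerically stable softmax, Eq. (2)
probs /= probs.sum()

b = 2                                    # index of a context word w_b
grad_vc = U[b] - probs @ U               # Eq. (3): u_b - sum_j P(w_j|w_c) u_j

v_c += 0.1 * grad_vc                     # one gradient step on log P(w_b|w_c)
```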
The Word2vec word vectors are then trained cyclically over the preprocessed corpus shown in Table 1.
TF-IDF is a statistical index that measures the relevance of terms in the text library [25]; it is composed of the TF and IDF values. Term frequency (TF) is defined as the ratio of the frequency $T_w$ of term $w$ to the total number of terms $N$ in the same class:
$\mathrm{TF}=\frac{T_w}{N}$ (4)
IDF is defined as the logarithm of $C$ (the total number of documents) divided by $D_w$ (the number of documents containing term $w$):
$\mathrm{IDF}=\log \frac{C}{D_w}$ (5)
Finally, the TF-IDF value is determined as:
$\mathrm{TF}-\mathrm{IDF}=\mathrm{TF} * \mathrm{IDF}$ (6)
Following the procedures outlined above yields the text features. By considering aspects like word significance and word pairings, this technique improves the efficiency of text feature extraction.
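Under the assumption that the text features are built with scikit-learn, a hedged sketch of the combined Bigram + TF-IDF step might look as follows; `ngram_range=(1, 2)` yields unigrams plus bigrams, and the corpus is illustrative.

```python
# TF-IDF-weighted unigram+bigram text features (Eqs. (4)-(6)).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "used to have a white chili bean and made this recipe super simple",
    "i like biryani in dinner",
]
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X_text = vectorizer.fit_transform(corpus)   # sparse (n_reviews, n_terms)
print(X_text.shape, vectorizer.get_feature_names_out()[:5])
```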
(c) Building features based on reviewer behaviour: dishonest reviewers mimic or even replicate genuine reviews, so relying only on text features makes it difficult to discern honest from misleading evaluations. Using basic review information, we develop 12 aspects of reviewer behaviour, including: (i) review length, (ii) digits, (iii) number of adjectives/nouns/adverbs, (iv) reviewer activity, (v) review emotional index, (vi) rating consistency in reviews, (vii) level of discordance in the reviewer's rating, (viii) reviewer's total number of reviews, (ix) reviewer's extreme scores, and (x) distinction among scores.
(d) Misleading Score Feature construction: misleading reviews make up a small percentage of the total, producing an imbalance between genuine and misleading feedback. This imbalance facilitates neither the training nor the prediction of classification models, yet most current approaches disregard the issue.
Isolation forest [26] is an unsupervised approach that runs a dataset through a forest of random trees and estimates the average height at which each sample is isolated, i.e., the leaf node it falls on in each tree. Considering the path length from leaf to root node, the misleading score S(x) for a sample x is determined as follows:
$C(n)=2 h(n-1)-\frac{2(n-1)}{n}$ (7)
$h(k)=\ln (k)+\varphi$ (8)
Eq. (8) gives the expected height, where $\varphi$ is the Euler constant. The average path length of sample x over all trees is denoted E(h(x)), so the misleading score S(x) may be expressed as:
$S(x)=2^{\left(-\frac{E(h(x))}{c(n)}\right)}$ (9)
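A sketch of this score using scikit-learn's IsolationForest follows; note that `score_samples` returns the negative of the anomaly score of the original isolation-forest paper, so the sign is flipped to recover S(x). The toy data are illustrative.

```python
# Isolation-forest misleading score (Eq. (9)) via scikit-learn.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 5)),   # genuine-like samples
               rng.normal(4, 1, size=(10, 5))])   # a few outliers

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
S = -iso.score_samples(X)    # S(x): larger means more anomalous
print(S[:3], S[-3:])         # outliers should receive higher scores
```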
Using principal component analysis (PCA) [27] to identify outliers entails building a covariance matrix with n rows and m columns from the initial data and then eigen-decomposing it, which recovers several eigenvectors. Let $z_i$ denote the i-th eigenvector of the covariance matrix, $V_i$ the corresponding eigenvalue in that direction, and $X^T$ the transpose of the sample matrix X. Thus, the formula is as follows:

$S(x)=\sum_{i=1}^n \frac{\left|X^T z_i\right|}{V_i}$ (10)

That is, S(x) sums, over the n eigenvectors, the magnitude of the sample's projection onto each eigenvector normalized by the corresponding eigenvalue, so directions of low variance contribute more strongly to the outlier score.
The local outlier factor (LOF) [28] is a density-based anomaly identification approach for two or more dimensions. Let d(x, y) be the distance between points x and y, and let $d_k(x, y)$ denote the k-distance of y with respect to x. The reachability distance RD(x, y) is then:
$R D(x, y)=\max \{d(x, y), d k(x, y)\}$ (11)
Here, $c_k(x)$ is the k-distance neighbourhood of sample x. The local reachable density (LRD) is written:

$L R D_k(x)=\left(\frac{\sum_{y \in c_k(x)} R D(x, y)}{\left|c_k(x)\right|}\right)^{-1}$ (12)
Further, the deceptive score S(x) is obtained through the LRD, which can be expressed as:

$S(x)=\frac{\sum_{y \in c_k(x)} L R D_k(y)}{\left|c_k(x)\right| \times L R D_k(x)}$ (13)
K-nearest-neighbours (KNN) [29] chooses the category based on distance. Taking the n samples closest to sample point x, let i be the number of genuine (positive) samples among them. The point's misleading score S(x) may then be represented as:
$S(x)=\frac{n-i}{n}$ (14)
An autoencoder [30] employs a neural network to transform data from a high dimension to a low dimension and reconstruct it. For an input sample $X=(X_1, X_2, \ldots, X_n)$, where n is the number of dimensions, the autoencoder reconstruction output is $X^R=\left(X_1^R, X_2^R, \ldots, X_n^R\right)$; a larger reconstruction error indicates a higher likelihood of the sample being an outlier. Adding the mean squared error (MSE) and the mean absolute error (MAE) together forms the misleading score S(x):
$S(x)=\frac{1}{n} \sum_{i=1}^n\left(X_i-X_i^R\right)^2+\frac{1}{n} \sum_{i=1}^n\left|X_i-X_i^R\right|$ (15)
A particularly effective outlier score in low dimensions is the histogram-based outlier score (HBOS) [31], which divides each dimension into numerous intervals and uses a density score for the anomalies. For a sample x, $H_n(x)$ represents the histogram height of x in the n-th of d dimensions; a low height (low density) indicates an out-of-the-ordinary sample. The misleading score S(x) can be expressed in the following way:
$S(x)=\sum_{n=1}^d \log \left(\frac{1}{H_n(x)}\right)$ (16)
Anomaly detection methods should be diverse for more thorough mining of sample anomaly information.
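As one possible realization of this diversity, the sketch below stacks several detectors into misleading-score columns using scikit-learn; the detector choices, parameters, and function name are illustrative assumptions.

```python
# Build diverse misleading-score features: each anomaly detector
# contributes one score column to the feature matrix.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def misleading_score_features(X, k=10):
    cols = []
    # Isolation-forest score (Eq. (9)); sign flipped as noted above.
    cols.append(-IsolationForest(random_state=0).fit(X).score_samples(X))
    # LOF score (Eq. (13)); sklearn stores its negative after fitting.
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    cols.append(-lof.negative_outlier_factor_)
    # KNN-distance score: mean distance to the k nearest neighbours
    # (first column is the zero self-distance, so it is skipped).
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    cols.append(dist[:, 1:].mean(axis=1))
    return np.column_stack(cols)    # shape: (n_samples, 3)
```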
3.2 Resampling data
We built a hybrid resampling technique from RUS and Borderline-SMOTE based on the pros and cons of the two approaches. Each sample is first labelled as a majority or a minority sample. A minority sample is categorized as safe, dangerous, or noisy according to the majority samples in its vicinity; it is considered dangerous when majority samples constitute more than 50% of its close samples. Borderline-SMOTE oversamples these dangerous (borderline) minority samples, while the majority class is reduced by random under-sampling. Thus, the aggregate resampled dataset is obtained by combining the findings of under-sampling and over-sampling.
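Assuming the imbalanced-learn package, the hybrid step can be sketched as follows; the sampling ratios are illustrative, and X, y denote the multidimensional features and labels built above.

```python
# Hybrid resampling: Borderline-SMOTE oversampling of borderline minority
# samples, followed by random under-sampling of the majority class.
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler

over = BorderlineSMOTE(sampling_strategy=0.5, random_state=0)
under = RandomUnderSampler(sampling_strategy=1.0, random_state=0)

# X_mid, y_mid = over.fit_resample(X, y)           # oversample minority
# X_res, y_res = under.fit_resample(X_mid, y_mid)  # then shrink majority
```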
3.3 Feature selection
Following the abovementioned procedures retrieves the original set of multidimensional features and the RUS and Borderline-SMOTE sample sets. The feature construction dimensions, however, are excessive, making overfitting and the curse of dimensionality all too common. The chi-square value is calculated as per [32]:
$X^2=\sum_{i=1}^n \frac{\left(x_i-E\right)^2}{E}$ (17)
To choose features using the Chi-square test, the chi-square value is computed for every feature; the significance of a feature is proportional to its chi-square value. Two algorithms from the study [23], RUS with Borderline-SMOTE and the ensemble feature selection method, are considered to analyse our proposed model.
This paper's feature set is also filtered using information gain (IG) [33], the change in information entropy between two points in time. For the fraudulent review detection job the number of categories is C = 2. Fix the sample set as s, and let $p_i$ represent the probability of class i. The conditional entropy given feature $X=(x_1, x_2, \ldots, x_n)$ takes the average value:

$H(C \mid X)=\sum_{i=1}^n p\left(x_i\right) H\left(C \mid X=x_i\right)$ (18)

Then, the IG is determined as

$I G(X)=H(C)-H(C \mid X)=-\sum_{j=1}^C p_j \log _2 p_j-\sum_{i=1}^n p\left(x_i\right) H\left(C \mid X=x_i\right)$ (19)
This approach is frequently highly successful and quite sensitive to the role of features. In XGBoost, features with higher importance are chosen as splitting nodes more frequently, indicating that they are more significant. The intersection of the three sets of candidate features yields the final feature set. The ensemble technique reduces the original feature set's dimensions, eliminates redundant information, and improves the expressive ability of the retained information while selecting the new set of features.
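A hedged sketch of the ensemble selection follows; information gain is approximated here by scikit-learn's mutual information, the absolute-value guard for the Chi-square test is an illustrative choice, and the function name and k are assumptions.

```python
# Ensemble feature selection: rank features by Chi-square (Eq. (17)),
# information gain (Eqs. (18)-(19), approximated by mutual information),
# and XGBoost importance, then keep the intersection of the top-k sets.
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif
from xgboost import XGBClassifier

def ensemble_select(X, y, k=100):
    chi_scores, _ = chi2(np.abs(X), y)   # chi2 requires non-negative input
    ig_scores = mutual_info_classif(X, y, random_state=0)
    xgb_scores = XGBClassifier(n_estimators=100).fit(X, y).feature_importances_
    top = lambda s: set(np.argsort(s)[-k:])
    keep = top(chi_scores) & top(ig_scores) & top(xgb_scores)
    return sorted(keep)                   # indices of the selected features
```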
We have considered different experiments based on the datasets, evaluation tools, and proposed methodologies mentioned below.
4.1 Dataset
In this part, we conduct experiments using the labelled datasets of office and supermarket goods made public on Amazon's website, considering the dataset components and the descriptions of the data fields. We also use the Recipe Reviews and User Feedback (RRUF) dataset (a UCI machine learning dataset) [34] for food review analysis; it contains 18,182 instances and 15 features of several types (Real, Categorical, and Integer) that help analyse recipe reviews.
4.2 Experiment analysis
(a) Building Features: Various sets of tests, including single- and multidimensional features, were established to confirm the efficacy of the features developed for this paper. The prediction results were obtained after training the XGBoost model with ten-fold cross-validation. Figure 5 displays the classification results for each set of features. The combination of Bigram, TF-IDF, and Word2vec outperforms the other text feature extraction approaches on both datasets.
Due to the unbalanced category distribution of samples, accuracy alone is no longer a valid evaluation index: even with a poor classification effect, many positive samples may go unidentified. Thus, we used the area under the curve (AUC), Macro Average Precision, and Weighted Average F1-score to evaluate the model's capability.
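These three indices can be computed with scikit-learn, as in the small sketch below; the function name is illustrative.

```python
# The three evaluation indices used in the experiments.
from sklearn.metrics import roc_auc_score, precision_score, f1_score

def evaluate(y_true, y_pred, y_prob):
    return {
        "AUC": roc_auc_score(y_true, y_prob),
        "MacroAvgPrecision": precision_score(y_true, y_pred, average="macro"),
        "WeightedF1": f1_score(y_true, y_pred, average="weighted"),
    }
```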
Figure 5. Classification results of each feature set: (a) single-dimensional and (b) multidimensional features on the office products dataset; (c) single-dimensional and (d) multidimensional features on the grocery products dataset
On the office goods dataset, the suggested multidimensional feature-building approach achieves an AUC of 0.83, a Macro Average Precision of 0.87, and a Weighted F1-score of 0.88; on the grocery goods dataset the corresponding values are 0.85, 0.87, and 0.89. Models trained using multidimensional features thus tend to classify better. The detection of fraudulent reviews on the different datasets and its performance are analysed further in the experimental part of the paper.
(b) Resampling data: Following feature creation, the XGBoost model was used to investigate the reliability of each resampling technique: RUS, SMOTE, Borderline-SMOTE, and RUS with Borderline-SMOTE. As Figure 6 shows, classifier performance is worst without resampling.
Figure 6. Evaluation-metric performance of each resampling method on the (a) office products, (b) grocery products, and (c) recipe reviews and user feedback datasets
The best resampling method is the mix of RUS and Borderline-SMOTE described in this paper: with an AUC of 0.851, Macro Average Precision of 0.881, and Weighted F1-score of 0.891 on the office dataset, and 0.861, 0.885, and 0.901 respectively on the grocery dataset, it demonstrates a robust model.
(c) Feature selection: The efficacy of the ensemble feature selection approach was confirmed following feature generation and data resampling. The XGBoost model was trained using various feature selection strategies for comparison. Figure 7 shows the classification results for each feature selection approach; the Weighted F1-score (blue), Macro Average Precision (cyan), and AUC (dark red) are shown in Figures 7(a) and (b).
Figure 7. Classification results of each feature selection method on the (a) office products, (b) grocery products, and (c) recipe reviews and user feedback datasets
We observed that feature selection may boost the model's performance, keeping it from overfitting and making it run faster. The significance of feature selection increases with data volume, dimension size, and redundancy.
(d) Building the Classifier: We chose models commonly utilized for classification tasks, namely KNN, SVM, logistic regression (LR), and random forest (RF), then compared and verified the XGBoost model through tests. Figure 8 displays the outcomes of each model's predictions. The ensemble model outperforms KNN and the other standalone classification algorithms; on both datasets, XGBoost outperforms the other models across the various metrics. With its settings adjusted, XGBoost can identify misleading reviews even when the samples' category distribution is not perfectly balanced.
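A hedged sketch of this comparison follows; the model settings are illustrative, and X_res, y_res denote the resampled features and labels from the previous step.

```python
# Compare candidate classifiers with ten-fold cross-validated AUC.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "XGBoost": XGBClassifier(n_estimators=200, random_state=0),
}
# for name, model in models.items():
#     auc = cross_val_score(model, X_res, y_res, cv=10, scoring="roc_auc")
#     print(name, auc.mean())
```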
Figure 8. Classification results of each model on the (a) office products, (b) grocery products, and (c) recipe reviews and user feedback datasets
(e) Comparative Performance: The academic community has many sophisticated and efficient frameworks and approaches for detecting fraudulent reviews. To highlight the benefits of our approach, we compare it to five refined techniques, employing cross-validation procedures to guarantee the precision of the findings.
Figure 9. Comparison with advanced classification models on the (a) office products, (b) grocery products, and (c) recipe reviews and user feedback datasets
The experiment compares the strategy suggested in this paper to five sophisticated algorithms that can detect fraudulent reviews, using the office and supermarket datasets. We included three further experimental groups to replicate the impact of the first three approaches on the initial dataset, resampling the data with the RUS technique from our earlier trials. After the data reached a balanced category distribution, these three strategies were used to build the models.
Figure 9 shows that the reviewer behaviour feature technique outperforms CAA and IFD on the office goods dataset. Being an unsupervised approach, it can still be helpful even when no labelled data is available. The suggested technique outperforms all others, with an AUC of 0.852 and a Weighted F1-score of 0.891.
The grocery goods dataset uses the same display style as the office dataset. The suggested technique achieves the highest AUC, Macro Average Precision, and Weighted F1-score of any method tested, at 0.851, 0.88, and 0.89, respectively. In contrast to the alternatives, this approach is versatile and advantageous since it considers a wide range of scenarios.
False reviews have proliferated online in the last several years. Public opinion on social media platforms has taken a significant hit as increasing numbers of users employ these services to publish false evaluations. Deceptive evaluations are becoming increasingly common in online buying, with some business people using them to smear others or boost their own image, which harms customers' rights and interests. Detecting fake reviews is therefore essential to maintaining the healthy growth of online commerce. Using multidimensional feature generation and ensemble feature selection, our work presents a new approach to recognizing misleading reviews. The construction of multidimensional features and data resampling make this method more complicated than others, but it deals better with low-quality text, has better accuracy, and suits situations where the sample categories are imbalanced. Our experiments confirm that separating fake evaluations from real ones is getting more complex, which is why the method emphasizes the development of multidimensional features to boost accuracy. Future research will focus on improving the combination of text features with additional features, applying depth features and advanced CNN models to fraudulent review detection across various text datasets, and developing large-scale pre-trained models.
[1] Ding, Y., Zhang, W., Zhou, X., Liao, Q., Luo, Q., Ni, L.M. (2020). FraudTrip: Taxi fraudulent trip detection from corresponding trajectories. IEEE Internet of Things Journal, 8(16): 12505-12517. https://doi.org/10.1109/JIOT.2020.3019398
[2] Jindal, N., Liu, B. (2007). Review spam detection. In Proceedings of the 16th International Conference on World Wide Web, Canada, pp. 1189-1190. https://doi.org/10.1145/1242572.1242759
[3] Li, H., Liu, B., Mukherjee, A., Shao, J. (2014). Spotting fake reviews using positive-unlabeled learning. Computación y Sistemas, 18(3): 467-475. https://doi.org/10.13053/CyS-18-3-2035
[4] Ong, T., Mannino, M., Gregg, D. (2014). Linguistic characteristics of shill reviews. Electronic Commerce Research and Applications, 13(2): 69-78. https://doi.org/10.1016/j.elerap.2013.10.002
[5] Ott, M., Choi, Y., Cardie, C., Hancock, J.T. (2011). Finding deceptive opinion spam by any stretch of the imagination. arXiv preprint arXiv:1107.4557. https://doi.org/10.48550/arXiv.1107.4557
[6] Shojaee, S., Murad, M.A.A., Azman, A.B., Sharef, N.M., Nadali, S. (2013). Detecting deceptive reviews using lexical and syntactic features. In 2013 13th International Conference on Intelligent Systems Design and Applications, Selangor, Malaysia, pp. 53-58. https://doi.org/10.1109/ISDA.2013.6920707
[7] Yuan, L., Zhu, Z., Ren, T. (2021). Review of false comment recognition. Computer Science, 48(1): 111-118.
[8] Zhang, W., Wang, Q., Li, X., Yoshida, T., Li, J. (2019). DCWord: A novel deep learning approach to deceptive review identification by word vectors. Journal of Systems Science and Systems Engineering, 28: 731-746. https://doi.org/10.1007/s11518-019-5438-4
[9] Jain, N., Kumar, A., Singh, S., Singh, C., Tripathi, S. (2019). Deceptive reviews detection using deep learning techniques. In Natural Language Processing and Information Systems: 24th International Conference on Applications of Natural Language to Information Systems, NLDB 2019, Salford, UK, pp. 79-91. https://doi.org/10.1007/978-3-030-23281-8_7
[10] Alwan, E.H., Al-Qurabat, A.K.M. (2024). Optimizing program efficiency by predicting loop unroll factors using ensemble learning. International Journal of Computational Methods and Experimental Measurements, 12(3): 281-287. https://doi.org/10.18280/ijcmem.120308
[11] Hajek, P., Barushka, A., Munk, M. (2020). Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Computing and Applications, 32(23): 17259-17274. https://doi.org/10.1007/s00521-020-04757-2
[12] Jindal, N., Liu, B. (2008). Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining, USA, pp. 219-230. https://doi.org/10.1145/1341531.1341560
[13] Jindal, N., Liu, B., Lim, E.P. (2010). Finding unusual review patterns using unexpected rules. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Canada, pp. 1549-1552. https://doi.org/10.1145/1871437.1871669
[14] Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W. (2010). Detecting product review spammers using rating behaviors. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Canada, pp. 939-948. https://doi.org/10.1145/1871437.1871557
[15] Noekhah, S., Salim, N.B., Zakaria, N.H. (2018). A novel model for opinion spam detection based on multi-iteration network structure. Advanced Science Letters, 24(2): 1437-1442. https://doi.org/10.1166/asl.2018.10765
[16] Yang, C., Li, T., Tan, S., Yang, X. (2020). Spam review detection based on double convolutional neural network. Computer and Digital Engineering, 48(8): 1954-1957.
[17] Javed, M.S., Majeed, H., Mujtaba, H., Beg, M.O. (2021). Fake reviews classification using deep learning ensemble of shallow convolutions. Journal of Computational Social Science, 4: 883-902. https://doi.org/10.1007/s42001-021-00114-y
[18] Liu, W., He, J., Han, S., Cai, F., Yang, Z., Zhu, N. (2019). A method for the detection of fake reviews based on temporal features of reviews and comments. IEEE Engineering Management Review, 47(4): 67-79. https://doi.org/10.1109/EMR.2019.2928964
[19] Zhu, J.H., Munjal, R., Sivaram, A., Paul, S.R., Tian, J., Jolivet, G. (2022). Flow regime detection using gamma-ray-based multiphase flowmeter: A machine learning approach. International Journal of Computational Methods and Experimental Measurements, 10(1): 26-37. https://doi.org/10.2495/CMEM-V10-N1-26-37
[20] Bhuyan, H.K., Ravi, V., Yadav, M.S. (2022). Multi-objective optimization-based privacy in data mining. Cluster Computing, 25(6): 4275-4287. https://doi.org/10.1007/s10586-022-03667-3
[21] Bhuyan, H.K., Saikiran, M., Tripathy, M., Ravi, V. (2023). Wide-ranging approach-based feature selection for classification. Multimedia Tools and Applications, 82(15): 23277-23304. https://doi.org/10.1007/s11042-022-14132-z
[22] Bhuyan, H.K., Ravi, V. (2023). An integrated framework with deep learning for segmentation and classification of cancer disease. International Journal on Artificial Intelligence Tools, 32(2): 2340002. https://doi.org/10.1142/S021821302340002X
[23] Li, S., Zhong, G., Jin, Y., Wu, X., Zhu, P., Wang, Z. (2022). A deceptive reviews detection method based on multidimensional feature construction and ensemble feature selection. IEEE Transactions on Computational Social Systems, 10(1): 153-165. https://doi.org/10.1109/TCSS.2022.3144013
[24] Madisetty, S., Desarkar, M.S. (2018). A neural network-based ensemble approach for spam detection in Twitter. IEEE Transactions on Computational Social Systems, 5(4): 973-984. https://doi.org/10.1109/TCSS.2018.2878852
[25] Yu, H., Ji, Y., Li, Q. (2021). Student sentiment classification model based on GRU neural network and TF-IDF algorithm. Journal of Intelligent & Fuzzy Systems, 40(2): 2301-2311. https://doi.org/10.3233/JIFS-189227
[26] Xiao, C.H., Su, C., Bao, C.X., Li, X. (2018). Anomaly detection in network management system based on isolation forest. In 2018 4th Annual International Conference on Network and Information Systems for Computers (ICNISC), Wuhan, China, pp. 56-60. https://doi.org/10.1109/ICNISC.2018.00019
[27] Huang, Z., Lin, X., Liu, H., Zhang, B., Chen, Y., Tang, Y. (2020). Deep representation learning for location-based recommendation. IEEE Transactions on Computational Social Systems, 7(3): 648-658. https://doi.org/10.1109/TCSS.2020.2974534
[28] Basiri, M.E., Abdar, M., Kabiri, A., Nemati, S., Zhou, X., Allahbakhshi, F., Yen, N.Y. (2019). Improving sentiment polarity detection through target identification. IEEE Transactions on Computational Social Systems, 7(1): 113-128. https://doi.org/10.1109/TCSS.2019.2951326
[29] Tang, Y., Liang, J., Hare, R., Wang, F.Y. (2020). A personalized learning system for parallel intelligent education. IEEE Transactions on Computational Social Systems, 7(2): 352-361. https://doi.org/10.1109/TCSS.2020.2965198
[30] Li, S., Jiang, L., Wu, X., Han, W., Zhao, D., Wang, Z. (2021). A weighted network community detection algorithm based on deep learning. Applied Mathematics and Computation, 401: 126012. https://doi.org/10.1016/j.amc.2021.126012
[31] Sathe, S., Aggarwal, C.C. (2018). Subspace histograms for outlier detection in linear time. Knowledge and Information Systems, 56: 691-715. https://doi.org/10.1007/s10115-017-1148-8
[32] Hong, S.K., Kim, H., Lee, S., Moon, Y.S. (2017). Secure multiparty computation of chi-square test statistics and contingency coefficients. In 2017 IEEE 3rd International Conference on Big Data Security on Cloud (Bigdatasecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), Beijing, China, pp. 53-56. https://doi.org/10.1109/BigDataSecurity.2017.24
[33] Dutta, K., Sharma, M., Sharma, U., Khatri, S.K., Johri, P. (2019). Information gain model for efficient influential node identification in social networks. In 2019 Amity International Conference on Artificial Intelligence (AICAI), pp. 146-150. https://doi.org/10.1109/AICAI.2019.8701344
[34] Dataset. (2023). Recipe reviews and user feedback dataset. https://archive.ics.uci.edu/dataset/911/recipe+reviews+and+user+feedback+dataset.