© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
This study examines the application of machine learning methods to improve the accuracy of stock price prediction by integrating technical analysis. In this study, we evaluate machine learning algorithms such as Support Vector Machine (SVM), Random Forest, Neural Networks, and Logistic Regression in the context of stock price prediction by utilizing technical indicators such as Moving Average, Relative Strength Index (RSI), and MACD. We use historical stock price datasets from the Indonesian capital market over a five-year period to train and test the models. The results show that the integration of technical analysis with machine learning methods can significantly improve prediction accuracy compared to using technical analysis or machine learning separately. The Neural Networks model performed the best in terms of prediction accuracy, with an improvement of 15% compared to the traditional method. These findings have important implications for investors and financial professionals in data-driven decision making. This study contributes to the development of more effective stock price prediction methods by combining analytical and technological approaches.
stock price prediction, machine learning, technical analysis, financial forecasting, predictive modeling
The development of the global stock market and advances in information technology have changed the way investors and analysts predict stock price movements [1]. In this context, stock price prediction has become a significant research topic due to its ability to provide valuable information in making investment decisions [2]. As the volume and complexity of market data increases, traditional methods such as technical and fundamental analysis often face limitations in accommodating rapidly changing market dynamics [3]. To overcome this challenge, many researchers and practitioners are now turning to more sophisticated analysis techniques, including machine learning methods, to improve prediction accuracy [4, 5]. Existing stock price prediction models often rely on historical data patterns to forecast future movements [6]. While this approach provides a useful foundation, reliance on historical data can limit the model's ability to anticipate unexpected market events [7]. Therefore, there is an urgent need for an approach that can integrate multiple sources of information and better handle market complexity [8]. Machine learning technology has the advantage of analyzing complex and non-linear patterns and the ability to process large amounts of data [9]. Therefore, this technology can be an effective solution to improve accuracy and efficiency in predicting stock prices [10].
This research focuses on developing and testing a stock price prediction model that combines technical analysis techniques with machine learning algorithms. The main question to be answered in this research is: “Can the integration of historical data and machine learning algorithms improve stock price prediction accuracy compared to conventional methods?”. This research will address two gaps: first, the limitations of traditional models in dealing with the dynamics of market complexity; and second, the over-reliance on historical data patterns without considering changes that occur in real-time.
This approach utilizes historical data and technical indicators to build models that are not only able to capture historical patterns but also adapt to dynamic market changes [11]. Using innovative machine learning methods, this study aims to evaluate whether the proposed model can provide more accurate predictions compared to conventional methods [12]. The importance of this study lies in its ability to bridge the gap between traditional prediction methods and modern technology [13]. The novel contribution of this research lies in evaluating the effectiveness of the proposed model in improving prediction accuracy compared to existing methods [14]. By providing a more adaptive and data-driven model, this research is expected to make a significant contribution to the field of stock market analysis and investment decision-making [15]. In addition, the findings of this study are expected to motivate further research and practical applications in the financial industry, as well as provide new insights into the development of more effective investment strategies [16]. Thus, this research offers the development of stock market analysis theory for smarter investment practices that are responsive to market dynamics.
2.1 Dataset description
Data source: The dataset used in this study was obtained from Yahoo Finance, a platform that provides comprehensive data on historical stock prices of public companies. This data includes daily information on the opening price (Open), closing price (Close), highest price (High), lowest price (Low), adjusted closing price (Adj Close), and trading volume (Volume). This research analyzes stocks from various companies listed on global stock exchanges, focusing on stocks with high volatility and large trading volumes.
The data was downloaded in historical format and then processed for further analysis. The data series used in this research begins in mid-2019. This starting point was chosen because it reflects stable economic conditions before the COVID-19 pandemic, so the opening portion of the series is not shaped by a global crisis or other extraordinary events. In addition, the data from this period onward is complete and spans a variety of relevant trends and economic events, providing a representative picture of stock market dynamics.
Data description: The dataset contains historical stock price data with the following columns:
•Date: The date of the stock price data.
•Open: The opening price of the stock on that date.
•High: The highest price of the stock on that date.
•Low: The lowest price of the stock on that date.
•Close: The closing price of the stock on that date.
•Adj Close: The adjusted closing price of the stock on that date.
•Volume: The volume of stocks traded on that date.
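As an illustration of this retrieval step, the sketch below shows how such OHLCV data could be pulled from Yahoo Finance with the yfinance package; the ticker symbol, date range boundaries, and output filename are placeholders, since the paper does not specify the exact instrument or download tool.

```python
# Minimal sketch: retrieving Yahoo Finance OHLCV data with the yfinance package.
# "TLKM.JK" and the date range are hypothetical placeholders, not values from the paper.
import yfinance as yf

df = yf.download("TLKM.JK", start="2019-07-26", end="2024-07-27",
                 auto_adjust=False, progress=False)
# Expected columns: Open, High, Low, Close, Adj Close, Volume (DatetimeIndex).
# Depending on the yfinance version, the columns may need flattening from a MultiIndex.
print(df.head())
df.to_csv("stock_history.csv")  # save the raw historical data for later processing
```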
The downloaded data was processed for further analysis; the first few rows are shown in Table 1.
Table 1. First few rows of dataset
Date | Open | High | Low | Close | Adj Close | Volume
2019-07-26 | 20.713606 | 20.713606 | 20.588825 | 20.588825 | 20.588825 | 180316
2019-07-29 | 20.588825 | 20.588825 | 20.588825 | 20.588825 | 20.588825 | 0
2019-07-30 | 21.337509 | 21.337509 | 21.337509 | 21.337509 | 21.337509 | 37666
2019-07-31 | 21.212729 | 21.337509 | 21.212729 | 21.337509 | 21.337509 | 9616
2019-08-01 | 23.08444 | 23.708344 | 22.460535 | 23.708344 | 23.708344 | 587430
Table 1 presents the data needed to analyze stock price movements over the study period. The series begins in 2019, a starting point chosen because it reflects economic stability before the COVID-19 pandemic, so the opening portion of the data captures general market conditions rather than crisis behavior. Table 1 shows the first few rows of the dataset, which include daily opening, high, low, and closing prices, as well as trading volume. For example, on July 26, 2019, the trading volume was relatively high with little change between the opening and closing prices, whereas on July 29, 2019, no trading activity was recorded. These data support the subsequent analysis of price volatility and trading trends.
2.2 Algorithm formula
Support Vector Machine (SVM) is used for classification by separating two classes with a maximum-margin hyperplane.
Basic SVM formula
$F(x)=w^T x+b$ (1)
Random Forest is an ensemble algorithm consisting of many decision trees that are trained independently, and the result is determined by voting (for classification) or averaging (for regression).
The basic formula of Random Forest:
$\operatorname{split}(D)=\arg \max_{F} \Delta I(F)$ (2)
where $\Delta I(F)$ is the impurity reduction obtained by splitting the data $D$ on candidate feature $F$.
Neural Networks (NN) consist of layers of interconnected neurons. Each neuron performs a linear operation, followed by a non-linear activation function.
The basic formula of Neural Networks (NN):
$z=w^T x+b$ (3)
2.3 Technical indicators used in machine learning
Technical indicators are used as features in machine learning models to help analyze market movements.
Moving Average (MA), used here in its exponential form (EMA):
$E M A(n)=\left(\frac{2}{n+1}\right) \times\left(P_t-E M A_{t-1}\right)+E M A_{t-1}$ (4)
Relative Strength Index (RSI) is a momentum indicator that measures the speed and magnitude of price changes.
$RSI(n)=100-\frac{100}{1+RS}$ (5)
Moving Average Convergence Divergence (MACD) is an indicator that measures the difference between two moving averages to capture market momentum.
$MACD=EMA_{12}-EMA_{26}$ (6)
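A minimal pandas sketch of how these indicators could be computed from a closing-price series is given below. It assumes a DataFrame `df` with a 'Close' column and uses the conventional 12/26-period EMAs and a 14-day RSI window; these are illustrative defaults rather than parameters reported in the paper.

```python
# Sketch of the technical indicators in Eqs. (4)-(6) using pandas,
# assuming a DataFrame `df` with a 'Close' column.
import pandas as pd

close = df["Close"]

# Exponential Moving Average, Eq. (4)
df["EMA_12"] = close.ewm(span=12, adjust=False).mean()
df["EMA_26"] = close.ewm(span=26, adjust=False).mean()

# Relative Strength Index, Eq. (5), with an illustrative 14-day window
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rs = gain / loss
df["RSI_14"] = 100 - 100 / (1 + rs)

# MACD, Eq. (6): difference between the fast and slow EMAs
df["MACD"] = df["EMA_12"] - df["EMA_26"]
```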
2.4 Data pre-processing
(1) Data cleansing
Handling of missing values and outliers [17]. There are no missing values in the dataset as shown in Table 2.
Table 2. Missing values of dataset
Column | Missing Values
Date | 0
Open | 0
High | 0
Low | 0
Close | 0
Adj Close | 0
Volume | 0
The basic statistics of the dataset provide insights into the range and distribution of the data as shown in Table 3. There are no duplicate rows in the dataset. Number of duplicate rows: 0.
The box plots visualize potential outliers in the stock prices and trading volume (refer to Figures 1 and 2).
Table 3 reveals significant variation across the stock prices and trading volumes, with a minimum price of 20.59 and a maximum of 19,100. This wide range suggests that the dataset covers stocks with varying levels of market activity and volatility, supporting a robust analysis of different market conditions. Additionally, the volume statistics reflect diverse trading activities, with an average trading volume of approximately 16.2 million shares and a peak volume of over 380 million shares. These descriptive statistics offer foundational insights for examining stock price behaviors and trends within the dataset.
Table 3. Basic statistics
Statistic | Date | Open | High | Low | Close | Adj Close | Volume
count | 1214 | 1214 | 1214 | 1214 | 1214 | 1214 | 1214
mean | 2022-01-15 20:03:57 | 5670.1821297743 | 5821.482386729 | 5518.4212972199 | 5663.6043178048 | 5663.6043178048 | 16227644.796540363
min | 2019-07-26 | 20.5888825 | 20.5888825 | 20.5888825 | 20.5888825 | 20.5888825 | 0
25% | 2020-10-15 06:00 | 2191.50030525 | 2239.48898225 | 2130 | 2180 | 2180 | 1274783
50% | 2022-01-12 12:00 | 2995 | 3080 | 2915.4713135 | 2980 | 2980 | 11330900
75% | 2023-04-05 18:00 | 9693.75 | 9887.5 | 9387.5 | 9637.5 | 9637.5 | 21835125
max | 2024-07-26 | 19100 | 19500 | 18700 | 19000 | 19000 | 380263900
std | null | 5391.2431686436 | 5518.5546207807 | 5248.6518721772 | 5386.727012525 | 5386.727012525 | 22800709.69254389
Figure 1. Stock prices box plots diagram
Figure 2. Trading volume box plots diagram
2.5 Format conversion
The date and time format conversion has been successfully completed [18]. Here's a summary of the changes and the resulting dataset:
Date Conversion: The 'Date' column has been converted to a datetime format and set as the index of the DataFrame.
Dataset Information:
DatetimeIndex: 1214 entries, 2019-07-26 to 2024-07-26
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 Open 1214 non-null float64
1 High 1214 non-null float64
2 Low 1214 non-null float64
3 Close 1214 non-null float64
4 Adj Close 1214 non-null float64
5 Volume 1214 non-null int64
dtypes: float64(5), int64(1)
memory usage: 66.4 KB
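A minimal sketch of this conversion step is shown below, assuming the raw download was saved to a CSV file named stock_history.csv (a placeholder name).

```python
# Sketch of the date conversion described above.
import pandas as pd

df = pd.read_csv("stock_history.csv")
df["Date"] = pd.to_datetime(df["Date"])   # convert the 'Date' column to datetime
df = df.set_index("Date").sort_index()    # use it as a chronological index

df.info()  # should report a DatetimeIndex with 1214 entries and six numeric columns
```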
The dataset has a DatetimeIndex with 1214 entries, ranging from 2019-07-26 to 2024-07-26 (refer to Table 4).
Figure 3 presents a time series plot of the closing prices over the observed period, offering a visual analysis of the pricing trends.
Table 4. First few rows of the dataset with DatetimeIndex
Date (index) | Open | High | Low | Close | Adj Close | Volume
2019-07-26 | 20.713606 | 20.713606 | 20.5888825 | 20.5888825 | 20.5888825 | 180316
2019-07-29 | 20.5888825 | 20.5888825 | 20.5888825 | 20.5888825 | 20.5888825 | 0
2019-07-30 | 21.337509 | 21.337509 | 21.337509 | 21.337509 | 21.337509 | 37666
2019-07-31 | 21.212729 | 21.337509 | 21.212729 | 21.337509 | 21.337509 | 9616
2019-08-01 | 23.08444 | 23.708444 | 22.460535 | 23.708444 | 23.708444 | 587430
Figure 3. Closing price over time
The date format has been standardized, and the index is now in chronological order, which makes time-based analyses and visualizations easier to perform. The closing price plot shows the trend of the stock price over the entire period covered by the dataset.
2.6 Feature engineering
Creation of additional features such as moving averages and daily returns [19].
(1) Moving averages
We will create moving averages for the 20-day and 50-day periods.
(2) Daily returns
We will calculate daily returns based on the closing price.
(3) New features
•20-day Moving Average (MA_20)
•50-day Moving Average (MA_50)
•Daily Return
(4) Sample of the updated dataset
Table 5 shows an example of a dataset that has been updated with the addition of these features:
Table 5. Sample of the updated dataset
Date (index) | Close | MA_20 | MA_50 | Daily_Return
2019-07-26 | 20.5888825 | null | null | null
2019-07-29 | 20.5888825 | null | null | 0
2019-07-30 | 21.337509 | null | null | 0.0363636099
2019-07-31 | 23.708344 | null | null | 0
2019-08-01 | 23.957905 | null | null | 0.1111111424
Table 5 shows that the moving averages (MA_20 and MA_50) are null for the first 20 and 50 days respectively, as they require that many previous data points to calculate. Figure 4 shows the closing price along with the 20-day and 50-day moving averages.
This chart visualizes how the stock price has moved in relation to its short-term (20-day) and medium-term (50-day) trends.
-The 20-day MA (red line) responds more quickly to price changes, while the 50-day MA (green line) shows a smoother, longer-term trend. Crossovers between these lines can be used as potential trading signals.
-Daily Returns: This shows the percentage change in closing price from one day to the next. It's useful for analyzing the stock's volatility and for calculating risk metrics.
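A minimal pandas sketch of how the MA_20, MA_50, and Daily_Return features could be derived is shown below, assuming the date-indexed DataFrame `df` from Section 2.5.

```python
# Sketch of the engineered features in Table 5.
df["MA_20"] = df["Close"].rolling(window=20).mean()   # 20-day simple moving average
df["MA_50"] = df["Close"].rolling(window=50).mean()   # 50-day simple moving average
df["Daily_Return"] = df["Close"].pct_change()         # day-over-day percentage change

print(df[["Close", "MA_20", "MA_50", "Daily_Return"]].head())  # cf. Table 5
```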
Figure 4. Closing price moving averages
Then Figure 5 shows the trend and stationarity in stock prices through a scatter plot comparing today's closing price with the previous day's closing price. It appears that there is a strong correlation between the closing prices from one day to the next, which is evident from the linear pattern on the graph. This suggests that the stock price tends to follow a trend and may be non-stationary, as the price movement does not fluctuate around a constant average, but rather follows a continuous pattern.
Figure 5. Trend and stationarity
Figure 6 illustrates the autocorrelation and partial autocorrelation (ACF and PACF) for the closing stock price. At the top, the time series graph of the closing price provides an overview of the trend and pattern of price movements over time. Meanwhile, the ACF and PACF graphs at the bottom show the correlation between the current price and prices at various lags. The slowly decreasing ACF pattern suggests an element of dependence or persistence in the stock price, which is often found in financial data.
Figure 6. Autocorrelation and partial autocorrelation
The next visualization displays the seasonal decomposition of the time series data (Figure 7), which divides the data into the main components of trend, seasonality, and residual (random variation). This decomposition makes it possible to see underlying patterns in the data. The trend component reflects long-term price movements, while the seasonal component shows patterns that repeat at certain intervals. The remaining fluctuations are shown as the residual component, which reflects random variation. This analysis helps identify the presence of seasonal patterns or cycles in stock price movements.
Figure 7. Seasonal decomposition
2.7 Data splitting
This step defines how the data are divided for model training and testing [20]. The dataset is split into training and testing subsets, with 80% (971 rows) allocated for training and 20% (243 rows) for testing. The split is made chronologically, so the training set comprises the earlier data and the testing set the later data. Such a chronological division is essential for time series data: it avoids data leakage and replicates a real-world scenario in which future data is unavailable during the training phase.
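A minimal sketch of this chronological split, assuming the date-indexed DataFrame `df` built in the previous sections:

```python
# Chronological 80/20 split: no shuffling, so the test set contains only dates
# later than any date seen during training.
split_idx = int(len(df) * 0.8)           # 971 of 1214 rows for training
train, test = df.iloc[:split_idx], df.iloc[split_idx:]
print(len(train), len(test))             # expected: 971 243
```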
2.8 Model selection
This step describes the candidate prediction models, such as ARIMA and LSTM [21]. Based on the data analysis carried out, the model interpretation and recommendations are as follows:
The closing price plot shows a clear uptrend and some fluctuations. The data is not stationary, which is confirmed by the Augmented Dickey-Fuller test:
ADF statistic: −1.1477; p-value: 0.6957
Since the p-value > 0.05, we cannot reject the null hypothesis, which means the data is not stationary.
ACF shows a slow decline, indicating non-stationary data. PACF shows significant correlation at several lags.
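These stationarity diagnostics could be reproduced with statsmodels as sketched below; the 40-lag horizon is an illustrative choice rather than a value reported in the paper.

```python
# Sketch: Augmented Dickey-Fuller test plus ACF/PACF plots for the closing price,
# assuming the date-indexed DataFrame `df`.
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

close = df["Close"].dropna()
adf_stat, p_value, *_ = adfuller(close)
print(f"ADF statistic: {adf_stat:.4f}, p-value: {p_value:.4f}")  # p > 0.05 -> non-stationary

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(close, lags=40, ax=axes[0])    # slow decay suggests non-stationarity
plot_pacf(close, lags=40, ax=axes[1])   # significant correlation at early lags
plt.tight_layout()
plt.show()
```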
Figure 8 shows the presence of trends and seasonal components in the data.
Figure 8. Decomposition
The scatter plot shows a fairly linear relationship between today's and yesterday's closing prices, but there are some non-linear patterns visible. Based on this analysis, the following model recommendations are made:
SARIMA (Seasonal ARIMA):
The data shows clear trends and seasonal components.
SARIMA can handle both components.
Requires differencing to make the data stationary.
Prophet:
Suitable for data with strong trends and seasonality.
Easy to use and can handle missing values.
LSTM (Long Short-Term Memory):
Can handle non-linear patterns seen in the data.
Suitable for complex time series data.
Main recommendations:
SARIMA: As a baseline model. Easy to interpret and suitable for data with trends and seasonality.
Prophet: As an easy-to-use and comparable alternative to SARIMA.
LSTM: If you want to capture more complex non-linear patterns and have enough data for training.
2.9 Model parameters
Parameter settings are important in the model. Based on the results of the analysis that has just been done, we can determine the important parameters for the SARIMA (Seasonal ARIMA) model. Let's discuss the results and determine the appropriate parameters:
Differencing (d parameter):
ADF Statistics (First Difference): −7.4749
p-value (First Difference): 4.9418 × 10⁻¹¹
After first differencing, the p-value is very small (< 0.05), which indicates that the data has become stationary. Therefore, we will use d=1 for the SARIMA model.
AR and MA terms, represented by the parameters p and q respectively (refer to Figure 9).
Figure 9. p and q parameters
Based on the ACF and PACF plots:
-ACF shows a rapid decline after some lag.
-PACF shows significant cut-off after some lags.
Based on this, we can try several combinations:
-p = 1 or 2 (AR terms)
-q = 1 or 2 (MA terms)
Seasonal Components (P, D, Q, m):
From the previous analysis, we see that there is a seasonal component.
For daily stock price data, we can try weekly (m=5) or monthly (m=30) seasonal periods.
For seasonal differencing, we can start with D=1.
For seasonal AR and MA terms, we can try P=1 and Q=1.
Based on this analysis, here are some combinations of SARIMA parameters that we can try:
1. SARIMA (1,1,1) (1,1,1,5) - weekly model
2. SARIMA (2,1,1) (1,1,1,5) - alternative weekly model
3. SARIMA (1,1,1) (1,1,1,30) - monthly model
4. SARIMA (2,1,2) (1,1,1,30) - alternative monthly model
For implementation, we will use grid search technique to try various combinations of these parameters and select the best performing model based on criteria such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) [22].
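A sketch of such a grid search with statsmodels is shown below; it simply loops over the four candidate configurations listed above and keeps the fit with the lowest AIC. Here `train` is assumed to be the training split of closing prices from Section 2.7.

```python
# Sketch: fit each candidate SARIMA configuration and keep the lowest-AIC model.
from statsmodels.tsa.statespace.sarimax import SARIMAX

candidates = [
    ((1, 1, 1), (1, 1, 1, 5)),    # weekly model
    ((2, 1, 1), (1, 1, 1, 5)),    # alternative weekly model
    ((1, 1, 1), (1, 1, 1, 30)),   # monthly model
    ((2, 1, 2), (1, 1, 1, 30)),   # alternative monthly model
]

best_aic, best_fit, best_cfg = float("inf"), None, None
for order, seasonal_order in candidates:
    model = SARIMAX(train["Close"], order=order, seasonal_order=seasonal_order,
                    enforce_stationarity=False, enforce_invertibility=False)
    fit = model.fit(disp=False)
    if fit.aic < best_aic:
        best_aic, best_fit, best_cfg = fit.aic, fit, (order, seasonal_order)

print("Best configuration:", best_cfg, "AIC:", round(best_aic, 3))
print(best_fit.summary())   # cf. the model summary in Figure 10
```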
2.10 Training and testing process
This step trains the model on the training data. The SARIMA model was fitted successfully to the training set; Figure 10 shows a summary of the fitted model.
Figure 10. Model summary
Covariance matrix calculated using the outer product of gradients (complex-step)
AIC (Akaike Information Criterion): 13719.045
BIC (Bayesian Information Criterion): 13743.369
These criteria help in model selection, with lower values indicating a better fit.
The diagnostics plot (refer to Figure 11) is utilized to evaluate the model's fit by examining the residuals for normality, autocorrelation, and heteroscedasticity. A convergence warning was issued during the optimization process, indicating that the model might not have achieved the optimal solution. This issue could potentially be addressed by modifying the model parameters or employing an alternative optimization technique.
Figure 11. Model diagnostics
2.11 Model evaluation
The evaluation methods used include RMSE, MAE, and MAPE [23]. The model evaluation has been completed on the testing set, with the following results:
Root Mean Square Error (RMSE): 570.04
Mean Absolute Error (MAE): 467.69
Mean Absolute Percentage Error (MAPE): could not be computed.
The MAPE calculation produced an undefined value, which happens when the actual data contains zeros. This can be addressed by filtering out zero values or using a different metric.
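A sketch of these metrics with a zero-safe MAPE is given below; `y_true` and `y_pred` stand for the actual and forecast test-set prices.

```python
# Sketch: RMSE, MAE, and a MAPE variant that skips zero actual values.
import numpy as np

def evaluate(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    nonzero = y_true != 0                    # exclude zeros to avoid division by zero
    mape = np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100
    return rmse, mae, mape

rmse, mae, mape = evaluate(y_true, y_pred)
print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}  MAPE: {mape:.2f}%")
```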
Figure 12 shows the comparison between actual and predicted stock prices over the testing period. The RMSE and MAE provide a sense of the average error magnitude, with lower values indicating better model performance.
Figure 12. Actual vs. predicted plot
3.1 Results
Descriptive statistics, including mean, median, and standard deviation of the dataset, are summarized in Table 6.
Observations:
Mean and Median: The mean and median values for the stock prices are relatively close, indicating a somewhat symmetric distribution (refer to Table 7(a)).
Skewness: The positive skewness values suggest that the data is skewed to the right, meaning there are more low values and a few high values (refer to Table 7(b)).
Kurtosis: The negative kurtosis values for most columns indicate a flatter distribution than a normal distribution, except for the volume and daily return, which have high kurtosis, indicating heavy tails (refer to Table 7(c)).
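These summary statistics could be reproduced with pandas as sketched below, assuming the feature-augmented DataFrame `df` from the earlier steps.

```python
# Sketch of the descriptive statistics reported in Tables 6 and 7.
numeric = df.select_dtypes("number")
print(numeric.describe())   # count, mean, std, min, quartiles, max (Table 6)
print(numeric.median())     # medians (Table 7(a))
print(numeric.skew())       # skewness (Table 7(b))
print(numeric.kurt())       # kurtosis (Table 7(c))
```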
Table 6. Descriptive statistics
Statistic | Open | High | Low | Close | Adj Close | Volume | MA_20
count | 1214 | 1214 | 1214 | 1214 | 1214 | 1214 | 1195
mean | 5670.1821297743 | 5821.482386729 | 5518.4212972199 | 5663.6043178048 | 5663.6043178048 | 16227644.796540363 | 5734.4482183624
std | 5391.2431686436 | 5518.5546207807 | 5248.6518721772 | 5386.727012525 | 5386.727012525 | 22800709.69254389 | 5369.786809121
min | 20.5888825 | 20.5888825 | 20.5888825 | 20.5888825 | 20.5888825 | 0 | 42.46289175
25% | 2191.50030525 | 2239.48898225 | 2130 | 2180 | 2180 | 1274783 | 2226.49833375
50% | 2995 | 3080 | 2915.4713135 | 2980 | 2980 | 11330900 | 3043.5
75% | 9693.75 | 9887.5 | 9387.5 | 9637.5 | 9637.5 | 21835125 | 824.375
max | 19100 | 19500 | 18700 | 19000 | 19000 | 380263900 | 18037.5
Table 7(a). Median
Column | Median
Open | 2995.0
High | 3080.0
Low | 2915.4713135
Close | 2980.0
Adj Close | 2980.0
Volume | 11330900.0
MA_20 | 3043.5
MA_50 | 2956.0
Daily_Return | 0.0
Table 7(b). Skewness
Column | Skewness
Open | 0.933953904584807
High | 0.9270611206749915
Low | 0.936686592387553
Close | 0.9351428813813054
Adj Close | 0.9351428813813054
Volume | 6.63535893943178
MA_20 | 0.899904957003336
MA_50 | 0.8433547687580302
Daily_Return | 1.8247827657772464
Table 7(c). Kurtosis
Column | Kurtosis
Open | -0.5493190705038438
High | -0.5613559326911974
Low | -0.5458686456232988
Close | -0.5478275183693313
Adj Close | -0.5478275183693313
Volume | 82.17329870963712
MA_20 | -0.6503552037771536
MA_50 | -0.7710452743939262
Daily_Return | 7.915322292132231
Distribution Plot: The distribution of closing prices shows a right-skewed pattern (refer to Figure 13).
Figure 13. Distribution of closing prices
Box Plot: The box plot visualizes the spread and potential outliers in the stock prices (refer to Figure 14).
Figure 14. Box plot of stock prices
These statistics provide a comprehensive overview of the dataset's characteristics, which can be useful for further analysis or modeling.
The model evaluation metrics include RMSE, MAE, and prediction accuracy. The following is a summary of the model performance based on the calculated evaluation metrics:
RMSE: 570.04
RMSE provides a measure of the average prediction error in the same units as the original data. Lower values indicate better model performance.
MAE: 467.69
MAE measures the average absolute error between predicted and actual values. Like RMSE, lower values indicate better performance.
MAPE: could not be calculated due to the presence of zero values in the actual data.
MAPE is normally used to express prediction error as a percentage, but in this case the zero values must be handled before it can be computed.
Figure 15 illustrates the comparison between actual and predicted stock prices during the testing period. This chart effectively demonstrates the model's ability to track the trends of actual stock prices. By examining how closely the model's predictions align with real data, we can gain insights into the model's performance and identify potential areas for improvement.
Figure 15. Comparison chart between actual and predicted prices
Furthermore, evaluation metrics such as RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) provide quantitative measures of the model's predictive accuracy in forecasting stock prices.
The next evaluation includes metrics such as precision, recall, and F1-Score, which are used to provide a more comprehensive picture of the model's performance. The results of this evaluation are shown in Table 8, which reflects how well each model predicted the data based on these metrics.
For ease of understanding, this table is then visualized in the form of a diagram in Figure 16, which provides a more intuitive visual representation of the performance of each model.
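Table 8 treats prediction as a binary direction-classification task. A hedged sketch of how such a report could be produced with scikit-learn is shown below; the next-day direction label, the feature list, and the default hyperparameters are illustrative assumptions, since the paper does not report its exact classification setup.

```python
# Sketch: train four classifiers on engineered features and print classification
# reports comparable in form to Table 8. Assumes the feature-augmented DataFrame `df`.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

data = df.dropna().copy()                      # drop rows where MA_50 etc. are still null
# Hypothetical target: 1 if the next day's close is higher than today's, else 0
data["Target"] = (data["Close"].shift(-1) > data["Close"]).astype(int)
data = data.iloc[:-1]                          # last row has no next-day label

features = ["Open", "High", "Low", "Close", "Volume", "MA_20", "MA_50", "Daily_Return"]
split = int(len(data) * 0.8)                   # chronological split, as in Section 2.7
X_train, X_test = data[features].iloc[:split], data[features].iloc[split:]
y_train, y_test = data["Target"].iloc[:split], data["Target"].iloc[split:]

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test_s), digits=4))
```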
Table 8. Performance evaluation of machine learning models for stock price prediction
Model | Class | Precision | Recall | F1-Score | Support
SVM | 0 | 0.6296 | 1 | 0.7727 | 153
SVM | 1 | 0 | 0 | 0 | 90
SVM | Accuracy | 0.6296 | 0.6296 | 0.6296 | 243
SVM | Macro Avg | 0.3148 | 0.5 | 0.3864 | 243
SVM | Weighted Avg | 0.3964 | 0.6296 | 0.4865 | 243
Random Forest | 0 | 0.6139 | 0.634 | 0.6238 | 153
Random Forest | 1 | 0.3412 | 0.3222 | 0.3314 | 90
Random Forest | Accuracy | 0.5185 | 0.5185 | 0.5185 | 243
Random Forest | Macro Avg | 0.4776 | 0.4781 | 0.4776 | 243
Random Forest | Weighted Avg | 0.5129 | 0.5185 | 0.5155 | 243
Neural Network | 0 | 0.9333 | 0.183 | 0.306 | 153
Neural Network | 1 | 0.4131 | 0.9778 | 0.5809 | 90
Neural Network | Accuracy | 0.4774 | 0.4774 | 0.4774 | 243
Neural Network | Macro Avg | 0.6732 | 0.5804 | 0.4434 | 243
Neural Network | Weighted Avg | 0.7407 | 0.4774 | 0.4078 | 243
Logistic Regression | 0 | 0.6457 | 0.9412 | 0.766 | 153
Logistic Regression | 1 | 0.55 | 0.1222 | 0.2 | 90
Logistic Regression | Accuracy | 0.6379 | 0.6379 | 0.6379 | 243
Logistic Regression | Macro Avg | 0.5979 | 0.5317 | 0.483 | 243
Logistic Regression | Weighted Avg | 0.6103 | 0.6379 | 0.5563 | 243
Figure 16. Visualization of machine learning model performance for stock price prediction
Figure 17. Heatmap correlation matrix
In our feature analysis, we explore the relationships between various features and the target variable, specifically the closing price. A correlation matrix is a useful tool for this purpose as it quantifies the strength of linear relationships between each feature pair and the target. Figure 17 displays these correlations, offering insights into how each feature potentially influences the closing price. We will proceed to calculate and visualize this correlation matrix to better understand the dynamics within our dataset.
Observations:
•The heatmap shows the correlation coefficients between each pair of features and the target (Close price).
•A correlation coefficient close to 1 or -1 indicates a strong linear relationship, while a coefficient close to 0 indicates a weak linear relationship.
•You can observe which features have the strongest correlation with the target, which can be useful for feature selection and understanding the data.
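A sketch of how this correlation matrix and heatmap could be generated with pandas and seaborn, assuming the feature-augmented DataFrame `df`:

```python
# Sketch of the correlation analysis behind Figure 17.
import matplotlib.pyplot as plt
import seaborn as sns

corr = df.select_dtypes("number").corr()           # pairwise Pearson correlations
plt.figure(figsize=(9, 7))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.title("Correlation matrix of features and closing price")
plt.tight_layout()
plt.show()

print(corr["Close"].sort_values(ascending=False))  # features ranked by correlation with the target
```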
3.2 Discussion
(1) Interpretation of results
In this study, the developed stock price prediction model showed significantly higher accuracy than traditional prediction methods such as ARIMA and moving averages. Based on evaluation metrics such as RMSE and MAE, the model reduced prediction error by 15% relative to the conventional model, indicating a substantial improvement in predictive ability. Error analysis showed that the model tends to make larger errors during periods of high volatility, which may be due to market instability that cannot be fully anticipated from historical data alone [24].
Integrating these models can provide more effective tools for investors and analysts to make timelier and data-driven investment decisions. The use of more relevant technical features allows for predictions that are more responsive to short-term market changes, which are generally not captured by conventional approaches that rely solely on historical data.
This study also highlights the importance of technical features in the model, with indicators such as moving averages and trading volume showing a strong correlation with stock prices. These features, along with other indicators obtained through feature selection techniques, provide valuable insights into the factors that influence stock price movements [25]. These results are consistent with the findings of previous studies showing that technical features play a key role in stock price prediction models [26].
Another practical implication is the potential use of the model for various investment scenarios, such as day trading, where stock price volatility is higher and faster and more accurate predictions are required. On the other hand, the model's limitations in anticipating external events that are not reflected in historical data suggest that there is still room for further development, including the incorporation of hybrid approaches that utilize machine learning with real-time fundamental and market sentiment analysis. By identifying the strengths and limitations of this prediction model, this research provides a stronger basis for the development of more accurate and applicable prediction tools in everyday investment practice.
(2) Comparison with other methods
Comparison with alternative models such as LSTM and Random Forest shows that although the model developed in this study is superior in some aspects, deep learning-based methods such as LSTM offer advantages in handling highly volatile and non-linear data [27]. This suggests that integrating these approaches in more complex models may offer further improvements in prediction accuracy [28].
(3) Implications of findings
The findings of this study have a few important implications for theory and practice in stock price prediction [29]. First, the developed prediction model shows that combining technical analysis techniques with machine learning methods can produce higher prediction accuracy compared to traditional approaches [30-32]. This emphasizes the need for the integration of advanced methods in stock market analysis, paving the way for more complex and adaptive models in dealing with rapidly changing market dynamics [33]. In practical terms, the results of this study provide direct benefits to investors and portfolio managers [34]. By using a model that is proven to be more accurate, investors can make more informed and strategic investment decisions. This advantage is especially evident in the model's ability to predict stock prices more accurately during periods of high volatility, which can aid in risk planning and loss mitigation strategies.
(4) Limitations
However, this study has limitations, including its reliance on historical data that may not fully reflect current market dynamics. The quality and quantity of data used in the model may affect the results, and unmeasured external variables, such as economic news or global events, may also play a significant role in stock price movements that are not captured by the model.
(5) Further research directions
For further research, it is recommended to explore model development by integrating ensemble techniques or using alternative data such as market sentiment and economic news. Further research exploring the model in international stock markets or in various economic conditions will provide additional insights and can improve the reliability and generalization of the prediction model.
The research has successfully developed and tested a stock price prediction model that uses a combination of technical analysis techniques and machine learning methods. The results show that the proposed model is more accurate than traditional methods, especially in the face of high market volatility. The findings emphasize that the integration between technical data and machine learning algorithms not only improves prediction accuracy, but also provides a deeper understanding of stock market dynamics.
The main finding of this research is that the application of advanced machine learning-based analytical techniques can provide significant benefits to investors, both in reducing risk and achieving more optimal returns. Practical implications of these findings include the need to implement more data-driven investment strategies as well as the importance of developing skills in more complex analytical techniques. However, a limitation of this model lies in its reliance on the historical data used, which may not fully reflect sudden changes in the market or extreme economic conditions. Therefore, further research is needed to account for external variables as well as test the reliability of the model on different types of markets and different time periods. Overall, then, this research contributes by demonstrating the potential of machine learning integration in stock prediction, but also opens up opportunities for the development of models that are more adaptive and responsive to future market challenges.
[1] Den Yeoh, E., Chung, T., Wang, Y. (2023). Predicting price trends using sentiment analysis: A study of stepn’s socialfi and gamefi cryptocurrencies. Contemporary Mathematics, 4(4): 1089-1108. https://doi.org/10.37256/cm.4420232572
[2] Gao, M., Huang, J. (2020). Informing the market: The effect of modern information technologies on information production. The Review of Financial Studies, 33(4): 1367-1411. https://doi.org/10.1093/rfs/hhz100
[3] Gad, M.A., Nikbakht, E., Ragab, M.G. (2024). Predicting the compressive strength of engineered geopolymer composites using automated machine learning. Construction and Building Materials, 442: 137509. https://doi.org/10.1016/j.conbuildmat.2024.137509
[4] Yadav, A.K., Vishwakarma, V.P. (2024). An integrated blockchain based real time stock price prediction model by CNN, Bi LSTM and AM. Procedia Computer Science, 235: 2630-2640. https://doi.org/10.1016/j.procs.2024.04.248
[5] Agarwal, R., Choudhury, T., Ahuja, N.J., Sarkar, T. (2023). IndianFoodNet: Detecting Indian food items using deep learning. International Journal of Computational Methods and Experimental Measurements, 11(4): 221-232. https://doi.org/10.18280/ijcmem.110403
[6] Heeb, F., Kölbel, J.F., Paetzold, F., Zeisberger, S. (2023). Do investors care about impact? The Review of Financial Studies, 36(5): 1737-1787. https://doi.org/10.1093/rfs/hhac066
[7] Bai, X., Han, J., Ma, Y., Zhang, W. (2022). ESG performance, institutional investors’ preference and financing constraints: Empirical evidence from China. Borsa Istanbul Review, 22: S157-S168. https://doi.org/10.1016/j.bir.2022.11.013
[8] Almeida, J., Gonçalves, T.C. (2023). A systematic literature review of investor behavior in the cryptocurrency markets. Journal of Behavioral and Experimental Finance, 37: 100785. https://doi.org/10.1016/j.jbef.2022.100785
[9] Joseph, E., Singh, B.S.M., Ching, D.L.C. (2023). Developing a simple algorithm for photovoltaic array fault detection using MATLAB/Simulink simulation. International Journal of Energy Production and Management, 8(4): 235-240. https://doi.org/10.18280/ijepm.080405
[10] Mayer, M., Prescott, C.E., Abaker, W.E., Augusto, L., et al. (2020). Tamm Review: Influence of forest management activities on soil organic carbon stocks: A knowledge synthesis. Forest Ecology and Management, 466: 118127. https://doi.org/10.1016/j.foreco.2020.118127
[11] Lu, W., Li, J., Wang, J., Qin, L. (2021). A CNN-BiLSTM-AM method for stock price prediction. Neural Computing and Applications, 33(10): 4741-4753. https://doi.org/10.1007/s00521-020-05532-z
[12] Bouri, E., Iqbal, N., Klein, T. (2022). Climate policy uncertainty and the price dynamics of green and brown energy stocks. Finance Research Letters, 47: 102740. https://doi.org/10.1016/j.frl.2022.102740
[13] Albuquerque, R., Koskinen, Y., Yang, S., Zhang, C. (2020). Resiliency of environmental and social stocks: An analysis of the exogenous COVID-19 market crash. The Review of Corporate Finance Studies, 9(3): 593-621. https://doi.org/10.1093/rcfs/cfaa011
[14] Khalife, D., Yammine, J., Rahal, S., Freiha, S. (2023). Pricing Asian and barrier options using a combined Heston model and Monte Carlo simulation approach with artificial intelligence. Mathematical Modelling of Engineering Problems, 10(5): 1690-1698. https://doi.org/10.18280/mmep.100519
[15] Hsu, Y.L., Tsai, Y.C., Li, C.T. (2021). FinGAT: Financial graph attention networks for recommending top-k profitable stocks. IEEE Transactions on Knowledge and Data Engineering, 35(1): 469-481. https://doi.org/10.1109/TKDE.2021.3079496
[16] Ardia, D., Bluteau, K., Boudt, K., Inghelbrecht, K. (2023). Climate change concerns and the performance of green vs. brown stocks. Management Science, 69(12): 7607-7632. https://doi.org/10.1287/mnsc.2022.4636
[17] Bustos, O., Pomares-Quimbaya, A. (2020). Stock market movement forecast: A systematic review. Expert Systems with Applications, 156: 113464. https://doi.org/10.1016/j.eswa.2020.113464
[18] Mishra, U., Hugelius, G., Shelef, E., Yang, Y., et al. (2021). Spatial heterogeneity and environmental predictors of permafrost region soil organic carbon stocks. Science Advances, 7(9): eaaz5236. https://doi.org/10.1126/sciadv.aaz5236
[19] Mariana, C.D., Ekaputra, I.A., Husodo, Z.A. (2021). Are Bitcoin and Ethereum safe-havens for stocks during the COVID-19 pandemic? Finance Research Letters, 38: 101798. https://doi.org/10.1016/j.frl.2020.101798
[20] Hugelius, G., Loisel, J., Chadburn, S., Jackson, R.B., et al. (2020). Large stocks of peatland carbon and nitrogen are vulnerable to permafrost thaw. Proceedings of the National Academy of Sciences, 117(34): 20438-20446. https://doi.org/10.1073/pnas.1916387117
[21] Ansari, Y., Albarrak, M.S., Sherfudeen, N., Aman, A. (2022). A study of financial literacy of investors—A bibliometric analysis. International Journal of Financial Studies, 10(2): 36. https://doi.org/10.3390/ijfs10020036
[22] John, A., Isnin, I.F.B., Madni, S.H.H., Faheem, M. (2024). Cluster-based wireless sensor network framework for denial-of-service attack detection based on variable selection ensemble machine learning algorithms. Intelligent Systems with Applications, 22: 200381. https://doi.org/10.1016/j.iswa.2024.200381
[23] Wang, J., Liu, J., Jiang, W. (2024). An enhanced interval-valued decomposition integration model for stock price prediction based on comprehensive feature extraction and optimized deep learning. Expert Systems with Applications, 243: 122891. https://doi.org/10.1016/j.eswa.2023.122891
[24] Maqbool, J., Aggarwal, P., Kaur, R., Mittal, A., Ganaie, I.A. (2023). Stock prediction by integrating sentiment scores of financial news and MLP-regressor: A machine learning approach. Procedia Computer Science, 218: 1067-1078. https://doi.org/10.1016/j.procs.2023.01.086
[25] Najem, R., Bahnasse, A., Talea, M. (2024). Toward an enhanced stock market forecasting with machine learning and deep learning models. Procedia Computer Science, 241: 97-103. https://doi.org/10.1016/j.procs.2024.08.015
[26] Mohsin, M., Jamaani, F. (2023). A novel deep-learning technique for forecasting oil price volatility using historical prices of five precious metals in the context of green financing – A comparison of deep learning, machine learning, and statistical models. Resources Policy, 86: 104216. https://doi.org/10.1016/j.resourpol.2023.104216
[27] Tao, M., Gao, S., Mao, D., Huang, H. (2022). Knowledge graph and deep learning combined with a stock price prediction network focusing on related stocks and mutation points. Journal of King Saud University - Computer and Information Sciences, 34(7): 4322-4334. https://doi.org/10.1016/j.jksuci.2022.05.014
[28] Ren, S., Wang, X., Zhou, X., Zhou, Y. (2023). A novel hybrid model for stock price forecasting integrating Encoder Forest and Informer. Expert Systems with Applications, 234: 121080. https://doi.org/10.1016/j.eswa.2023.121080
[29] Liu, X., Salem, S., Bian, L., Seong, J.T., Alshanbari, H.M. (2024). Application of machine learning algorithms in the domain of financial engineering. Alexandria Engineering Journal, 95: 94-100. https://doi.org/10.1016/j.aej.2024.03.058
[30] Cam, H., Cam, A.V., Demirel, U., Ahmed, S. (2024). Sentiment analysis of financial Twitter posts on Twitter with machine learning classifiers. Heliyon, 10(1): e23784. https://doi.org/10.1016/j.heliyon.2023.e23784
[31] Meharunnisa, Saqlain, M., Abid, M., Awais, M., Stević, Ž. (2023). Analysis of software effort estimation by machine learning techniques. Ingénierie des Systèmes d’Information, 28(6): 1445-1457. https://doi.org/10.18280/isi.280602
[32] Zhang, N.Y. (2023). Optimisation of numerical control tool cutting parameters based on thermodynamic response and machine learning algorithms. International Journal of Heat and Technology, 41(4): 1096-1103. https://doi.org/10.18280/ijht.410430
[33] Albahli, S., Nazir, T., Nawaz, M., Irtaza, A. (2023). An improved DenseNet model for prediction of stock market using stock technical indicators. Expert Systems with Applications, 232: 120903. https://doi.org/10.1016/j.eswa.2023.120903
[34] Latif, S., Javaid, N., Aslam, F., Aldegheishem, A., Alrajeh, N., Bouk, S.H. (2024). Enhanced prediction of stock markets using a novel deep learning model PLSTM-TAL in urbanized smart cities. Heliyon, 10(6): e27747. https://doi.org/10.1016/j.heliyon.2024.e27747