Shallot Crop Harvest Time and Yield Prediction Using Machine Learning Based on Farmers’ Tacit Knowledge in Brebes Regency, Indonesia

Research Center for Agroindustry, National Research and Innovation Agency, Tangerang Selatan 15314, Indonesia

School of Economics and Business, Telkom University, Bandung 40257, Indonesia

Corresponding Author Email:

agun044@brin.go.id

Received:

4 September 2024

Revised:

25 November 2024

Accepted:

6 January 2025

Available online:

31 March 2025

OPEN ACCESS

Abstract:

Shallot price fluctuations in Indonesia are caused by a lengthy supply chain and limited production, which makes supply control challenging. This study employed machine learning to forecast shallot yields and harvest times in Brebes Regency, Central Java, one of the major production areas. Data were collected through farmer interviews, which encompassed productivity and farming practices, and analyzed using twelve machine learning algorithms, including Gradient Boosting, AdaBoost, XGB, ElasticNet, and Decision Trees. Model performance was evaluated using MSE, MAE, and R-squared values, with ElasticNet being identified as the most accurate. Harvest time predictions were influenced by plant age and morning temperature, while yield depended on factors such as planted area, bed dimensions, daily temperature range, bulb weight, and phosphorus levels. Farmers' tacit knowledge was also incorporated, improving the model's reliability. The deployment results revealed a 13% deviation between predicted and actual yields, demonstrating reasonable accuracy. However, the error margin for harvest time predictions was 23.5%, reflecting the complexity of environmental and operational factors. The study provides a data-driven framework for understanding shallot productivity and the variables influencing it, and offers insights into improving forecasting models for more effective agricultural planning.

Keywords:

farmer experiences, harvest time, machine learning, productivity, tacit knowledge, yield prediction

1. Introduction

Indonesia has a high demand for shallots, but seasonal production variations cause price fluctuations. The market is asymmetrical, as a significant increase in consumer-level demand directly impacts production centers. Conversely, supply shortages disproportionately impact farmers' prices, driven by high trade and transportation margins resulting from an extended supply chain [1]. Brebes Regency, in Central Java, is a shallot production center that supplies 60% of shallots to the Jakarta market. In Brebes, substantial changes in harvest area and production directly affect the sustainability of shallot farming. Market price information is easily accessible from Jakarta’s market.

Farmers often rely on the Ijon system, a pre-harvest selling arrangement that provides immediate financial relief but can cause them to lose control over their harvest and miss potential price increases. This issue highlights the need for precise yield forecasting to empower farmers in negotiating equitable repayment terms and achieving better financial outcomes. Data analysis and machine learning can help estimate crop yields accurately by integrating farmers’ experience with environmental and production variables [2].

Machine learning-based prediction models play a critical role in effective crop farming, assisting in decisions related to planting, irrigation, fertilization, harvesting, and trading [3]. Among the machine learning approaches, the MARS-ANN hybrid model demonstrates high prediction accuracy by combining ANN's predictive power with MARS's feature selection capabilities. For example, the model has been used effectively to predict wheat, rice, and maize yields based on meteorological and soil data [4]. Similarly, machine vision-based yield monitoring has been employed to create geotagged yield maps for shallot fields, achieving a 76% detection accuracy [5]. In Turkey, onion yield predictions utilized support vector regression and polynomial regression, while in Bangladesh, climatic data was combined with linear regression for shallot yield estimations [6, 7]. Furthermore, the SVM classifier has been used to assess shallot quality with 60% accuracy, highlighting its potential for integration into web or mobile tools [8].

The effectiveness of machine learning-based yield prediction can be evaluated across three dimensions: prediction horizon, scale, and crop type [9]. Predictions are feasible at all vegetation stages, but many studies focus on predictions right before harvest. The best grain forecasts for each model were made before harvest at the start, middle, and end of the growing season [10]. Scale is important because models at each scale serve different purposes, such as plant-scale models aiming to understand factors affecting crop growth, field-scale models assisting in crop management [11], and models at larger scales primarily informing policy-making in agriculture. Klompenburg's "Crop Prediction" model is a comprehensive framework that integrates 21 factors to enhance yield predictions on various scales [12].

Although machine learning has been extensively studied for crop yield prediction, its ability to integrate tacit knowledge, such as farmers’ intuition, experience, and observational skills, has not been thoroughly explored [13]. Tacit knowledge is qualitative and context-specific, often communicated through actions rather than words. Designing systematic yet simple questionnaires that elicit detailed observations from farmers can help encode this knowledge into machine learning processes, enhancing model accuracy and relevance.

This study directly addresses the challenges Indonesian shallot farmers face, such as market dynamics and the Ijon system, by leveraging machine learning for yield and harvest time predictions. Based on the discussion above, this paper aims to: 1) identify the most effective machine learning model for predicting shallot crop yield and harvest time, and 2) determine the critical variables, including tacit knowledge, that influence these predictions.

2. The Comprehensive Theoretical Basis

2.1 Soil conditions and fertilizer

Klompenburg's 'Crop Prediction' model used 21 factors to enhance yield predictions, including soil properties such as pH, moisture, and texture, as well as weather conditions like temperature, rainfall, and humidity [12]. Shallots prefer slightly acidic soil with higher CEC, as it retains nutrients for roots and increases the adsorptive capacity for cations in soil [14]. Crop yields were often predicted using nutrients in the soil, NDVI, and meteorological components [15]. The physical size of the tubers has a strong correlation with the leaf area index, which is an important element in growth and nutrient uptake [16].

2.2 Lighting conditions

Crop productivity can also be impacted by environmental variables like wind speed and sunshine exposure [17]. The development of shallots is also influenced by lighting other than sunlight. Their growth is influenced by two light mechanisms: photoperiod and gamma radiation. It has been demonstrated that controlled exposure to low levels of gamma radiation stimulates bulb development in shallots by inducing hormonal changes that expand bulb size and potentially have an impact on yield [18]. On the other hand, the amount of light exposure each day that impacts the shallot growth cycle is known as photoperiod. The best period of daily light exposure is when around 70% of the sunshine exposure occurs during the day [18].

2.3 Watering needs

Shallot plants require frequent watering in the beginning, followed by daily irrigation during growth, and then less watering for bulb formation [19, 20]. Shallots can use up to 81.17% of the available water for evapotranspiration [21]. Effective water management also depends on ridge size, width, and row layout to avoid waterlogging, which can lead to bulb rot and other challenges [22].

2.4 Cultivation strategies

The Klompenburg model also considers agronomic practices like fertilizer application, irrigation methods, planting density, crop characteristics like variety/genotype, growth stage timing, pest and disease incidence, and disease outbreaks [12]. Planting strategies such as increasing planting density, variety selection, planting distance, and bed size may contribute to higher shallot production. Increasing planting density must be carefully managed with a suitable planting distance and supported by other agronomic methods so that growth does not become restricted and harvest outcomes are maximized [23]. Excessive density will cause shallot plants to compete with one another for nutrients, water, and sunshine, which will stunt their growth and cause their bulbs to shrink. Insufficient air circulation will also increase the plants' vulnerability to disease attacks [24].

2.5 Farmer knowledge

Farmers' tacit knowledge, derived from years of experience and local adaptations, is invaluable for tailoring practices to specific conditions. Farmers select shallot varieties depending on market demand, pest and disease resistance, and production levels. The Bima Brebes shallot variety has large bulbs, disease resistance, and a strong flavor [25]. However, in Tapin Regency, many farmers have ceased planting shallots due to the adverse effects of climate change, including stagnant water, fruit and root rot, and decreased harvests and sale prices [26]. Farmers' understanding of optimal harvest timing, typically at 55-60 days post-flowering, and their ability to adapt practices contribute significantly to crop success [27]. Moreover, their knowledge helps maintain family food security and achieve high farming success rates, as seen in Malumbi Village, where motivation and competence correlate strongly with outcomes [28]. This tacit knowledge becomes especially critical when farmers face challenges such as low market prices. For instance, in Brebes, the lower price limit for shallots is IDR 13,730.49, with a fluctuation coefficient of 0.20, indicating a low-risk scenario [29].

3. Method

3.1 Experimental site

The study was conducted in Brebes, in the Central Java province of Indonesia (7°3′0″ S, 108°54′0″ E). This area has a tropical climate with temperatures ranging from 24℃ to 32℃ within the two months, with the lowest temperatures from June to August. The average annual rainfall is around 1200-2000 mm/year. In Brebes Regency, rain can occur in any month, although seasonal rainfall typically begins at the end of October and lasts until May. The probability of rain in a month is more than 80% in January and February.

3.2 Data collection

Data were collected via farmer interviews using questionnaires with the assistance of Brebes district extension personnel in nine sub-districts within the Brebes district, specifically Brebes, Wanasari, Ketanggungan, Larangan, Bulakamba, Kersana, Jatibarang, Bantarkawung, and Bantarharjo. These sub-districts were selected as the main shallot production areas. The diversity in agroclimatic conditions, soil properties, and farming practices across these areas ensures a comprehensive dataset for modeling shallot yield and harvest time.

The dataset gathered by Brebes extension staff comprised 368 records, all sampled when the shallot plants were 45 days old. This age was chosen because it is a critical growth phase in which vegetative development and early bulb formation occur, allowing accurate observation of key yield predictors.

Three samples were taken from each farmer's field to ensure representative and reliable data. This approach minimizes the impact of within-field variability caused by uneven soil properties, microclimatic differences, or irrigation patterns, thereby improving the robustness of the data. The sample size and distribution were designed to capture sufficient variability while maintaining feasibility for field data collection.

Tools such as rulers, scales, and vernier calipers were used to measure growth parameters, and fertilizer use on the shallot plants was recorded. Climate data, including temperature, humidity, and precipitation, were obtained from smartphone weather applications. Agricultural extension officers carried out the data gathering to minimize measurement errors; they also noted each field's soil type, texture, and pH. Farmers were asked how much water was used and how it was applied. Thirty days after the first interview, farmers were interviewed again to collect data on the harvest day and yields. All data were stored in Microsoft Excel. The questions on planted area, bed size, cultivation methods, and farmers' tacit knowledge are detailed in Table 1.

Table 1. The questions for farmer respondents

Klompenburg Data Structure	Questions	Data Type	Unit
Crop Information	How old will your shallot plants be when harvested? What is the height of a shallot plant at 45 days old? How many leaves will there be on a 45-day-old shallot plant? How many shallot bulbs will there be at 45 days old? What was the physical size of the tubers and the number of tubers per clump after 45 days?	Primary; weighed; measured; calculated	days; cm; leaves; bulbs; cm
Leaf Area Index	What size is the shallot plant leaf area at 45 days old?	Primary; measured	cm²
Soil Type	What is the soil type?	Primary	cat.
pH Soil	What is the pH of the growing medium?	Primary; measured	value
Rainfall	Does the amount of rainfall affect the productivity and yield of shallots, and is the availability of water in the fields sufficient?	Secondary	cat.
Wind Speed	What is the wind speed in a day?	Primary	m/s
Humidity	What is the average humidity in a day?	Primary; measured	%
Nutrients in Soil	What types of nutrients in the soil are needed for plant shallot growth? How much is the percentage of N, P, and K?	Secondary	%
Irrigation	How much watering is needed per day at each stage of the shallot plant?	Primary	cat.
Fertilization	What is the dose of fertilizer for plant development and shallot bulb yield? What is the fertilizer requirement in percentages of N, P, and K?	Primary	kg
Temperature	What is the morning temperature in a day?	Primary	℃
Variety	What variety of shallots are currently being planted on the land?	Primary	cat.
Land Size	What is the current area of shallot land planted?	Primary	m²
Planting Space	What is the planting distance?	Primary	m²
Width and Number of Beds -Planting Area	What is the current width and length of the shallot bed? How many beds are there in the area you are currently planting?	Primary	m beds
Elevation	How high is your shallot field? Calculating the height can be helped by using the extension officer's smartphone.	Primary	m
Weed, Pest, and Diseases	What is the weed infestation rate? Do you carry out pest and disease control regularly?	Primary	%
Farmer's Tacit Knowledge
Rainfall	Is the quantity of rain sufficient?	Primary	cat.
Reason for Planting	What are the considerations for planting?	Primary	cat.
Reason for Planting	If shallot prices are low, is the farmer still considering planting?	Primary	cat.
Price	Estimated price at harvest time: is it profitable? What is the price of the product when planting?	Primary	cat.
Harvest Time	Do you hurry up the harvest, and how long does it take to speed up if prices are high as the harvest gets close?	Primary	cat.
Wind	Is there any benefit to the wind blowing faster this month?	Primary	cat.
Motivation	Are you confident that you will get good results this planting season?	Primary	cat.

3.3 Dataset enhancement

The acquired data were then prepared and separated into numerical and categorical data. Cleaning and pre-processing included encoding categorical data and normalizing and scaling numerical data, after which the quantitative and qualitative data were merged. With only 368 records, the dataset was still relatively small for direct use in machine learning, so additional data were generated through data synthesis. The synthesis process was designed to replicate real-world conditions of shallot farming, ensuring that the synthetic data retained the statistical and contextual integrity of the original dataset. Machine learning models require large volumes of data for training and validation; with larger datasets, they can find meaningful patterns in real-life data [30]. Most conventional machine learning methods are also reported to produce good accuracy when the dataset has high dimensionality [31].

The synthesis process involved the following steps:

1. Data preprocessing. The real data was first cleaned and analyzed for patterns, distributions, and correlations to understand the underlying statistical properties.

2. Synthetic data generation using SDV-CTGAN. The Synthetic Data Vault (SDV) library, in combination with the Conditional Tabular Generative Adversarial Network (CTGAN), was utilized. CTGAN is particularly effective in generating realistic tabular data with a small dataset, as it captures the distribution and dependencies between features. The model was trained iteratively, adjusting hyperparameters to achieve replication values above 75%.

3. Validation of synthetic data. The synthetic dataset was evaluated by comparing statistical properties such as mean, variance, and feature correlations with the original data. This step confirmed that the synthetic data retained the characteristics of shallot farming conditions.

4. Augmentation using Gretel AI. A cloud-based synthetic data platform was used to complement SDV-CTGAN. This platform employed advanced algorithms to generate additional synthetic data, offering a user-friendly interface and API for seamless integration. It also provided tools to compare synthetic data against real data for quality assurance.

5. Final Merging. The validated synthetic data was merged with the original dataset, creating a more comprehensive dataset for machine learning. This combined dataset was then split into training, validation, and test sets to ensure robust model evaluation.
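The validation in step 3 can be illustrated with a minimal sketch that compares per-column means, variances, and a pairwise Pearson correlation between a real and a synthetic table. The function names, column names, and toy values below are hypothetical and are not taken from the study; the study itself relied on SDV-CTGAN and Gretel AI tooling for this comparison.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def compare_columns(real, synthetic):
    """Relative difference in mean and variance for every column.

    `real` and `synthetic` map a column name to a list of numbers.
    Assumes the real mean and variance of each column are non-zero.
    """
    report = {}
    for name, real_values in real.items():
        synth_values = synthetic[name]
        real_mean = statistics.fmean(real_values)
        real_var = statistics.variance(real_values)
        report[name] = {
            "mean_rel_diff": abs(statistics.fmean(synth_values) - real_mean) / abs(real_mean),
            "var_rel_diff": abs(statistics.variance(synth_values) - real_var) / real_var,
        }
    return report


# Hypothetical toy values, used only to exercise the functions.
real = {
    "bed_width_m": [1.0, 1.2, 1.1, 1.3],
    "plant_height_cm": [30.0, 34.0, 32.0, 36.0],
}
synthetic = {
    "bed_width_m": [1.05, 1.15, 1.1, 1.3],
    "plant_height_cm": [31.0, 33.0, 32.0, 36.0],
}

report = compare_columns(real, synthetic)
# Difference in the bed width / plant height correlation between the two tables.
corr_gap = abs(
    pearson(real["bed_width_m"], real["plant_height_cm"])
    - pearson(synthetic["bed_width_m"], synthetic["plant_height_cm"])
)
```

Small relative differences and a small correlation gap would suggest that the synthetic table preserves the basic statistics of the real one. Any acceptance threshold is a design choice that the text does not specify.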

3.4 Predictive modeling

A crucial step in machine learning is analyzing algorithm models using the train-test procedure. The machine learning process for estimating harvest time and yield of shallot crops consists of three phases: optimizing the hyperparameters of the base estimator model, optimizing the hyperparameters of the competitor model, and selecting the superior estimator between the base model and the competitor model. The training process uses synthetic data with 12 model algorithms, as shown in Figure 1.

The next step involved applying machine learning algorithms to the modeling process. We experimented with several methods to achieve improved performance. The algorithms used in this study were FNN, AdaBoost, XGB, SVR, Lasso, Ridge CV, decision tree, gradient boosting, ElasticNet, random forest, extra trees, and linear regression. Both hyperparameter tuning and optimum model selection used evaluation metrics provided by the scikit-learn (sklearn) library, and several metrics were used to assess the reliability of the optimal model across multiple tests. The models were trained on 70% of the synthesized dataset to produce several alternative models, and the remaining 30% was used in the validation phase. The models were compared by mean squared error (MSE), mean absolute error (MAE), and R-squared, with the best model having the lowest MAE and MSE and the highest R-squared. The original, non-synthesized data were reused in the testing and validation process.

Figure 1. Machine learning workflow used in this study
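The 70/30 split and the three comparison metrics described in this section can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the study's code: the study used the metric implementations of the sklearn library, and the function names, seed, and toy values here are hypothetical.

```python
import random


def train_validation_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized targets and predictions.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]

scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "R-squared": r_squared(y_true, y_pred),
}
train_rows, validation_rows = train_validation_split(range(10))
```

Lower MAE and MSE and a higher R-squared indicate a better fit, which is the ordering used to compare the candidate models.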

3.5 Feature selection to coefficient model

The best prediction model can perform feature selection during the model fitting process. This means it can identify and potentially remove irrelevant or redundant features that might not contribute significantly to predicting yield or harvest time. The best prediction model shrinks coefficients of less important features and eliminates some, resulting in a more interpretable model. This reduces model complexity, enhances prediction accuracy, and ensures efficient models. This tool is used in machine learning models to identify important characteristics that determine shallot yield and harvest time, as well as key tacit knowledge variables.

3.6 Deployment of the machine learning model into Google spreadsheet

The model obtained from the training and testing analysis, expressed as a linear equation, was then entered into a Google spreadsheet with the code "= array formula (if (row (B:B) = 1)); formula calculation". After the farmers' data are entered, all variables are normalized. This normalization scales the input variables to a specific range, often 0 to 1, which reduces the difficulties caused by input features with different scales and improves the model's numerical stability. The linear equation, which includes the categorical data, is then evaluated for each set of inputs. Finally, the model output is transformed back using the normalization range of the target variable to give the prediction in its original units.
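The calculation that the spreadsheet performs (scale the inputs, evaluate the linear equation, rescale the output) can be sketched in Python as below. All feature names, ranges, coefficients, and the intercept are hypothetical placeholders chosen for illustration; they are not the fitted values of the study, and the handling of categorical inputs is omitted.

```python
def min_max_scale(value, low, high):
    """Map a raw value onto the 0-1 range defined by [low, high]."""
    return (value - low) / (high - low)


def min_max_unscale(scaled, low, high):
    """Map a 0-1 scaled value back to its original units."""
    return scaled * (high - low) + low


def predict(inputs, feature_ranges, coefficients, intercept, target_range):
    """Evaluate a linear model on min-max scaled inputs.

    The result is rescaled to the units of the target variable.
    """
    scaled_output = intercept
    for name, coefficient in coefficients.items():
        low, high = feature_ranges[name]
        scaled_output += coefficient * min_max_scale(inputs[name], low, high)
    return min_max_unscale(scaled_output, *target_range)


# Hypothetical model and farmer input (not values from the study).
coefficients = {"area_m2": 0.5, "bed_width_m": 0.2}
intercept = 0.1
feature_ranges = {"area_m2": (100.0, 1100.0), "bed_width_m": (1.0, 2.0)}
target_range = (0.0, 2000.0)  # assumed yield range in kg

farmer_input = {"area_m2": 600.0, "bed_width_m": 1.5}
predicted_yield = predict(
    farmer_input, feature_ranges, coefficients, intercept, target_range
)
```

The same three steps translate directly into spreadsheet cell formulas, which is presumably why a linear model is convenient to deploy this way.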

The model deployed in the Google spreadsheet was then evaluated with 15 shallot farmers. As in Section 3.2, the farmers were interviewed and shallot plant samples were collected at 45 days of age, and the farmers were interviewed again later to record the harvest day and yield. The data entered into the Google spreadsheet were processed, and the results were reported via an email sent by the system. The computed predictions were then compared with the data from the farmer interviews to determine the prediction error.
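One simple way to express the difference between predicted and reported values is the mean absolute percentage deviation, sketched below. The text does not state the exact formula used for the deployment comparison, so this definition is an assumption, and the sample pairs are hypothetical.

```python
def percent_deviation(actual, predicted):
    """Absolute deviation of a prediction, as a percentage of the actual value."""
    return abs(predicted - actual) / abs(actual) * 100.0


def mean_percent_deviation(pairs):
    """Average percentage deviation over (actual, predicted) pairs."""
    return sum(percent_deviation(a, p) for a, p in pairs) / len(pairs)


# Hypothetical (actual, predicted) yields in kg for three farmers.
yield_pairs = [(1000.0, 900.0), (800.0, 880.0), (500.0, 450.0)]
average_deviation = mean_percent_deviation(yield_pairs)
```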

4. Results and Discussion

4.1 Verification structure

The crop forecasting model proposed by Klompenburg organizes 21 variables into several groups that resemble groups of latent variables. Such grouped latent variables can be analysed using factor analysis techniques. The Kaiser-Meyer-Olkin (KMO) and Bartlett tests are commonly used to assess whether a set of numerical data meets the requirements for factor analysis. The KMO measure, which ranges from 0 to 1, assesses the sampling adequacy of each observed variable as well as of the overall model. Bartlett's sphericity test checks whether the observed variables are correlated.
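For reference, the two statistics are usually defined as follows. The notation is introduced here and does not come from the source: $r_{ij}$ are the pairwise correlations, $a_{ij}$ the partial correlations, $R$ the correlation matrix, $n$ the number of observations, and $p$ the number of variables.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}

\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right) \ln \lvert R \rvert ,
\qquad \mathrm{df} = \frac{p\,(p - 1)}{2}
```

A KMO value close to 1 means that the partial correlations are small relative to the correlations, and a large $\chi^{2}$ with a small p-value means that $R$ differs from the identity matrix.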

The KMO test result is 0.66, and Bartlett's test gives a test statistic of 245784.0 with a p-value of 0.0. Given this highly significant p-value, the null hypothesis of Bartlett's sphericity test (that the observed variables are uncorrelated) can be rejected. Both the KMO (borderline) and Bartlett's tests show that the dataset is suitable for factor analysis. Figure 2 shows the results of the factor analysis of the dataset, in which factor groupings appear as green to yellow cells. However, fewer latent variables are formed than the number of groups defined by Klompenburg. Some of the remaining cells do not form groups and remain unique.

Figure 2. Factor loadings of the numerical part of the dataset using varimax rotation

4.2 Best model selection

MAE, MAPE, MSE, and R-squared scores were considered in determining the best prediction model, as shown in Table 2. RidgeCV is the best model when the mean absolute error, mean squared error, and R-squared scores are considered together. However, several linear regression models, such as ElasticNet and Lasso, predict time to harvest (TTH) and yield with negligible differences in the outcomes. ElasticNet was chosen as the best prediction model by a separate procedure that selected the best estimator. ElasticNet is a linear regression model that combines the L1 and L2 penalties of the Lasso and Ridge models. It can handle data with many correlated features, preventing overfitting and preserving model stability. On varied and complex datasets containing outliers, ElasticNet performs better than Lasso, giving more consistent and dependable results. Table 2 also shows that the models predict TTH better than yield.
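The combination of L1 and L2 penalties can be written as a single objective. The parametrization below is one common convention and is an assumption here, since the text does not state which one was used: $\lambda \ge 0$ sets the overall penalty strength and $\alpha \in [0, 1]$ sets the mix between the Lasso and Ridge terms.

```latex
\hat{\beta} = \arg\min_{\beta} \;
  \frac{1}{2n} \lVert y - X\beta \rVert_{2}^{2}
  + \lambda \left( \alpha \lVert \beta \rVert_{1}
  + \frac{1 - \alpha}{2} \lVert \beta \rVert_{2}^{2} \right)
```

With $\alpha = 1$ the objective reduces to Lasso and with $\alpha = 0$ to Ridge. The L1 term can set coefficients exactly to zero, while the L2 term stabilizes the estimates when features are correlated.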

Table 2. Comparison of model performance for best model selection

Score		MAE	MSE	MAPE	R-squared
FNN (adam)	TTH	0.0773	0.0096	1.8502E-13	0.7225
	Yield	0.0994	0.0156	4.1366E-13	0.4845
	Overall	0.0884	0.0126		0.6035
Decision Tree Regressor	TTH	0.0985	0.0157	3.1768E-13	0.5475
	Yield	0.1275	0.0254	7.4422E-13	0.1574
	Overall	0.1130	0.0206		0.3525
Ada Boost Regressor	TTH	0.0892	0.0122	5.3668E-13	0.6477
	Yield	0.1138	0.0187	9.9146E-13	0.3814
	Overall	0.1015	0.0155		0.5146
Gradient Boosting Regressor	TTH	0.0791	0.0098	2.4870E-13	0.7177
	Yield	0.0995	0.0153	5.5496E-13	0.4938
	Overall	0.0893	0.0125		0.6136
XGB Regressor	TTH	0.0821	0.0106	2.8848E-13	0.6956
	Yield	0.1041	0.0164	6.4942E-13	0.4552
	Overall	0.0931	0.0135		0.5837
SVR Regressor	TTH	0.0774	0.0094	2.5630E-13	0.7293
	Yield	0.0981	0.0146	5.7628E-13	0.5162
	Overall	0.0877	0.0120		0.6301
ElasticNet	TTH	0.0773	0.0094	2.5419E-13	0.7304
	Yield	0.0979	0.0146	5.6932E-13	0.5164
	Overall	0.0875	0.0120		0.6309
Lasso	TTH	0.0770	0.0093	2.5313E-13	0.7310
	Yield	0.0979	0.0146	5.6525E-13	0.5164
	Overall	0.0875	0.0120		0.6312
Ridge CV	TTH	0.0770	0.0093	2.5361E-13	0.7312
	Yield	0.0979	0.0146	5.6645E-13	0.5164
	Overall	0.0874	0.0120		0.6313
Random Forest Regressor	TTH	0.0834	0.0109	3.3871E-13	0.6871
	Yield	0.1069	0.0172	7.1030E-13	0.4315
	Overall	0.0874	0.0140		0.5682
Extra Trees Regressor	TTH	0.0830	0.0106	3.2207E-13	0.6934
	Yield	0.1068	0.0171	7.2378E-13	0.4325
	Overall	0.0949	0.0139		0.5721
Linear Regression	TTH	0.0770	0.0093	2.5313E-13	0.7310
	Yield	0.0979	0.0146	5.6525E-13	0.5164
	Overall	0.0875	0.0120		0.6312

4.3 Influence factor of yield and time to harvest

The best prediction model chosen is the ElasticNet model. Its regression coefficients, shown in Table 3, can be ranked to support feature selection during the model fitting process. On the left side of the table, the time to harvest (TTH) regression coefficients are presented in ascending order, and on the right side, the yield coefficients are presented in descending order.

Among the most significant predictors for TTH, plant age prediction (HST, hari setelah tanam, i.e., days after planting) and morning temperature range stand out. The negative coefficient for plant age prediction suggests that farmers' ability to predict optimal harvest times, based on experience and growth pace, is critical. A negative coefficient value indicates that the prediction of harvest time is getting closer to plant age (HST).

Morning temperature, a proxy for daily heat accumulation, correlates with physiological processes in plants, aligning with the heat unit method [32]. These two factors interact synergistically: consistent morning temperatures provide reliable data for growth rate estimation, enhancing the accuracy of farmers’ harvest predictions. Additionally, the influence of rainfall, moisture levels, and soil type indicates the interplay between climatic and edaphic factors. While rainfall contributes to water availability, excessive amounts can delay harvest by prolonging vegetative growth. The morning temperature's moderation of daily heat accumulation further supports timely harvesting under variable rainfall conditions.

In Table 3, the yield predictors include planting area, bed structure (number and width), and day temperature range, emphasizing the interplay between agronomic practices and environmental conditions. The planting area is also influenced by the number of beds and the width of the beds and affects the density of plants in one area [33].

Larger planting areas and optimized bed dimensions influence plant density, which can enhance sunlight exposure and nutrient uptake. The day temperature range, with its positive coefficient, underscores how moderate fluctuations promote metabolic activities that boost bulb size and quality [34].

Table 3. Regression coefficient list produced by tuned ElasticNet regressor

Model Coefficient for Time to Harvest	Coefficient	Model Coefficient for Yield	Coefficient
Plant Age Prediction (day after plant)	-0.846	Area (m²)	0.520
Morning Temperature Range	-0.139	Number of Beds	0.293
Rainfall amount in the last month	-0.082	Bed Width	0.167
Moisture Value (%)	-0.078	Day Temperature Range	0.146
Paint-Type of Soil (sand, colour, clay, crumb, hard)	-0.060	Bulb Weight (gram)	0.143
Bed Width	-0.056	Phosphor Weight (kg)	0.131
Tuber Diameter (cm)	-0.050	Plant Height (cm)	0.084
Weed Grown Area (%)	-0.050	Number of Tubers (Cloves)	0.055
Nitrogen Weight (kg)	-0.050	How many times a week watered?	0.044
Potassium Weight (kg)	-0.047	Watering Methods	0.038
Area (m²)	-0.040	Estimated harvest during high price period (How many days earlier?)	0.036
Last Watering Close to Harvest (day after plant)	-0.039	Number of Leaves (Sheet)	0.033
Location Altitude	-0.037	Sulphur Weight (kg)	0.028
Estimated Harvest During High Price Period (How many days earlier?)	-0.037	Moisture Value (%)	0.028
Number of Tubers (Cloves)	-0.025	Plant Age Prediction (day after plant)	0.028
What determines the start of the season (Price, weather, pest infestation, planting)?	-0.021	What determines the start of the season (price, weather, pest infestation, planting)?	0.019
Watering Methods	-0.020	Paint-Are you confident of good yields this planting season?	0.016
Using POC/Mo/Soil Improver	-0.019	Morning Temperature Range	0.016
Paint-Are you confident of good yields this planting season?	-0.011	Paint-Plant Spacing	0.015
Weight of Plant with Tuber (gram)	-0.003	Paint-Type of Soil (Sand, colour, clay, crumb, hard)	0.013
If the price is low will you plant?	0.000	Weed grown area (%)	0.000
Is the weather favourable for the crop?	0.000	Total Fertilizer (kg)	0.000
Is the wind blowing faster this month useful?	0.000	Last watering close to harvest (day after plant)	0.000
Total Fertilizer (kg)	0.000	Is the wind blowing faster this month?	0.000
Widest Leaf Width (cm)	0.000	Is the wind blowing faster useful?	0.000
Whether there are many weeds this planting season?	0.000	Is the weather favourable for the crop	0.000
Paint-Plant Spacing	0.002	Is the amount of rain sufficient in the past two months?	-0.002
Is the wind blowing faster this month?	0.002	Using POC/Mo/soil improver	-0.004
Number of Beds	0.002	Widest Leaf Width (cm)	-0.005
Bulb Weight (gram)	0.014	If the price is low will you plant?	-0.006
The expected soil condition after watering	0.025	The expected soil condition after watering	-0.006
How many times a week watered?	0.026	Whether there are many weeds this planting season?	-0.032
Number of leaves (Sheet)	0.029	Location Altitude (masl)	-0.039
Phosphor Weight (kg)	0.040	Weight of plant with tuber (gram)	-0.074
Is the amount of rain sufficient in the past two months?	0.040	Potassium Weight (kg)	-0.082
Plant Height (cm)	0.052	Rainfall amount in the last month	-0.085
Other Weight (kg)	0.065	Other Weight (kg)	-0.090
Sulphur Weight (kg)	0.077	Tuber Diameter (cm)	-0.118
Day Temperature Range	0.086	Nitrogen Weight (kg)	-0.146

Balanced fertilization plays a critical role in maximizing shallot yields, with phosphorus standing out as a key nutrient. Phosphorus significantly impacts yield by fostering healthy root systems, which enhance the plant's ability to efficiently absorb water and nutrients from the soil. This nutrient is also integral to various metabolic processes that convert energy and support plant growth, ultimately leading to improved yields. While nitrogen and potassium are essential for overall growth, their lower regression coefficients suggest that balanced fertilization holds greater importance than the dominance of individual nutrients.

Interestingly, the absence of potassium (K) does not substantially reduce shallot yields, whereas deficiencies in nitrogen, phosphorus, magnesium, or sulfur can lead to a noticeable decrease in bulb dry weight [35]. Adequate phosphate levels, therefore, not only promote robust root development but also ensure the efficient execution of metabolic functions critical for plant health and productivity. This reinforces the need for a balanced approach to fertilization, where each nutrient plays a synergistic role in achieving optimal yields.

4.4 Statistical evidence supporting factor importance

The ElasticNet model assigns each feature a coefficient according to its predictive power and shrinks redundant or negligible variables to zero (coefficient = 0). For example, total fertilizer and wind conditions have zero or near-zero coefficients in both the TTH and Yield models, suggesting limited direct impact. Conversely, bulb weight and bed width exhibit positive coefficients in the Yield model, underscoring their critical roles in Yield predictions. These statistical insights reinforce the importance of prioritizing key agronomic and environmental factors.
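The way an elastic net drives weak coefficients to exactly zero can be illustrated with a minimal coordinate-descent sketch. It does not use the study's data or code: the three features, their assumed effects (2.0, 0.5, 0.05), and the settings `alpha=0.2` and `l1_ratio=0.5` are hypothetical values chosen only to show the mechanism.

```python
from itertools import product


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.2, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the elastic net objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    No intercept is fitted; the toy data below are already centred."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j
            rho = 0.0
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
            rho /= n
            z = sum(row[j] ** 2 for row in X) / n
            # L1 part zeroes weak features; L2 part shrinks the survivors
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w


# Hypothetical toy design: every +/-1 combination of three standardized features.
X = [list(row) for row in product([-1.0, 1.0], repeat=3)]
# Assumed effects: strong, moderate, and negligible.
y = [2.0 * a + 0.5 * b + 0.05 * c for a, b, c in X]

coefs = elastic_net(X, y)
for name, c in zip(["strong", "moderate", "negligible"], coefs):
    print(f"{name:>10}: {c:.3f}")
```

In this toy case the negligible feature falls inside the L1 threshold and receives a coefficient of exactly zero, while the two informative features keep shrunken but nonzero coefficients. This is the same qualitative pattern as the zero entries in the coefficient table.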

Table 3 outlines tacit knowledge components in the machine learning model related to harvest time and yield prediction. For harvest time, key factors include 1) prediction of harvest time, 2) day length (which increases when product prices are high), 3) reasons for planting season timing, and 4) confidence in crop yields. Experienced farmers can predict harvest timing using plant indicators, weather data, and historical trends. In the yield section, factors such as day length, harvest time prediction, planting season reasons, confidence in harvest results, and weed attacks influence the model. Watering during planting has a greater influence on yield than planting distance, with coefficients of 0.038 and 0.015, respectively.

The comparison of regression models (linear regression and random forest) in Table 4 shows that while the mean predictions are close to the test-data mean, the standard deviations differ. Neither model matches the standard deviation of the test data, and the random forest regressor reproduces the minimum and maximum values poorly, predicting a narrower range than was observed. This comparative analysis highlights the strengths and weaknesses of each method, guiding the selection of the most suitable model.

Table 4. Descriptive statistics of the test data and of the predictions produced by linear regression and the random forest regressor

Stats.	Test Data TTH (days)	Test Data Yield (kg)	Linear Regression TTH (days)	Linear Regression Yield (kg)	Random Forest TTH (days)	Random Forest Yield (kg)
Data Count	300	300	300	300	300	300
Mean	9.35	2369.81	9.39	2319.66	9.4	2337.86
Std	5.51	1525.63	4.81	1120.44	4.37	961.48
Min	0	200	0	202	0.85	474.98
25%	5	1181.46	6.21	1459.98	6.34	1561.84
50%	9	2262.59	9.39	2225.51	9.33	2268.53
75%	13	3377.69	12.64	3090.02	12.29	2995.75
Max	30	7601.94	26.39	7236.06	22.25	5338.76
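The narrowing of spread in Table 4 can be quantified with a short calculation. The sketch below copies the Std, Min, and Max rows from the table and expresses each model's standard deviation and range as a fraction of the test-data values. The dictionary keys and the helper name `spread_ratios` are labels invented for this illustration.

```python
# (std, min, max) copied from Table 4; keys are labels chosen for this sketch.
table4 = {
    "test": {"tth": (5.51, 0.0, 30.0), "yield": (1525.63, 200.0, 7601.94)},
    "linear_regression": {"tth": (4.81, 0.0, 26.39), "yield": (1120.44, 202.0, 7236.06)},
    "random_forest": {"tth": (4.37, 0.85, 22.25), "yield": (961.48, 474.98, 5338.76)},
}


def spread_ratios(model, target):
    """Return (std ratio, range ratio) of a model's predictions relative to the test data."""
    std_m, min_m, max_m = table4[model][target]
    std_t, min_t, max_t = table4["test"][target]
    return std_m / std_t, (max_m - min_m) / (max_t - min_t)


for model in ("linear_regression", "random_forest"):
    for target in ("tth", "yield"):
        std_ratio, range_ratio = spread_ratios(model, target)
        print(f"{model:>17} {target:>5}: std ratio {std_ratio:.2f}, range ratio {range_ratio:.2f}")
```

A ratio below 1 means the predictions are less dispersed than the observations. On these figures both models compress the spread, and the random forest does so more strongly than linear regression for both TTH and yield.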

4.5 Model performance and qualitative discussion

The model, implemented in a Google spreadsheet, was then compared with the farmers' actual harvest times and yields, as shown in Table 5. The ElasticNet model achieves a mean TTH error deviation of approximately 1.9 days (23.5%) and a mean yield error of 166.9 kg (13%), with prediction accuracy exceeding 80%, as demonstrated in the study of Apriyanti et al. [36], which used feature extraction for orchid identification. However, these numerical results should be read against on-ground realities. Farmers value timely and accurate predictions, especially when aligning harvests with market demands. For instance, predicting harvest timing during high-price periods can significantly enhance economic returns despite a minor trade-off in prediction accuracy.

Qualitatively, farmers express satisfaction when models reduce uncertainty, particularly in dynamic climates. However, challenges like yield variability due to unforeseen weather changes or suboptimal input availability highlight areas for improvement. These insights suggest that while the model is robust, integrating real-time environmental monitoring and farmer feedback loops could further enhance its utility.

Table 5. Model predictions compared with farmers' field data (harvest time and yield)

Code	Field TTH (day)	Area (m²)	Field Yield (kg)	Predicted TTH (day)	Predicted Yield (kg)	TTH Error Dev. (day)	Yield Error Dev. (kg)
F.A (J1)	55	1350	1750	54.6	1292.95	-0.4	-457.05
F.B (J2)	53	875	875	52.8	877.90	-0.2	2.9
F.C (J3)	53	875	875	54.6	997.98	1.6	122.98
F.D (Sis1)	52	875	900	55.5	752.53	3.5	-147.47
F.E (Sis2)	55	875	875	53.8	808.16	-1.2	-66.84
F.F (Suh1)	53	1750	1200	54.8	1274.63	1.8	74.63
F.G (Suh2)	55	1750	1200	53.9	1171.22	-1.1	-28.78
F.H (Suk1)	55	875	875	51.3	767.45	-3.7	-107.55
F.I (Suk2)	55	1350	1350	53.6	1100.47	-1.5	-249.53
F.J (Suk3)	52	875	875	53.5	808.16	1.5	-66.84
F.K (IR1)	55	1350	1500	52.6	1173.69	-2.4	-326.31
F.L (IR2)	55	875	900	51.1	903.77	-3.9	3.77
F.M (R1)	51	1350	1600	53.6	1004.55	2.6	-595.45
F.N (R2)	51	875	1000	52.6	912.92	1.6	-87.08
Mean absolute deviation						1.9	166.9
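The summary row of Table 5 is reproduced when the tabulated deviations are averaged as absolute values. The sketch below recomputes both averages from the table. It also computes a mean absolute percentage error for yield, on the assumption that the 13% figure averages each farmer's absolute yield deviation relative to that farmer's actual yield. The sketch does not attempt the 23.5% TTH figure, since the section does not state the baseline it is measured against.

```python
# Rows copied from Table 5:
# (code, actual yield in kg, TTH deviation in days, yield deviation in kg)
rows = [
    ("F.A", 1750, -0.4, -457.05),
    ("F.B", 875, -0.2, 2.9),
    ("F.C", 875, 1.6, 122.98),
    ("F.D", 900, 3.5, -147.47),
    ("F.E", 875, -1.2, -66.84),
    ("F.F", 1200, 1.8, 74.63),
    ("F.G", 1200, -1.1, -28.78),
    ("F.H", 875, -3.7, -107.55),
    ("F.I", 1350, -1.5, -249.53),
    ("F.J", 875, 1.5, -66.84),
    ("F.K", 1500, -2.4, -326.31),
    ("F.L", 900, -3.9, 3.77),
    ("F.M", 1600, 2.6, -595.45),
    ("F.N", 1000, 1.6, -87.08),
]

n = len(rows)

# Mean absolute error: signs are dropped so over- and under-predictions do not cancel
mae_tth = sum(abs(tth_dev) for _, _, tth_dev, _ in rows) / n
mae_yield = sum(abs(yield_dev) for _, _, _, yield_dev in rows) / n

# Assumed definition of the yield percentage: mean of |deviation| / actual yield
mape_yield = 100 * sum(abs(yield_dev) / actual for _, actual, _, yield_dev in rows) / n

print(f"Mean absolute TTH deviation:   {mae_tth:.1f} days")
print(f"Mean absolute yield deviation: {mae_yield:.1f} kg")
print(f"Mean absolute yield error:     {mape_yield:.1f} %")
```

Averaging the signed deviations instead would give much smaller numbers, because early and late predictions partly offset one another.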

5. Conclusion

This study utilized machine learning models, including classical linear regression, random forest, decision trees, and FNN, to estimate shallot harvest time and production yields. Among these models, the linear model demonstrated the highest R-squared value for estimating both harvest time (TTH) and yield. While RidgeCV outperformed the linear model in predicting TTH and yield by a small margin, the ElasticNet model was identified as the best prediction model through the best estimator application. These models effectively estimated the critical factors influencing shallot harvest time and yield. Harvest time is influenced by factors such as plant age and morning temperature, while yield predictions are determined by area, number of beds, bed width, temperature range, bulb weight, and phosphorus weight. Additionally, tacit knowledge, such as farmers' calculations for harvest timing, day length, reasons for planting, and confidence in crop yields, was found to significantly impact the model's accuracy. Farmers can leverage this tacit knowledge, including historical data and environmental cues, to estimate the optimal harvest date and predict yield outcomes more accurately.

The study's findings have practical implications for farmers and agricultural planners. By applying these machine learning models and integrating tacit knowledge, farmers can make more informed decisions regarding planting and harvesting schedules and obtain better yield predictions. This could lead to better resource allocation and improved shallot production efficiency. The model performed well in Brebes Regency but may require further adjustment and validation for use in other regions or with different crops. In deployment, predicted TTH deviated from the actual values by 23.5% and predicted yield by 13%. Continued comparison of predictions with actual farm data will be crucial for refining the prediction models and enhancing their applicability in broader agricultural contexts.

Acknowledgment

We would like to express our sincere gratitude to the National Research and Innovation Agency (BRIN), Telkom University, and the Brebes Regency Government for their funding assistance and close collaboration in developing shallot production in Brebes and Indonesia. This research would not have been possible without their support. We hope that the results of this collaboration, presented here as a scientific publication, will be useful in supporting government policy-making for shallot development in Brebes and Indonesia.

Nomenclature

cat.	Categorical
m	Meter
mo	Microorganism

References

[1] Utari, M.H., Azijah, Z. (2019). Volatilitas Harga Bawang Merah di Indonesia. Buletin Ilmiah Litbang Perdagangan, 13(2): 309-336. https://doi.org/10.30908/bilp.v13i2.419

[2] Konfo, T.R.C., Djouhou, F.M.C., Hounhouigan, M.H., Dahouenon-Ahoussi, E., Avlessi, F., Sohounhloue, C.K.D. (2023). Recent advances in the use of digital technologies in agri-food processing: A short review. Applied Food Research, 3: 100329. https://doi.org/10.1016/j.afres.2023.100329

[3] Kuradusenge, M., Hitimana, E., Hanyurwimfura, D., Rukundo, P., Mtonga, K., Mukasine, A., Uwitonze, C., Ngabonziza, J., Uwamahoro, A. (2023). Crop yield prediction using machine learning models: Case of Irish potato and maize. Agriculture, 13(1): 225. https://doi.org/10.3390/agriculture13010225

[4] Qi, M., Zhang, G.P. (2008). Trend time–series modeling and forecasting with neural networks. IEEE Transactions on Neural Networks, 19(5): 808-816. https://doi.org/10.1109/TNN.2007.912308

[5] Jacques, A.B., Adamchuk, V.I., Cloutier, G., Clark, J.J., Miller, C. (2018). Development of a machine vision yield monitor for shallot onion harvesters. In Proceedings of the 14th International Conference on Precision Agriculture June 24–June 27, 2018 Montreal, Quebec, Canada.

[6] Selvi, A. (2021). Onion yield prediction based on machine learning. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(2): 2322-2327.

[7] Iqbal, L.B., Rahman, M.M., Mamun, S., Nabi, N., Ahamed, M.S. (2022). OnionBangla: A supervised machine learning approach for predicting onion yield using Bangladeshi climate data. In 2022 32nd International Conference on Computer Theory and Applications (ICCTA), pp. 110-115. https://doi.org/10.1109/ICCTA58027.2022.10206199

[8] Lestari, N.A.P., Dijaya, R., Azizah, N.L. (2021). Identification growth quality of red onion during planting period using support vector machine. Journal of Physics: Conference Series, 1764(1): 012060. https://doi.org/10.1088/1742-6596/1764/1/012060

[9] Leukel, J., Zimpel, T., Stumpe, C. (2023). Machine learning technology for early prediction of grain yield at the field scale: A systematic review. Computers and Electronics in Agriculture, 207: 107721. https://doi.org/10.1016/j.compag.2023.107721

[10] Filippi, P., Jones, E.J., Wimalathunge, N.S., Somarathna, P.D., Pozza, L.E., Ugbaje, S.U., Jephcott, T.G., Paterson, S.E., Whelan, B.M., Bishop, T.F. (2019). An approach to forecast grain crop yield using multi-layered, multi-farm data sets and machine learning. Precision Agriculture, 20: 1015-1029. https://doi.org/10.1007/s11119-018-09628-4

[11] Basso, B., Liu, L. (2019). Seasonal crop yield forecast: Methods, applications, and accuracies. Advances in Agronomy, 154: 201-255. https://doi.org/10.1016/bs.agron.2018.11.002

[12] Van Klompenburg, T., Kassahun, A., Catal, C. (2020). Crop yield prediction using machine learning: A systematic literature review. Computers and Electronics in Agriculture, 177: 105709. https://doi.org/10.1016/j.compag.2020.105709

[13] Sumberg, J., Okali, C., Reece, D. (2003). Agricultural research in the face of diversity, local knowledge and the participation imperative: Theoretical considerations. Agricultural Systems, 76(2): 739-753. https://doi.org/10.1016/S0308-521X(02)00153-1

[14] Oliveira, R.A.D., Brunetto, G., Loss, A., Gatiboni, L.C., Kürtz, C., Júnior, V.M., Lovato, P.E., Oliveira, B.S., Souza, M., Comin, J.J. (2016). Cover crops effects on soil chemical properties and onion yield. Revista Brasileira de Ciência do Solo, 40: e0150099. https://doi.org/10.1590/18069657rbcs20150099

[15] Dewangan, U., Talwekar, R.H., Bera, S. (2022). Systematic literature review on crop yield prediction using machine & deep learning algorithm. In 2022 5th International Conference on Advances in Science and Technology (ICAST), pp. 654-661. https://doi.org/10.1109/ICAST55766.2022.10039620

[16] Murti, A.C., Al Machfudz, W.D.P., Prihatiningrum, A.E., Arifin, S. (2022). Effect of planting distance and bulb size on growth and production of shallots (Allium ascalonicum L.). In IOP Conference Series: Earth and Environmental Science, 1104(1): 012002. https://doi.org/10.1088/1755-1315/1104/1/012002

[17] Hidayah, B.N., Sugianti, T., Mardiana, M., Pramudia, A. (2023). The impact of weather anomalies on shallot seed production in West Lombok, Indonesia. In E3S Web of Conferences, 373: 03003. https://doi.org/10.1051/e3sconf/202337303003

[18] Sumarni, N., Hidayat, A. (2005). Budidaya Bawang Merah (in Bahasa) Panduan Teknis No 3. vol. 1. Jakarta: Pusat Penelitian dan Pengembangan Hortikultura, Badan Penelitian dan Pengembangan Pertanian.

[19] Mermoud, A., Tamini, T.D., Yacouba, H. (2005). Impacts of different irrigation schedules on the water balance components of an onion crop in a semi-arid zone. Agricultural Water Management, 77(1-3): 282-295. https://doi.org/10.1016/j.agwat.2004.09.033

[20] Patel, N., Rajput, T.B.S. (2013). Effect of deficit irrigation on crop growth, yield and quality of onion in subsurface drip irrigation. International Journal of Plant Production, 7(3): 417-436.

[21] Fauziah, R., Susila, A.D., Sulistyono, E. (2017). Budidaya Bawang Merah (Allium ascalonicum L.) pada Lahan Kering Menggunakan Irigasi Sprinkler pada Berbagai Volume dan Frekuensi. Jurnal Hortikultura Indonesia, 7: 1-8. https://doi.org/10.29244/jhi.7.1.1-8

[22] Cho, Y.C., Lee, J.T., Park, Y.G., Jeong, B.R. (2011). Effect of mulching material and planting density on growth and bulb development of shallot (Allium cepa var. ascalonicum Backer). Korean Journal of Plant Resources, 24(5): 507-513. https://doi.org/10.7732/kjpr.2011.24.5.507

[23] Sipahutar, T., Hidayat, S., Girsang, M.A., Haloho, L., et al. (2022). Characteristics and analysis of shallots farming in Dolok Silau Simalungun, North Sumatra. Agric, 34(2): 287-299. https://doi.org/10.24246/agric.2022.v34.i2.p287-299

[24] Ayu, N.G., Rauf, A., Samudin, S. (2016). Pertumbuhan dan hasil dua varietas bawang merah (Allium ascalonicum L.) pada berbagai jarak tanam. AGROTEKBIS: JURNAL ILMU PERTANIAN (e-Journal), 4(5): 530-536.

[25] Harsela, C.N. (2023). Growth and yields of Bima Brebes shallot variety planted using a floating hydroponics system. Eduvest-Journal of Universal Studies, 3(7): 1381-1388. https://doi.org/10.59188/eduvest.v3i7.887

[26] Hasanah, L.N., Fatah, L., Bachri, A.A., Susanti, H. (2023). SERI methods to measure the vulnerability of shallot farming to climate change in Tapin Regency, South Kalimantan. Technium Sustainability, 3: 26-35. https://doi.org/10.47577/sustainability.v3i.8544

[27] Darnhofer, I., Bellon, S., Dedieu, B., Milestad, R. (2010). Adaptiveness to enhance the sustainability of farming systems. A review. Agronomy for Sustainable Development, 30: 545-555. https://doi.org/10.1051/agro/2009053

[28] Wandal, A.K., Retang, E.U.K., Saragih, E.C. (2023). Pengaruh Kompetensi dan Motivasi Petani Terhadap Keberhasilan Usahatani Bawang Merah di Kelurahan Maulumbi. In Proceeding Sustainable Agricultural Technology Innovation (SATI), Sumba: Universitas Kristen Wira Wacana Sumba, pp. 168-175.

[29] Rahayu, E., Irianto, H., Sutrisno, J. (2023). Production and price risk analysis of shallot (Allium stipitatum regel) cultivation among farm households in Brebes district, Indonesia. Applied Ecology & Environmental Research, 21(3): 2625-2640. https://doi.org/10.15666/aeer/2103_26252640

[30] Adadi, A. (2021). A survey on data‐efficient algorithms in big data era. Journal of Big Data, 8(1): 24. https://doi.org/10.1186/s40537-021-00419-9

[31] Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E. (2015). Deep learning applications and challenges in big data analytics. Journal of Big Data, 2: 1-21. https://doi.org/10.1186/s40537-014-0007-7

[32] Bonhomme, R. (2000). Bases and limits to using ‘degree.day’ units. European Journal of Agronomy, 13(1): 1-10. https://doi.org/10.1016/S1161-0301(00)00058-7

[33] Salari, H., Antil, R.S., Saharawat, Y.S. (2021). Responses of onion growth and yield to different planting dates and land management practices. Agronomy Research, 1914.

[34] Asseng, S., Foster, I.A.N., Turner, N.C. (2011). The impact of temperature variability on wheat yields. Global Change Biology, 17(2): 997-1012. https://doi.org/10.1111/j.1365-2486.2010.02262.x

[35] Sutardi, Pramono, J., Widodo, S., Martini, T., Alifia, A.D., Apriyana, Y., et al. (2022). Double production of shallot (Allium cepa L var. aggregatum) based on climate, water, and soil management in sandy land. International Journal on Advanced Science, Engineering and Information Technology, 12: 1756. https://doi.org/10.18517/ijaseit.12.5.14698

[36] Apriyanti, D.H., Spreeuwers, L.J., Lucas, P.J. (2023). Deep neural networks for explainable feature extraction in orchid identification. Applied Intelligence, 53(21): 26270-26285. https://doi.org/10.1007/s10489-023-04880-2
