Field-Scale Spatial Variability and Uncertainty Mapping of Soil pH Using Ordinary Kriging and Cokriging with Auxiliary Soil Properties

Field-Scale Spatial Variability and Uncertainty Mapping of Soil pH Using Ordinary Kriging and Cokriging with Auxiliary Soil Properties

Marcial S. Buladaco II* | Hannah Mae F. Tandugon | Nicole Ann P. Yales | Sarah M. Casacop | Michelle Anne B. Bunquin | Sophia Alelie C. Bugia | Pearl B. Sanchez

Division of Soil Science, College of Agriculture and Food Science, Agricultural Systems Institute, University of the Philippines Los Baños, Laguna 4031, Philippines

Corresponding Author Email: 
msbuladaco@up.edu.ph
Page: 
1873-1885
|
DOI: 
https://doi.org/10.18280/ijdne.190604
Received: 
24 September 2024
|
Revised: 
12 November 2024
|
Accepted: 
20 November 2024
|
Available online: 
27 December 2024
| Citation

© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

Field-scale interpolation of soil pH on a seven-hectare field in Laguna, Philippines was performed. A total of 50 samples were collected following a random sampling approach. Cokriging methods, including those with electrical conductivity (CoKEC), organic matter (CoKOM), cation exchange capacity (CoKCEC), and clay content (CoKClay), were assessed and compared with Ordinary Kriging (OK) using cross-validation prediction errors and the prediction standard error (PSE). Using cation exchange capacity (CEC), electrical conductivity (EC), and organic matter (OM) as auxiliary variables decreased the prediction errors. Based on the RMSE, the relative improvement (RI%) of CoKCEC, CoKEC, and CoKOM were 3.19, 3.73, and 4.08%, respectively. While OK provides a good general overview, the findings demonstrate that incorporating relevant auxiliary variables like OM and EC results in more accurate predictions of soil pH. CoKEC was the most accurate method, with a low Root Mean Square Error (RMSE) of 0.2477 and a Root Mean Square Standardized Error (RMSSE) closest to 1 (0.9947), indicating a well-calibrated model. Additionally, CoKEC produced PSE maps with lower standard errors (0.85-0.242), suggesting a high level of precision in soil pH predictions. The choice of method should be guided by the specific characteristics of the study area and the availability of auxiliary data. The PSE maps clearly show the importance of considering prediction uncertainty when selecting an interpolation method for soil pH, with CoKEC offering both improved accuracy and reduced uncertainty.

Keywords: 

cokriging, interpolation, geostatistics, soil pH

1. Introduction

Soil pH, a measure of alkalinity or acidity, is a master variable that influences various soil processes and plays an essential role in affecting the availability of essential nutrients to plants. Acidification occurs as a natural process during soil formation. In humid environments, it lasts for a long time as water moves laterally or downward through the soil, leaching the products of weathering; in dry environments, soil leaching and weathering are less intense, and the pH of the soil is often alkaline or neutral [1].

Various soil properties exhibit variation across spatial scales [2, 3]. This variation is predominantly attributed to the diversity of anthropogenic and internal factors that result in complex spatial soil patterns [4, 5]. Soil pH is not a static parameter and can exhibit significant variability across different locations within a field or even within a single soil profile. The variability observed can be due to the variables of soil formation, climate, living organisms, relief, parent material, and time [6].

Routine soil testing and monitoring can provide valuable information about pH variability, allowing for targeted amendments and adjustments to optimize soil conditions for plant growth and productivity. However, traditional soil sampling and analysis can be time-consuming and expensive, often resulting in limited spatial coverage. Various interpolation techniques and strategies are explored and utilized to map the geographical variability of soil pH [7-10]. Geostatistics has been extensively utilized to forecast the spatial distribution of soil parameters [11, 12].

Mapping of various soil properties, like soil pH, is considered vital for directing decision-makers in environmental modeling, land use research, and natural resource evaluation [13]. The knowledge of the spatial distribution of soil characteristics is crucial for precision agriculture, which plays a vital role in agricultural, economic, and environmental activities. For site-specific management of agricultural practices, understanding the spatial variability of soil physicochemical properties in both their dynamic (such as water content, compaction, and organic matter) and static (such as texture and mineralogy) forms is essential. This understanding has a direct impact on the observed variability in crop yield and productivity [14, 15]. By considering these factors collectively, farmers can improve their understanding of soil pH dynamics and implement appropriate strategies to maintain optimal soil health and fertility.

In relation to mapping, it is essential to provide estimates of the uncertainties linked to predicted soil functional properties alongside the corresponding soil property maps [16]. The standard error or variance of predictions offers a quantifiable measure of uncertainty. Methods that produce both uncertainty and prediction maps are rare, with geostatistical prediction methods being among the few that account for spatial distances and structures. A key advantage of digital soil mapping lies in the increasing ability of modern techniques to generate uncertainty estimates alongside predictions [17]. For instance, model prediction errors reveal the discrepancies between observed data and model outputs [18-21]. Despite extensive use of kriging techniques, there remains a gap on the use of auxiliary soil properties in enhancing the interpolation accuracy of certain soil properties including soil pH and its uncertainty estimation at a field scale. This study addresses this gap by exploring the integration of auxiliary soil properties to improve spatial prediction of soil pH and by providing a comprehensive evaluation of model uncertainties.

Cokriging, an extension of the standard geostatistical kriging approach, achieves good results by including secondary information. It increases the interpolation accuracy by taking into account the spatial cross-correlation of the primary and secondary variables. Thus, it has been increasingly utilized in the prediction of soil properties since it offers better precision than standard kriging [22]. In pedometrics, a significant interest has been placed on using spatially correlated supplementary data to enhance the prediction quality of mapping soil properties [23-25].

The study evaluates the use of selected soil properties such as electrical conductivity (EC), organic matter (OM), cation exchange capacity (CEC), and clay content as auxiliary variables in the field-scale interpolation of soil pH as compared to ordinary kriging. Furthermore, cross-validation and uncertainty mapping using prediction standard errors were used to assess the prediction accuracy of the models. The study contributes in evaluating the effectiveness of specific soil properties to enhance field-scale interpolation of soil pH. Through rigorous cross-validation and uncertainty mapping, the research provides valuable insights into the spatial variability and prediction accuracy of soil pH mapping. These findings support more precise and reliable soil pH assessments, with significant implications for precision agriculture applications.

2. Materials and Methods

2.1 Study area

The study area is a 70,000 m2 field between 14°08’48” N and 121°15’40” E in Los Banos, Laguna, Philippines (Figure 1). It is in a relatively flat area with an elevation of 27 to 30 meters above sea level. The average annual temperature is around 27℃, and the annual precipitation is 1800 mm. The soil in the area is classified as Lipa soil series, a residual soil of volcanic tuff, and is classified as fine clayey, mixed, shallow, isohyperthermic Typic Eutrudepts [26]. It is a medium-textured (loam to clay loam) soil.

Previously, the area had been primarily designated for livestock grazing. In 2018, the area was converted into a research station dedicated to organic agriculture research. Organic farming emphasizes soil health, biodiversity, and ecological balance. Converting the area into an organic research station indicates a commitment to studying and promoting organic agricultural practices.

Figure 1. Study area location and sampling point distribution (n=50)

2.2 Soil collection and analysis

Fifty soil samples were collected at a depth of 0-20 cm, chosen based on its relevance to root zone activities and nutrient availability, as this depth is commonly influenced by agricultural management practices and is critical for plant growth.

A random sampling approach was implemented to avoid sampling bias and to allow for a representative overview of soil conditions across the field. Although stratification based on soil type, prior land use, or other factors was considered, it was ultimately not applied due to the relative homogeneity of the soil type and recent conversion to organic farming practices. This approach was thus deemed suitable to capture the field's current variability under organic management without historical land use impacts.

At each sampling point, five subsamples were collected within a 1 m diameter and combined into a composite sample. The collected soil samples were air-dried in cool, dry conditions before they were transferred to the laboratory. Following air drying, the larger aggregates were broken down into smaller particles and passed through a 2-mm sieve. The soil samples were stored for no longer than two weeks before processing to minimize alterations in soil properties. The samples were analyzed following standard laboratory protocols for five parameters: soil pH (soil/water, 1:1 ratio), clay content (hydrometer method), EC (soil/water, 1:5 ratio), CEC (ammonium acetate method), and OM (Walkley-Black method).

2.3 Exploratory data analysis

Statistical descriptors such as the minimum, maximum, mean, standard deviation, skewness, and kurtosis were used as measures of central tendency, variability around the mean, distribution, and deviation from normality (Table 1). Furthermore, the Kolmogorov-Smirnov normality test was utilized to assess the normality of the soil properties. The box-cox transformation was applied to normalize non-normal parameters. Spearman correlation was utilized to investigate the association and strength of the relationship between soil pH and the auxiliary variables.

Table 1. Descriptive statistics of the soil pH across the study area

Parameters

Min.

Max.

Mean

SD

Skewness

Kurtosis

KS p

pH

5.00

6.30

5.53

0.29

0.66

0.85

0.068

2.4 Geostatistical analysis

Geostatistics is a standard technique for statistical analysis of geospatial data, specifically soil properties, for agricultural and environmental applications [27, 28]. The selection of kriging and cokriging methods was driven by their demonstrated effectiveness in handling spatially continuous data and their ability to incorporate spatial autocorrelation, which is essential for accurate soil property mapping. Ordinary kriging, is widely recognized for its capability to estimate unknown values by weighting neighboring sample values, which is suitable for relatively homogeneous fields such as the study area. Cokriging was chosen to further improve prediction accuracy by incorporating secondary, correlated variables, which can reduce uncertainty and enhance the interpolation of soil pH when auxiliary soil properties are available. The assumptions underlying the kriging approach include stationarity of the spatial process, meaning that statistical properties do not vary across the study area, and spatial autocorrelation, assuming that points closer together have more similar values. These assumptions can influence results, as deviations from stationarity or weak autocorrelation in soil pH and auxiliary variables could impact prediction accuracy and the interpretation of results [29, 30].

The ArcGIS Pro 3.2 Geostatistical Analyst extension was utilized to analyze the data, generate interpolated surfaces, and perform post-process cross-validation. The kriging approach was used to estimate the values of soil parameters in areas where no samples had been collected. Cokriging allows for the inclusion of secondary variables that are spatially correlated with the primary variable (soil pH), leveraging cross-correlation to achieve a higher prediction precision. The spatial variance across variables can be mapped if multiple soil properties from the same locations are available. Nevertheless, it is useful that these covariables have a positive or negative correlation thus exhibit autocorrelation [29]. Semivariograms and its parameters are used to establish spatial correlation between the variables [30].

The ordinary kriging semivariogram (γi) (Eq. (1)) and cokriging cross-semivariograms (γij) (Eq. (2)) were derived using the equations:

$\gamma_i(h)=\frac{1}{2 N(h)} \sum_{i=1}^{N(h)}\left(Z_{1 i}\left(X_i\right)-Z_{1 i}\left(X_i+h\right)\right)^2$               (1)

$\gamma_{i j}(h)=\frac{1}{2 N(h)} \sum_{\mathrm{i}=1}^{\mathrm{N}(\mathrm{h})}\left\{\begin{array}{l}{\left[Z_{1 i}\left(x_i\right)-Z_{1 i}\left(x_i+h\right)\right] *} \\ {\left[Z_{2 j}\left(x_i\right)-Z_{2 j}\left(x_i+h\right)\right]}\end{array}\right\}$                   (2)

where, h is the distance, and N(h) is the pairs of Z1i(xi) and Z2j(xi) in a h + dh, a known lagged distance interval [31]. The ordinary kriging and cokriging prediction models of Z1i at x0 can be summarized as follows:

$Z_{1 i}\left(x_o\right)=\sum_{i=1}^{N_1} \alpha_i Z_{1 i}\left(x_{1 i}\right)$               (3)

$Z_{1 i}\left(x_0\right)=\sum_{i=1}^{N_1} \alpha_{1 i} Z_{1 i}\left(x_{1 i}\right)+\sum_{i=1}^{N_2} \alpha_{2 j} Z_{2 j}\left(x_{2 j}\right)$                   (4)

where, Z1i(x0) is the predicted value; α1i and α2j are weight coefficients and Z1i(x1i) and Z2j(x2j) are two regionalized variables; and N1 and N2 are the Z1i and Z2j neighborhood used in interpolation [12, 32-35].

The distribution of soil pH was predicted by the utilization of the semivariogram's best-fitting mathematical functions. Three commonly used semivariogram models, exponential (Eq. (5)), gaussian (Eq. (6)), and spherical (Eq. (7)) functions were used to identify the best-fit model based on the dataset.

$\rho(h)=C_0+C_1\left[1-e x\left(-\frac{h}{a}\right)\right] \quad$ for $\mathrm{h} \geq 0$                 (5)

$\rho(h)=C_0+C_1\left[1-\operatorname{ex}\left(-\frac{h^2}{a^2}\right)\right] \quad$ for $\mathrm{h} \geq 0$              (6)

$\begin{gathered}\rho(h)=C_0+C_1\left[1-\text { ex }\left(-\frac{h^3}{a^3}\right)\right] \text { for } \mathrm{h} \leq \alpha ; \\ C_0+C_1 \text { otherwise }\end{gathered}$               (7)

where, ρ(h) is the semivariance for h, the lag interval; C0 is the nugget; C1 is the partial sill; and α is the range to reach the sill (C0 + C1).

2.5 Cross-validation and prediction accuracy

Kriging and cokriging models were assessed using cross-validation, a leave-one-out approach. Mean error (ME), root mean square error (RMSE), average standardized error (ASE) and root mean square standardized error (RMSSE) were utilized for the evaluation. A ME close to zero and a lower RMSE value is preferred. Lower RMSE values indicate a stronger correlation coefficient of the estimated and actual values, which leads to a reduction in estimation inaccuracy [36, 37]. Also, the method with the least difference between RMSE and ASE is preferred. RMSSE close to 1 indicates accurate standard errors. ME and RMSE were derived using the following equations:

$M E=\frac{1}{n} \sum_{i=1}^n\left[Z^*\left(x_i\right)-Z^{\prime}\left(x_i\right)\right]$              (8)

$R M S E=\sqrt{\frac{1}{n} \sum_{i=1}^n\left[Z^*\left(x_i\right)-Z^{\prime}\left(x_i\right)\right]^2}$                   (9)

$A S E=\sqrt{\frac{1}{n} \sum_{i=1}^n \sigma^{* 2}\left(x_i\right)}$                   (10)

$R M S S E=\sqrt{\frac{1}{n} \sum_{i=1}^n \frac{\left[Z^*\left(x_i\right)-Z^{\prime}\left(x_i\right)\right]^2}{\sigma^{* 2}\left(x_i\right)}}$                     (11)

where, Z'(xi) and Z*(xi) are the measured and predicted value, respectively; σ* is the standard error of the predicted value, and n is the cross-validation points.

To test the improvement in prediction accuracy of the CoK technique compared to the reference method (OK), a relative improvement (RI) in RMSE was determined, as described by study [38] following Eq. (12).

$R I_{C o K}=\frac{R M S E_{O K}-R M S E_{C o K}}{R M S E_{O K}} \times 100$                    (12)

The prediction standard error maps were generated to show the prediction errors within the study area, quantifying the uncertainty or precision of the predictions to visualize the estimated uncertainty for each method.

3. Results and Discussion

3.1 Descriptive statistics

The descriptive statistics of the soil pH are shown in Table 1 while the boxplots of the auxiliary variables (EC, OM, CEC, and clay) are presented in Figure 2. The soil in the area is generally acidic, with soil pH values between 5.00 and 6.30, with a mean and standard deviation of 5.53 and 0.29, respectively. The acidic soil pH affects agricultural potential and nutrient availability, as lower pH levels can reduce the availability of certain nutrients like nitrogen and phosphorus while increasing the solubility of potentially toxic elements like aluminum. This soil acidity suggests that crops sensitive to low pH may require lime amendments or other soil management practices to optimize growth. The low EC, with a mean value of 65 µS/cm, indicates that the area has a low concentration of soluble salts. The OM content ranged from 1.37 to 4.27%, with a mean value of 2.86%. Moreover, the area is confirmed to have a medium-textured soil with a mean clay content of 31.06%. The low skewness values (<1) of pH, OM, CEC, and clay content imply normal distribution, while the low kurtosis values of pH, OM, and clay content imply light-tailed distribution. The Kolmogorov-Smirnov (KS) test showed that pH, OM, CEC, and clay content are normally distributed. The Box-Cox transformation (λ = -1) was used to normalize the soil EC. This transformation was necessary to correct for skewness in the EC data, which may affect the accuracy of interpolation models if left untransformed. Normalizing EC data ensures that it meets the assumptions of normality required by geostatistical methods, thereby improving model reliability and interpretability for spatial prediction of EC values and their relationship with soil pH.

(a)

(b)

(c)

(d)

Figure 2. Box plot with normal distribution curve of the auxiliary variables (a) EC, (b) OM, (c) CEC, and (d) Clay

Spearman correlation analysis explored the relationship between soil pH and the auxiliary variables (Table 2). A significant positive correlation (r=0.310, p<0.05) was observed between soil pH and EC, indicating that an increase in the concentration of soluble salts results in a corresponding rise in pH. Meanwhile, soil pH has a significant negative correlation (r=-0.383, p<0.05) with the clay content. Soils with higher clay content generally have higher CEC values, which can influence soil buffering capacity [39]. However, clay content alone may not directly determine pH variability, as other factors like organic matter content, type of clay, and CEC also play significant roles. Soils with a higher CEC can retain a greater amount of cations, which can influence soil pH due to the higher buffering capacity. As a result, soils with higher CEC values are generally more resistant to pH changes and exhibit lower pH variability compared to soils with lower CEC values.

Table 2. Spearman correlation coefficients (95% CI) between soil pH and the auxilliary variables

Primary Variable

EC

%OM

CEC

%Clay

Soil pH

0.310* (0.028, 0.546)

0.064 (-0.218, 0.337)

-0.021 (-0.298, 0.259)

-0.383* (-0.605, -0.107)

* Indicates significance at p ≤ 0.05.

Organic matter contains various organic compounds that can release hydrogen ions (H+) when decomposed, leading to soil acidification. Consequently, soils with higher organic matter content tend to have lower pH values. Additionally, organic matter acts as a buffer, helping to stabilize soil pH by adsorbing and neutralizing excess acidity or alkalinity. Therefore, variations in organic matter content across different soil locations can contribute to pH variability [40]. The significant correlations between soil pH and EC or clay content indicate that both have the potential to serve as auxiliary variables for cokriging, hence enhancing the accuracy of predictions.

3.2 Geostatistical analysis

Insights into potential processes that influence the spatial distributions of soil properties and precise descriptions of their spatial structure can be obtained using semivariograms [41, 42]. The attributes of the semivariograms for both OK and CoK methods and the corresponding best-fit models are presented in Table 3. Furthermore, the semivariograms of the best-fitted models for soil pH are presented in Figure 3.

(a) OK

(b) CoKEC

(c) CoKOM

(d) CoKCEC

(e) CoKClay

Figure 3. Semivariograms of the best-fit models of soil pH using (a) Ordinary kriging, (b-e) Cokriging models

Table 3. Semivariogram parameters of best-fit models

Method

Best-fit Model

Semivariogram

Nugget (C0)

Partial Sill (C1)

Sill (C0+C1)

Range (meters)

OK

S

0.0403

0.0231

0.0634

75.8421

CoKEC

G

0.0478

0.0135

0.0613

69.0770

CoKOM

E

0

0.0641

0.0641

56.2049

CoKCEC

G

0.0468

0.0139

0.0607

56.2049

CoKClay

S

0

0.0709

0.0709

58.3545

S- Spherical, G- Gaussian, E- Exponential.

The nugget (C0) signifies the microscale field variations and experimental errors within the minimum sampling spacing [43]. Nugget effects were small to none (0.0468-0.0000) indicating low experimental error and sampling error. The sill, the total variance at which the model first flattens out, were also small (0.0709-0.0607). The range, the distance limit beyond which the variable became spatially independent of another, were around 75.8421 to 56.2049 m. The nugget-to-sill ratio ((C0/(C0 + C1))*100) was utilized to define the degree of spatial dependence (DSD) of the soil pH [44]. Based on OK, the DSD of soil pH was 63.56%, indicating moderate spatial dependence. Therefore, the spatial variability of soil pH was influenced by extrinsic (soil cultivation and management practices) and intrinsic (soil-forming processes) factors.

3.3 Cross-validation

The best-fit models of the OK and CoK methods were assessed using cross-validation indicators, including ME, RMSE, ASE, and RMSSE, as shown in Table 4. The ME represents the mean difference between the predicted and measured values. It indicates whether the predictions are biased by being, on average, too high or too low. A ME closest to zero (lowest bias) is preferred. In general, regardless of the method, the ME values were extremely low. Ordinary kriging produced a positive ME (0.00018). In contrast, CoK with auxiliary variables resulted in negative MEs (-0.00002 to -0.00367). A negative ME means that the predicted pH is lower than the measured pH. The best-fit methods are considered unbiased due to the extremely low ME values. Overall, CoKEC resulted in an ME closest to zero.

Table 4. Cross-validation parameters of best-fit models

Method

Best-fit Model

Cross-validation Parameters

ME

RMSE

ASE

RMSSE

OK

S

0.00047

0.2571

0.2512

1.0182

CoKEC

G

-0.00002

0.2477

0.2476

0.9947

CoKOM

E

-0.00130

0.2468

0.2457

1.0022

CoKCEC

G

-0.00032

0.2491

0.2478

1.0107

CoKClay

S

-0.00081

0.2627

0.2152

1.1842

S- Spherical, G- Gaussian, E- Exponential.

The RMSE specifies and quantifies the degree of accuracy with which a model predicts the measured data. The model with the lowest RMSE values is most preferred. The RMSEs follow the order: CoKClay > OK > CoKCEC > CoKEC > CoKOM. Relative to the OK method, most CoK methods (CoKCEC, CoKEC, and CoKOM) improved the accuracy of predictions. The difference between the RSME values of the three CoK methods were extremely small (<0.0023). Overall, CoKOM resulted to the smallest RMSE value. Compared with OK, the use of CEC, EC, and OM as auxiliary variables resulted in RI of 3.19%, 3.73%, and 4.08%, respectively. This indicates that incorporating these properties as an auxiliary variable reduces the overall prediction error. According to Cambardellaet al. [45] and Goovaerts [46], the impact of additional information on the cokriging estimation is influenced by both the correlation between the main and additional variables, as well as their spatial continuity patterns.

The ASE represents the mean value of the standard errors of the predictions. A low ASE value is preferred. The ASEs follow the order: CoKCEC > OK > CoKEC > CoKOM > CoKClay. The CoKClay had the lowest ASE of 0.2152. Furthermore, the disparity between ASE and RMSE was computed to evaluate if there was an overestimation of variability (ASE-RMSE > 0) or an underestimation of variability (ASE-RMSE < 0) in each model. Excluding CoKCEC, ASE-RMSE were negative in the remaining models, which indicates an underestimation of variability.

The RMSSE assesses the overall error. The prediction standard errors are valid if it is close to 1. If RMSSE is greater than 1, variability is underestimated. Conversely, if the RMSSE is less than 1, variability is overestimated. In this study, most of the methods, except CoKEC, underestimate the variability with RMSSE values between 1.1842 and 1.0022. The CoKEC overestimates the variability with an RMSSE value of 0.9947. Nevertheless, as the RMSSE values are close to 1, they are well within the acceptable range indicating a well-calibrated model. The RMSSEs follow the order: CoKClay > OK > CoKCEC > CoKOM > CoKEC. Overall, CoKEC resulted in an RMSSE closest to one.

3.4 Spatial variability mapping

Interpolated maps of soil pH using five different geostatistical methods are presented in Figure 4. Ordinary Kriging (Figure 4(a)) serves as a baseline map for comparison. The map produced by OK reveals a smooth gradient of pH values, with distinct zones of higher and lower pH. The northeastern part of the area exhibits higher pH levels, possibly indicating less acidic conditions, while other areas are more acidic.

The inclusion of EC as an auxiliary variable in Cokriging results in a more refined spatial pattern of pH variability (Figure 4(b)). The CoKEC map shows that the northeastern zone with a less acidic pH is more pronounced compared to OK. This indicates that EC has a significant influence on pH distribution, possibly due to the relationship between the two variables capturing more localized variations.

Although the ME is slightly higher than CoKEC, the OM is an effective auxiliary variable for predicting soil pH. The CoKOM map (Figure 4(c)) introduces further localized variations, particularly highlighting both high and low pH zones more distinctly than OK. The influence of organic matter on soil pH is reflected in these variations, suggesting that areas with higher organic matter content may experience different buffering capacities and hence the variations in the pH levels. The areas with higher pH in the northeastern area are consistent, but the map also reveals additional spots of high and low pH, showing that organic matter plays a role in pH distribution

Incorporating CEC as auxiliary variable shows a pattern similar to CoKEC but with some differences (Figure 4(d)). The CoKCEC map reflects these characteristics, showing a more refined spatial distribution compared to OK, with CEC influencing the pH variability effectively, although not as strongly as EC or OM. The pH distribution remains consistent with the previous methods, but CEC's influence seems to smooth out the pH variability in some regions while enhancing it in others. This suggests that CEC, which influences soil's ability to retain cations, has a significant effect on the spatial distribution of soil pH. The northeastern part with a lesser acidity is consistently captured, reaffirming the area's distinct soil characteristics.

The CoKClay map displays the most localized variations among the methods (Figure 4(e)). The influence of clay content is evident, as it significantly affects soil's water retention and hence its pH buffering capacity. The map shows a more heterogeneous pattern with smaller pockets of varying pH levels. The high pH area in the northeast is still present, but the inclusion of clay content introduces additional areas of both high and low pH across the field, highlighting the intricate relationship between soil texture and pH. However, as this method produced the highest RMSE and RMSSE, the prediction error is higher when clay content is used as an auxiliary variable and the variability in predictions is overestimated suggesting that the model overfits the data, leading to less reliable predictions.

(a) OK

(b) CoKEC

(c) CoKOM

(d) CoKCEC

(e) CoKClay

Figure 4. Interpolated maps of soil pH using (a) OK, (b) CoKEC, (c) CoKOM, (d) CoKCEC, and (e) CoKClay

Overall, both OK and CoK approaches exhibited similar patterns of field-scale variability throughout the study area. However, the CoK maps provide better details compared to the OK maps. Additionally, CoKEC, CoKOM, and CoKCEC resulted in interpolated maps with less errors compared to OK. This outcome demonstrates that the inclusion of auxiliary variables enhances the accuracy of soil pH interpolation. The error of interpolation is never equal to zero, no matter how accurately the data are measured. It does decrease as the number of samples increases, but the reduction is small [47]. Since prediction is always associated with uncertainty, the goal is to minimize that uncertainty.

Similar trends have been found for other cokriged soil properties [22, 34, 48-51]. The majority of cokriging methods demonstrated better interpolation quality compared to the OK method. Hence, other soil properties may serve as secondary parameters for cokriging interpolation of soil pH. In this study, cokriging using EC as an auxiliary variable (CoKEC) was observed to be the best method for improving prediction accuracy. Based on the low prediction errors, it generates the most precise map.

3.5 Uncertainty assessment

To assess the dependability of maps to be utilized in various applications, it is critical that end users are aware of the uncertainty associated with the predicted spatial distribution [52-54]. The geographic distribution produced by interpolation algorithms inevitably includes sources of uncertainty. Figure 5 shows the PSE of the OK and CoK methods. The PSE surface is a spatial representation of the standard errors associated with the projected values at different locations. It represents the variability of the estimated values with an area, and a higher standard error indicates a lesser level of precision. It shows areas where the predictions are most uncertain. This helps in understanding how much confidence can be placed in the predicted values.

The PSE map for OK (Figure 5(a)) shows a relatively smooth gradient of uncertainty across the field, with standard error values ranging from 0.11 to 0.241. The higher standard errors are concentrated in the northeastern region, which also exhibited higher pH levels in the interpolation map. This pattern suggests that OK has more difficulty accurately predicting pH in areas with greater variability or fewer data points, leading to higher uncertainty in those regions.

The CoKEC method shows a notable reduction in PSE compared to OK, with values ranging from 0.085 to 0.242 (Figure 5(b)). The map indicates that CoKEC has lower uncertainty, particularly in areas where EC data is likely available and strongly correlated with pH. The lower standard errors in most parts of the field suggest that incorporating EC as an auxiliary variable improves the confidence in pH predictions. This aligns with the lower RMSE observed for CoKEC, indicating that the model is both more accurate and less uncertain.

The CoKOM method shows a different pattern, with higher standard errors distributed more unevenly across the field (Figure 5(c)). The PSE values range from 0.027 to 0.317, with particularly high uncertainties in specific localized areas. This pattern suggests that while CoKOM can produce accurate predictions (with the lowest RMSE), the model's uncertainty varies significantly across the field, potentially due to the spatial variability of organic matter. The presence of high standard errors in localized regions may indicate areas where organic matter data is less reliable or where its correlation with pH is weaker.

The CoKCEC method shows a relatively uniform distribution of standard errors, with values ranging from 0.09 to 0.24 indicating that CEC as an auxiliary variable helps in maintaining a consistent level of prediction uncertainty across the field (Figure 5(d)). The relatively narrow range of standard errors suggests that the influence of CEC is consistent, resulting in a stable model with moderate uncertainty. This is reflected in the RMSSE value, indicating that CoKCEC is well-calibrated regarding both prediction accuracy and uncertainty.

(a) OK

(b) CoKEC

(c) CoKOM

(d) CoKCEC

(e) CoKClay

Figure 5. Prediction standard error maps of (a) OK, (b) CoKEC, (c) CoKOM, (d) CoKCEC, and (e) CoKClay

The CoKClay method exhibits the highest variability in standard errors, with values ranging from 0.016 to 0.338 (Figure 5(e)). This wide range indicates that the inclusion of clay content as an auxiliary variable introduces significant uncertainty in certain regions of the field. The map shows many localized areas with high standard errors, suggesting that the relationship between clay content and pH may be complex or variable across the field. This is consistent with the higher RMSE and RMSSE values, indicating that while it captures fine-scale variations, it does so with considerable uncertainty.

The mapping accuracy of the OK and CoK methods was affected by the sampling pattern that was used. Areas characterized by a sparse sampling distribution exhibit a diminished level of prediction accuracy, whereas regions with a denser sampling distribution demonstrate a higher level of prediction accuracy. The edges of the maps tend to have relatively higher predicted standard error values due to the lack of sampling points within the neighborhood search area used during the interpolation process.

When interpreting standard error numbers, it is important to consider the values and range of the data. For instance, in the case of CoKEC, since the soil pH values are between 5.0 and 6.30, a standard error values between 0.085 and 0.242 indicates high prediction precision. This is because the standard errors are significantly smaller than both the pH values and the overall range of the data.

The PSE maps provided important insights into the reliability of each interpolation method. The maps presented information on the prediction error at each point on the map, making them valuable for model evaluation. CoKEC and CoKCEC exhibit the most consistent and lowest standard errors, indicating that these methods not only improve the accuracy of pH predictions but also reduce the uncertainty associated with those predictions. CoKOM, while providing accurate predictions, shows greater variability in uncertainty, which may limit its reliability in some regions of the field. CoKClay, despite capturing detailed variations in pH, introduces significant uncertainty, making it a less reliable choice for fields with highly variable clay content.

For practical applications, these PSE maps suggest that CoKEC is the most robust method for predicting field-scale soil pH with high confidence, especially in areas where the corresponding auxiliary variables are well-understood and closely related to pH. In contrast, while CoKOM and CoKClay may offer detailed insights into pH variability, their higher and more variable prediction uncertainties must be considered when using these methods for decision-making in soil management or precision agriculture.

4. Conclusions

This research highlights the potential of integrating auxiliary soil properties to enhance the accuracy of soil pH mapping, which can inform more targeted and efficient soil management practices. We demonstrated the use of other soil properties to improve the prediction accuracy of field-scale soil pH variability. In this study, ordinary kriging and cokriging with auxiliary variables, particularly EC, OM, CEC, and clay content, have been employed to interpolate and predict the spatial distribution of the soil pH. For prediction assessment, cross-validation and mapping of prediction standard errors were used to assess the mapping accuracy and precision.

Relative to OK, the CoKEC, CoKOM, and CoKCEC methods decreased the RMSE, ASE, and RMSSE, thereby improving the prediction accuracy. In terms of RMSE, the relative improvement (RI%) of CoKCEC, CoKEC, and CoKOM were 3.19, 3.73, and 4.08%, respectively. Overall, CoKEC was observed to be the best method for improving prediction accuracy, as observed in the lower prediction errors. Furthermore, the resulting CoKEC interpolated map provided more spatial details. Comparing CoKEC, CoKOM, and CoKCEC methods, the differences in errors parameters produced by cross-validation were comparatively minor. Therefore, the fact that one method has a lower RMSE does not imply that it is superior to the other.

The mapping accuracy of both OK and CoK methods was affected by the sampling pattern that was used, as observed in the prediction error maps. The generated standard error maps using OK and CoK methods exhibit a spatial distribution that is highly responsive to both the density of sampling and the chosen technique. The cross-validation metrics clearly show that the inclusion of auxiliary variables in CoK generally improves the accuracy and reliability of soil pH predictions compared to OK. For practical applications in soil management or precision agriculture, the choice of interpolation method should be based on the availability and relevance of auxiliary variables. Among the CoK methods, CoKEC, with a minimal ME and an RMSSE close to 1, making it a robust method for soil pH interpolation. Furthermore, the PSE maps clearly show the importance of considering prediction uncertainty when selecting an interpolation method for soil pH. CoK methods that utilize correlated auxiliary variables, such as EC, offer both improved accuracy and reduced uncertainty. In contrast, methods like CoKOM and CoKClay require careful consideration of the associated uncertainties to avoid overconfidence in the predictions.

Cokriging with auxiliary variables can be valuable in precision agriculture by enabling site-specific soil management practices that enhance nutrient availability and soil health. Future research should explore the integration of real-time soil sensor data with cokriging techniques to further improve prediction accuracy and spatial resolution. Additionally, examining the application of these methods in diverse soil types and agricultural systems could provide a broader understanding of their utility in various agroecological contexts.

This research supports the potential for cokriging to play a crucial role in sustainable agriculture by allowing for more data-driven and adaptive soil management practices. By leveraging multiple datasets, cokriging provides a robust tool for addressing complex agricultural challenges, ultimately fostering sustainable and productive farming practices that optimize soil resources and support long-term agricultural viability.

  References

[1] Sumner, M.E. (1999). Handbook of Soil Science. CRC Press, USA.

[2] Bloom, P.R., Skyllberg, U.L., Sumner, M.E. (2005). Soil acidity. In: Chemical Processes in Soils, 8. https://doi.org/10.2136/sssabookser8.c8

[3] Goovaerts, P. (1998). Ordinary cokriging revisited. Mathematical Geology, 30(1): 21-42. https://doi.org/10.1023/A:1021757104135

[4] Outeiro, L., Asperó, F., Úbeda, X. (2008). Geostatistical methods to study spatial variability of soil cations after a prescribed fire and rainfall. Catena, 74(3): 310-320. https://doi.org/10.1016/j.catena.2008.03.019

[5] Liu, X., Zhang, W., Zhang, M., Ficklin, D.L., Wang, F. (2009). Spatio-temporal variations of soil nutrients influenced by an altered land tenure system in China. Geoderma, 152(1-2): 23-34. https://doi.org/10.1016/j.geoderma.2009.05.022

[6] Kilic, K., Kilic, S., Kocyigit, R. (2012). Assessment of spatial variability of soil properties in areas under different land use. Bulgarian Journal of Agricultural Science, 18: 722-732. https://hdl.handle.net/20.500.12513/4202.

[7] Jenny, H. (1994). Factors of soil formation: A system of quantitative pedology. Dover Publications, UAS. 

[8] Chabala, L.M., Mulolwa, A., Lungu, O. (2014). Mapping the spatial variability of soil acidity in Zambia. Agronomy, 4(4): 452-461. https://doi.org/10.3390/agronomy4040452

[9] Mao, Y., Sang, S., Liu, S., Jia, J. (2014). Spatial distribution of pH and organic matter in urban soils and its implications on site-specific land uses in Xuzhou, China. Comptes Rendus Biologies, 337(5): 332-337. https://doi.org/10.1016/j.crvi.2014.02.008

[10] Reza, S.K., Nayak, D.C., Mukhopadhyay, S., Chattopadhyay, T., Singh, S.K. (2017). Characterizing spatial variability of soil properties in alluvial soils of India using geostatistics and geographical information system. Archives of Agronomy and Soil Science, 63(11): 1489-1498. https://doi.org/10.1080/03650340.2017.1296134

[11] Li, C., Wang, X., Qin, M. (2021). Spatial variability of soil nutrients in seasonal rivers: A case study from the Guo River Basin, China. Plos One, 16(3): e0248655. https://doi.org/10.1371/journal.pone.0248655

[12] McBratney, A.B., Minasny, B., Stockmann, U. (2018). Pedometrics. Springer Cham, USA. https://doi.org/10.1007/978-3-319-63439-5

[13] Song, G., Zhang, J., Wang, K. (2014). Selection of optimal auxiliary soil nutrient variables for cokriging interpolation. PLoS One, 9(6): e99695. https://doi.org/10.1371/journal.pone.0099695

[14] Kempen, B., Brus, D.J., Stoorvogel, J.J., Heuvelink, G. B., de Vries, F. (2012). Efficiency comparison of conventional and digital soil mapping for updating soil maps. Soil Science Society of America Journal, 76(6): 2097-2115. https://doi.org/10.2136/sssaj2011.0424

[15] Jabro, J.D., Stevens, W.B., Evans, R.G., Iversen, W.M. (2010). Spatial variability and correlation of selected soil properties in the Ap horizon of a CRP grassland. Applied Engineering in Agriculture, 26(3): 419-428. https://doi.org/10.13031/2013.29957

[16] Cruz, J.S., de Assis Júnior, R.N., Matias, S., Tamayo, J.H.C. (2011). Spatial variability of an Alfisol cultivated with sugarcane. Ciencia E Investigación Agraria, 38(1): 155-164. https://doi.org/10.4067/s0718-16202011000100015

[17] Arrouays, D., Grundy, M.G., Hartemink, A.E., Hempel, J.W., Heuvelink, G.B.M., Young Hong, S., Lagacherie, P., Lelyk, G., McBratney, A.B., McKenzie, N.J., Mendonça- Santos, M.L., Minasny, B., Montanarella, L., Odeh, I.O.A., Sanchez, P.A., Thompson, J.A., Zhang, G.L. (2014). GlobalSoilMap: Toward a fine-resolution global grid of soil properties. Advances in agronomy, 125: 93-134. https://doi.org/10.1016/B978-0-12-800137-0.00003-0

[18] Guevara, M., Arroyo, C., Brunsell, N., Cruz, C.O., Domke, G., Equihua, J., Etchevers, J., Hayes, D., Hengl, T., Ibelles, A., Johnson, K., De Jong, B., Libohova, Z., Llamas, R., Nave, L., Ornelas, J.L., Paz, F., Ressl, R., Schwartz, A., Vargas, R. (2020). Soil organic carbon across Mexico and the conterminous United States (1991-2010). Global Biogeochemical Cycles, 34(3). https://doi.org/10.1029/2019GB006219

[19] Kempen, B., Heuvelink, G.B. M., Brus, D.J., Stoorvogel, J.J. (2010). Pedometric mapping of soil organic matter using a soil map with quantified uncertainty. European Journal of Soil Science, 61(3): 333-347. https://doi.org/10.1111/j.1365-2389.2010.01232.x

[20] Malone, B.P., McBratney, A.B., Minasny, B. (2011). Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes. Geoderma, 160(3-4): 614-626. https://doi.org/10.1016/j.geoderma.2010.11.013

[21] Shrestha, D.L., Solomatine, D.P. (2006). Machine learning approaches for estimation of prediction interval for the model output. Neural networks, 19(2): 225-235. https://doi.org/10.1016/j.neunet.2006.01.012

[22] Zaouche, M., Bel, L., Vaudour, E. (2017). Geostatistical mapping of topsoil organic carbon and uncertainty assessment in Western Paris croplands (France). Geoderma Regional, 10: 126-137. https://doi.org/10.1016/j.geodrs.2017.07.002

[23] Chuancheng, F.U., Zhang, H., Chen, T.U., Lianzhen, L.I., Xinghua, L.I.U., Yongming, L.U.O. (2020). Spatial interpolation of orchard soil pH using soil type and planting duration as auxiliary information. Pedosphere, 30(5): 628-637. https://doi.org/10.1016/S1002-0160(18)60045-1

[24] Hengl, T., Heuvelink, G.B., Stein, A. (2004). A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma, 120(1-2): 75-93. https://doi.org/10.1016/j.geoderma.2003.08.018

[25] Zhang, S.W., Huang, Y.F., Shen, C.Y., Ye, H.C., Du, Y.C. (2012). Spatial prediction of soil organic matter using terrain indices and categorical variables as auxiliary information. Geoderma, 171: 35-43. https://doi.org/10.1016/j.geoderma.2011.07.012

[26] Mirzaee, S., Ghorbani-Dashtaki, S., Mohammadi, J., Asadi, H., Asadzadeh, F. (2016). Spatial variability of soil organic matter using remote sensing data. Catena, 145: 118-127. https://doi.org/10.1016/j.catena.2016.05.023

[27] Carating, R.B., Galanta, R.G., Bacatio, C.D. (2014). The soils of the Philippines. In World soils book series, USA. https://doi.org/10.1007/978-94-017-8682-9

[28] Goovaerts, P. (1997). Geostatistics for natural resources evaluation. Oxford University Press, US. https://doi.org/10.1093/oso/9780195115383.001.0001

[29] Paterson, S., McBratney, A.B., Minasny, B., Pringle, M.J. (2018). Variograms of soil properties for agricultural and environmental applications. Pedometrics. pp. 623-667. https://doi.org/10.1007/978-3-319-63439-5_21

[30] Bivand, R.S., Pebesma, E., Gómez-Rubio, V. (2013). Applied Spatial Data Analysis with R. Springer New York, USA. https://doi.org/10.1007/978-1-4614-7618-4

[31] Pebesma, E.J. (2006). The role of external variables and GIS databases in geostatistical analysis. Transactions in GIS, 10(4): 615-632. https://doi.org/10.1111/j.1467-9671.2006.01015.x

[32] Yates, S.R., Warrick, A.W. (1987). Estimating soil water content using cokriging. Soil Science Society of America Journal, 51(1): 23-30. https://doi.org/10.2136/sssaj1987.03615995005100010005x

[33] Eldeiry, A.A., Garcia, L.A. (2010). Comparison of ordinary kriging, regression kriging, and cokriging techniques to estimate soil salinity using LANDSAT images. Journal of Irrigation and Drainage Engineering, 136(6): 355-364. https://doi.org/10.1061/(ASCE)IR.1943-4774.0000208

[34] Kunkel, M.L., Flores, A.N., Smith, T.J., McNamara, J.P., Benner, S.G. (2011). A simplified approach for estimating soil carbon and nitrogen stocks in semi-arid complex terrain. Geoderma, 165(1): 1-11. https://doi.org/10.1016/j.geoderma.2011.06.011

[35] Liao, K.H., Xu, S.H., Wu, J.C., Ji, S.H., Lin, Q. (2011). Cokriging of soil cation exchange capacity using the first principal component derived from soil physico-chemical properties. Agricultural sciences in China, 10(8): 1246-1253. https://doi.org/10.1016/S1671-2927(11)60116-8

[36] Wang, K., Zhang, C., Li, W. (2013). Predictive mapping of soil total nitrogen at a regional scale: A comparison between geographically weighted regression and cokriging. Applied Geography, 42: 73-85. https://doi.org/10.1016/j.apgeog.2013.04.002

[37] Fourati, H.T., Bouaziz, M., Benzina, M., Bouaziz, S. (2017). Detection of terrain indices related to soil salinity and mapping salt-affected soils using remote sensing and geostatistical techniques. Environmental monitoring and assessment, 189(4): 1-11. https://doi.org/10.1007/s10661-017-5877-7

[38] Abdennour, M.A., Douaoui, A., Barrena, J., Pulido, M., Bradaï, A., Bennacer, A., Piccini, C., Alfonso-Torreño, A. (2020). Geochemical characterization of the salinity of irrigated soils in arid regions (Biskra, SE Algeria). Acta Geochimica, 40(2): 234-250. https://doi.org/10.1007/s11631-020-00426-2

[39] Pang, S., Li, T.X., Wang, Y.D., Yu, H.Y., Li, X. (2009). Spatial interpolation and sample size optimization for soil copper (Cu) investigation in cropland soil at county scale using cokriging. Agricultural Sciences in China, 8(11): 1369-1377. https://doi.org/10.1016/S1671-2927(08)60349-1

[40] Brady, N.C., Weil, R.R., Weil, R.R. (2008). The nature and properties of soils. Upper Saddle River, USA. pp. 662-710

[41] Ritchie, G.S.P., Dolling, P.J. (1985). The role of organic matter in soil acidification. Soil Research, 23(4): 569-576. https://doi.org/10.1071/SR9850569

[42] Paz-Gonzalez, A., Taboada Castro, M.T., Vieira, S.R. (2001). Geostatistical analysis of heavy metals in a one-hectare plot under natural vegetation in a serpentine area. Canadian Journal of Soil Science, 81(4): 469-479. https://doi.org/10.4141/S00-095

[43] Qu, M., Li, W., Zhang, C. (2013). Assessing the spatial uncertainty in soil nitrogen mapping through stochastic simulations with categorical land use information. Ecological informatics, 16: 1-9. https://doi.org/10.1016/j.ecoinf.2013.04.001

[44] Burgos, P., Madejón, E., Pérez-de-Mora, A., Cabrera, F. (2006). Spatial variability of the chemical characteristics of a trace-element-contaminated soil before and after remediation. Geoderma, 130(1-2): 157-175. https://doi.org/10.1016/j.geoderma.2005.01.016

[45] Cambardella, C.A., Moorman, T.B., Novak, J. M., Parkin, T.B., Karlen, D.L., Turco, R.F., Konopka, A.E. (1994). Field-Scale variability of soil properties in central Iowa soils. Soil Science Society of America Journal, 58(5): 1501-1511. https://doi.org/10.2136/sssaj1994.03615995005800050033x

[46] Goovaerts, P. (1999). Geostatistics in soil science: State-of-the-art and perspectives. Geoderma, 89(1-2): 1-45. https://doi.org/10.1016/S0016-7061(98)00078-0

[47] Miháliková, M., Başkan, O., Dengiz, O. (2015). Capability of different interpolation models and pedotransfer functions to estimate soil hydraulic properties in Büyükçay Watershed. Environmental Earth Sciences, 74(3): 2425-2437. https://doi.org/10.1007/s12665-015-4246-5

[48] Krivoruchko, K. (2011). Spatial statistical data analysis for GIS users. Esri Press, USA.

[49] McBratney, A.B., Webster, R. (1983). How many observations are needed for regional estimation of soil properties? Soil Science, 135(3): 177-183. https://doi.org/10.1097/00010694-198303000-00007

[50] Trangmar, B.B., Yost, R.S., Uehara, G. (1986). Spatial dependence and interpolation of soil properties in West Sumatra, Indonesia: I. Anisotropic variation. Soil Science Society of America Journal, 50(6): 1391-1395. https://doi.org/10.2136/sssaj1986.03615995005000060004x

[51] Vaughan, P.J., Lesch, S.M., Corwin, D.L., Cone, D.G. (1995). Water content effect on soil salinity prediction: A geostatistical study using cokriging. Soil Science Society of America Journal, 59(4): 1146-1156. https://doi.org/10.2136/sssaj1995.03615995005900040029x

[52] Zhang, R., Myers, D.E., Warrick, A.W. (1992). Estimation of the spatial distribution of soil chemicals using pseudo—Cross—Ariograms. Soil Science Society of America Journal, 56(5): 1444-1452. https://doi.org/10.2136/sssaj1992.03615995005600050018x

[53] Angulo-Martínez, M., López-Vicente, M., Vicente-Serrano, S.M., Beguería, S. (2009). Mapping rainfall erosivity at a regional scale: A comparison of interpolation methods in the Ebro Basin (NE Spain). Hydrology and Earth System Sciences, 13(10): 1907-1920. https://doi.org/10.5194/hess-13-1907-2009

[54] Bárdossy, A., Pegram, G. (2013). Interpolation of precipitation under topographic influence at different time scales. Water Resources Research, 49(8): 4545-4565. https://doi.org/10.1002/wrcr.20307