Improving Built-up Extraction Using Spectral Indices and Machine Learning on Sentinel-2 Satellite Data in Mumbai Suburban District, India

Built-up mapping poses a great challenge owing to the varying spectral signatures and spatial attributes of different features such as buildings, individual houses, roads


INTRODUCTION
Impervious surfaces have expanded greatly over the last few decades owing to rapid urbanization and industrialization. Impervious, or built-up, surfaces are water-resistant manmade structures such as concrete structures, roadways, freeways, highways, and parking lots [1]. Information about impervious surface distribution is helpful in many applications, such as population analysis, building extraction, and land use analysis [2][3][4][5]. Furthermore, this information aids in comprehending the environmental impact of impervious surfaces [6]. Urban heat islands are formed when open, green, permeable spaces are replaced with impermeable infrastructure such as buildings, houses, and roads. An urban heat island is an area where the temperature is considerably higher than in the surrounding rural areas as a result of the release of heat trapped by impervious surfaces [7,8].
Impervious surfaces disrupt the atmospheric carbon cycle because they displace vegetation, which lowers ecological output. The uncontrolled expansion of impervious surfaces can lead to various problems, such as water-logging due to surface run-off [9], a decline in groundwater levels due to the non-porous nature of impervious surfaces [10], and deterioration of water resources [11], all of which can disturb the natural water cycle [12].
Recent advances in remote sensing and space technology have opened avenues for effective land use/cover mapping, owing to powerful machine learning (ML) algorithms and the availability of high-quality, freely available satellite data. Popular satellites such as Landsat, Sentinel-2, Worldview, and IKONOS provide high-quality data with very good temporal and spatial resolution, which are important characteristics for an effective land cover mapping scheme. However, mapping built-up and impervious surfaces poses a great challenge for several reasons. For example, built-up surfaces are made of various materials, such as concrete, cement, gravel, brick, coal tar, and metal, and their spectral signatures can be nearly identical to those of other materials, such as bare/fallow land and silt [8,13].
The conventional method for built-up extraction is an exhausting process that requires human expertise, field surveys, and the use of historical data [14]. Different classification techniques are used for urban mapping, such as hard classification methods (unsupervised, semi-supervised, and supervised) and soft classification methods (fuzzy-based) [15,16]. Unsupervised classification is not very popular for built-up extraction owing to its poor classification accuracy [17]. The literature indicates that the use of spectral indices is one popular method for built-up extraction. However, spectral indices vary with the seasons, and built-up areas are sometimes misclassified as bare/fallow land because of the spectral homogeneity between the two [18]. Other methods for extracting impervious surfaces include spectral mixture analysis [19], image segmentation [20], ensemble-based learning [21], multiple regression [22], and Gray Level Co-occurrence Matrix (GLCM) features [23]. Fusion of different satellite datasets, such as optical and radar data, is another promising image classification technique that combines spectral bands with other features, such as spectral and textural features [24][25][26][27]. Joshi et al. [28] reviewed 32 papers on satellite image fusion and found that 28 of them reported an increase in classification accuracy for different land cover studies when using the fused dataset. In order to understand the impact of fusion, 12 popular spectral indices have been computed and fused with the four-band Sentinel-2 dataset in this study.
In the past few decades, machine learning algorithms have gained immense popularity owing to their robust performance and fast execution. These algorithms have been used extensively with different optical satellite datasets, such as low spatial resolution datasets (MODIS, AVHRR) [29], medium spatial resolution datasets (Landsat) [30], and very high-resolution imagery (QuickBird, IKONOS) [31,32]. Optical sensors, however, cannot penetrate clouds. As such, Synthetic Aperture Radar (SAR) data has become popular due to its ability to penetrate clouds [33,34]. Very few studies have exploited the potential of high-resolution satellite datasets for built-up mapping [35]. High-resolution imagery can address the problem of mixed pixels that arises with low-resolution imagery and thereby increase classification accuracy [36]. Besides this, classification performance is influenced by many factors, such as the complexity of the study area, the spatial/spectral/temporal resolution of the satellite, and seasonal variability [37][38][39].
The Sentinel-2A and Sentinel-2B passive optical satellites provide multispectral data useful for land cover mapping and other applications because of their high spatial resolution, short revisit times, and free data availability [39]. Four of the Sentinel-2 multispectral bands are available at 10 m spatial resolution and can be used for mapping built-up areas more effectively than Landsat series data [40,41]. Non-parametric algorithms (Decision Tree, Random Forest (RF), Support Vector Machine (SVM), and knowledge-based learning methods) have proven extremely useful in recent decades when compared with traditional classifiers such as maximum likelihood, owing to their robust performance and good fault tolerance [42,43]. Das et al. [44] used four ML algorithms, i.e., RF, SVM, Artificial Neural Network (ANN), and Gradient Boosting, to extract building features from very high-resolution imagery. Their study found that ANN performed considerably well for linear, homogeneous building distributions. RF, on the other hand, demonstrated high accuracy for non-linear and diverse distributions of urban buildings. Barman and Mustak [45] used SVM with linear and radial basis function (RBF) kernels to extract the building footprints of Kolkata city in India. Abdi [46] compared four ML algorithms (RF, SVM, XGBoost, and deep learning (DL)); SVM gave the best performance, followed by XGBoost, RF, and DL. Hence, most studies suggest the use of non-parametric ML classifiers for built-up extraction owing to their robust performance and fast execution.
Spectral indices are computed from the spectral bands using mathematical equations designed to minimize the impact of shadows cast by clouds or mountains and to enhance the spectral characteristics of the image. The efficiency and performance of these indices vary between study areas owing to differences in topography and geographical conditions. Chen et al. [47] used six different indices to extract built-up areas in different study areas using different ML algorithms. The results indicated that the biophysical composition index (BCI) and the combinational build-up index (CBI) were disturbed by the presence of water bodies, while three indices (the combinational biophysical composition index (CBCI), the index-based built-up index (IBI), and the normalized difference built-up index (NDBI)) were influenced by the study area. ENDISI gave the best performance among all six indices, with a higher degree of separability and an overall accuracy of 91%. Shi et al. [48] found that impervious surface maps extracted from Sentinel-2 data had less error than those created using Landsat data, and WorldView-3 data as well. Valdiviezo-N et al. [49] discussed various built-up index techniques along with their applications for urban extraction; their results showed that built-up indices performed better in moist/humid seasons. Osgouei et al. [50] used a combination of indices for built-up mapping, which gave better results than the ten-band Sentinel-2 dataset. Kebedea et al. [51] extracted impervious surfaces from Sentinel-2 images based on the spectral discrimination index (SDI), using seven different built-up indices and an SVM classifier. The results indicated that spectral indices can be used to extract impervious surfaces efficiently. Similar studies based on the SDI and histogram overlap were conducted previously [52,53].
The literature indicates that accurate built-up extraction is a challenging task in remote sensing, and it becomes more critical because the spectral signatures of different land use/land cover (LULC) classes overlap. It has also been observed that the potential of the SDI has not been properly investigated for built-up extraction. In addition, the efficiency of advanced ensemble techniques such as extreme gradient boosting (XGBoost) has not been exploited with the fused dataset to obtain effective built-up maps.
This paper aims to understand the impact of different spectral indices on a few popular machine learning classifiers for built-up extraction. Furthermore, to analyze the separability between bare/fallow land and built-up areas, statistical measures such as the SDI and histograms have been utilized. The main objective is to map built-up surfaces using pixel-based, supervised ML classification techniques. The research objectives covered in this study are as follows:
1. Built-up extraction using spectral bands and twelve popular spectral indices by employing ML classifiers. What is the impact of spectral index fusion on the accuracy of ML classifiers when comparing two datasets, i.e., Dataset-1, which consists purely of spectral bands (the 4-band dataset), and Dataset-2, which consists of the spectral bands fused with the twelve indices?
2. Analysis of the degree of separability between built-up and bare/fallow land using SDI measures and histogram analysis. Furthermore, to understand the feature importance of spectral indices in addition to spectral bands, the variable importance of the fused dataset has been computed for the respective classifiers.
3. Performance evaluation of the XGBoost classifier in comparison with traditional ML techniques such as RF, SVM, and k-nearest neighbors (KNN) for built-up extraction on both datasets.
This paper is organized in the following manner. Section 2 describes the materials and methods adopted for this research work; it consists of two sub-sections, the first describing the selected study area and the second presenting the pre-processing required for the data. Section 3 presents the methodology used in this study: it discusses the step-by-step workflow, provides a brief description of all the ML classifiers used (KNN, SVM, RF, and XGBoost), presents the SDI, and discusses the spectral curve analysis for the various LULC classes. Section 4 presents the results and provides an in-depth analysis of the outcomes obtained. Section 5 summarizes the major conclusions of the study.

Study area
In this research work, the chosen study area is the Mumbai Suburban district, located in the Konkan Division of Maharashtra state, India. It is the second smallest district in Maharashtra, with a population of 9.36 million. The study area extends from 18° 58' 43.7268" N to 19° 16' 7.6872" N latitude and from 72° 46' 33.0492" E to 72° 58' 49.656" E longitude. In fast-growing developing countries, infrastructure is expanding at a rapid rate, so that small towns are continuously turning into big cities, while agricultural land is being converted into high-rise buildings and industrial areas. In the field of remote sensing, a wide variety of satellite datasets, such as the Landsat series, SPOT, Sentinel-1 and Sentinel-2, RapidEye, and WorldView, is available to address different classification problems. Built-up mapping requires a medium spatial resolution dataset. Therefore, the Sentinel-2 dataset has been chosen, as it provides freely available data at 10 m spatial resolution. To obtain effective classification maps, the input imagery must be cloud-free or have minimal cloud cover. For this research work, the Sentinel-2 dataset of 12th January 2021 was utilized. Figure 1 shows the selected study area in the Maharashtra state of India. For better representation, both the False Color Composite (FCC) and the True Color Composite (TCC) are shown, along with the location of the study area within the state and country.

Pre-processing of data
Sentinel-2A and Sentinel-2B are twin optical satellites launched by the European Space Agency (ESA) in 2015 and 2017, respectively [54]. Sentinel-2 provides thirteen spectral bands: four bands at 10 m spatial resolution (Red (R), Green (G), Blue (B), and Near Infrared (NIR)); six bands at 20 m spatial resolution (Vegetation Red Edge 1 (RE1, Band 5), Vegetation Red Edge 2 (RE2, Band 6), Vegetation Red Edge 3 (RE3, Band 7), Vegetation Red Edge (Band 8A), Short Wave Infrared 1 (SWIR-1), and Short Wave Infrared 2 (SWIR-2)); and three bands at 60 m spatial resolution (Coastal Aerosol, Water Vapor, and Short Wave Infrared Cirrus). Scientists and researchers around the world have utilized Sentinel-2 data for different land use/cover studies, such as crop/vegetation mapping [55,56], flood mapping [57,58], forest classification [59,60], and surface water mapping [61,62]. The images were acquired as Level-1C Top-of-Atmosphere (TOA) products in the UTM Zone 43 N projection under a clear sky with minimal cloud cover. Level-1C images suffer from the effects of atmospheric absorption and scattering, so they need to be pre-processed. To retain the true surface reflectance values, the images must be atmospherically corrected. The pre-processing operations were carried out using the Sen2Cor processor [63]. After pre-processing, the images are converted to Level-2A (Bottom-of-Atmosphere) format, which can be used for further analysis.

MACHINE LEARNING CLASSIFIERS
This study implemented four machine learning techniques to separate built-up areas from other LULC classes. A brief description of each classifier is provided in the following subsections.

Random Forest
Breiman [64] introduced Random Forest (RF), a widely used ML technique that combines bootstrapping with random feature selection to generate decision trees, with each tree trained on a random subset of the features and a sample of the training data drawn with replacement. This reduces the variance, which increases the overall performance of the classifier. RF can be used with categorical as well as continuous variables, and for both classification and regression tasks.
The unlabeled data is categorized based on a voting scheme, the most common being majority voting. The RF classifier is one of the top-performing classifiers since it is simple to parameterize and is relatively resistant to overfitting [65]. The RF model's performance is governed by two primary parameters:
• Ntree: number of trees used in aggregation.
• Mtry: number of randomly selected features or predictors used to split the nodes.
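The roles of bootstrapping, Ntree, Mtry, and majority voting can be illustrated with a deliberately small sketch. It is not the implementation used in this study (which relied on R with default parameters): the trees are reduced to one-split decision stumps, and the function names (`fit_stump`, `fit_forest`, `predict`) and the toy reflectance values are assumptions made purely for illustration.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # not a real split
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(l != left_label for l in left) + sum(r != right_label for r in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # Degenerate bootstrap sample (no valid split): always predict the majority label.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, ntree=25, mtry=1, seed=0):
    """Train `ntree` stumps, each on a bootstrap sample and `mtry` random features."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]    # sampling with replacement
        feats = rng.sample(range(n_features), mtry)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (values are illustrative only).
X = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10), (0.12, 0.18),
     (0.80, 0.90), (0.85, 0.70), (0.90, 0.80), (0.75, 0.95)]
y = ["pervious"] * 4 + ["built-up"] * 4

forest = fit_forest(X, y, ntree=25, mtry=1)
label_low = predict(forest, (0.05, 0.05))
label_high = predict(forest, (0.95, 0.99))
```

In a full random forest the Mtry features are re-drawn at every node of a deep tree; with one-split stumps there is only one node per tree, so the draw happens once per tree.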

Support Vector Machine
Vapnik and Chervonenkis [66] proposed this algorithm and implemented it as a binary linear classifier. Thereafter, Vapnik [67] modified it to obtain better performance. SVM finds an optimal hyperplane that separates the data points of the two classes with the greatest margin (known as a maximum-margin hyperplane). It is memory efficient and works well even when the number of training samples is small. The type of kernel used has a significant impact on SVM performance. Depending upon the application, different kernels are available, such as linear, sigmoid, string, and graph kernels [68].

KNN
KNN is a robust and popular ML algorithm that assigns a class label to each test sample by measuring the distance between the test sample and its k nearest training samples [69]. The distance can be calculated using different metrics (Minkowski, Euclidean, and Manhattan), of which the Euclidean distance is the most popular. Here K, a tuning parameter, regulates how well the classifier performs. A low value of K tends to overfit the model and gives an unstable decision boundary, while a high value over-smooths the boundary and can underfit [70].
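As a minimal sketch of the procedure just described, the following pure-Python example classifies a sample by Euclidean distance and majority vote among its k nearest training samples. The function names and the two-feature toy samples are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, sample, k=3):
    """Return the majority label among the k training samples closest to `sample`."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: euclidean(pair[0], sample))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical reflectance-like features for two classes.
train_X = [(0.10, 0.12), (0.11, 0.15), (0.09, 0.10),
           (0.30, 0.35), (0.32, 0.33), (0.29, 0.36)]
train_y = ["built-up"] * 3 + ["vegetation"] * 3

pred_a = knn_predict(train_X, train_y, (0.10, 0.11), k=3)
pred_b = knn_predict(train_X, train_y, (0.31, 0.34), k=3)
```

With k=1 the prediction depends on a single neighbour and is sensitive to noise; larger k averages over more neighbours at the cost of a smoother boundary.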

Extreme gradient boosting
This algorithm was first described by Chen and Guestrin in 2016 [71]. XGBoost is an improved, scalable version [72] of the Gradient Boosting Machine (GBM) that works by building a sequence of models that are individually weak learners with weak predictive power. These weak learners are combined into a stronger model that makes better predictions. XGBoost defines an objective function that combines a loss function with a regularization term that penalizes model complexity. Different objective functions are available, such as softmax, hardmax, etc. A further advantage of XGBoost is that it supports parallel computation, which increases the computation speed of the model; it is reported to be about ten times faster than conventional GBM. Various studies have shown comparable or better performance of XGBoost relative to traditional classifiers such as RF and SVM [73,74].
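For reference, the regularized objective in Chen and Guestrin's formulation can be written as below. The symbols are those of that formulation and are not defined elsewhere in this paper: $l$ is a differentiable loss between the prediction $\hat{y}_i$ and the label $y_i$, $f_k$ is the $k$-th tree, $T$ is its number of leaves, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization parameters.

```latex
\mathcal{L}(\phi) = \sum_{i} l\left(\hat{y}_i, y_i\right) + \sum_{k} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda\,\lVert w \rVert^{2}
```

The first sum rewards fit to the training data, while the second penalizes trees with many leaves or large leaf weights, which is how the regularization term limits overfitting.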

METHODOLOGY USED
Figure 2 illustrates the proposed methodology, which consists of the following major steps: (i) data acquisition and pre-processing, (ii) creation of reference data samples, (iii) implementation and training of the machine learning classifiers, and (iv) accuracy assessment for each ML model. In this work, supervised machine learning classification is used for land use/land cover (LULC) classification on Sentinel-2 data. Each ML model has been trained with the reference dataset, which was created using Google Earth imagery. For this study, two datasets have been created. The first, referred to as Dataset-1, was formed by stacking the four 10 m spectral bands, i.e., R, G, B, and NIR. For the second, referred to as Dataset-2, twelve spectral indices (Table 1) were computed and then integrated with Dataset-1. Some of these indices use the shortwave infrared bands (SWIR-1 and SWIR-2, both at 20 m), which were therefore down-sampled to 10 m spatial resolution. Studies have shown that down-sampling results in better classification accuracy than up-sampling and also helps in preserving spectral information [75,76]. In this study, four machine learning algorithms, namely RF, SVM, KNN, and XGBoost, have been used. The literature indicates that RF and SVM are the most popular ML algorithms for a wide range of classification problems using satellite data [23,25,55,56,58,68]. KNN has been used as a baseline classifier. In addition, this study considers an advanced ensemble classifier, XGBoost. Very few studies have applied the XGBoost classifier to satellite imagery. Therefore, XGBoost was chosen in this study to explore its potential for effective built-up extraction and to compare its performance with well-known ML classifiers.
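The construction of the two datasets can be sketched as follows. This is an illustrative sketch only: the pixel values are invented, the helper names are hypothetical, and only two indices are computed, using their commonly cited definitions, NDBI = (SWIR1 − NIR) / (SWIR1 + NIR) and MNDWI = (Green − SWIR1) / (Green + SWIR1). Whether these match the exact formulations in Table 1 is an assumption.

```python
def resample_nearest_2x(band_20m):
    """Nearest-neighbour resampling: each 20 m pixel becomes a 2x2 block of 10 m pixels."""
    fine = []
    for row in band_20m:
        fine_row = [value for value in row for _ in range(2)]
        fine.append(fine_row)
        fine.append(list(fine_row))
    return fine


def normalized_difference(band_a, band_b):
    """Pixel-wise (a - b) / (a + b); returns 0.0 where the denominator is zero."""
    return [
        [(a - b) / (a + b) if (a + b) != 0 else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def stack(*layers):
    """Stack 2-D layers into a rows x cols grid of per-pixel feature vectors."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[[layer[r][c] for layer in layers] for c in range(cols)] for r in range(rows)]


# Hypothetical 2x2 tile of 10 m bands and the matching single 20 m SWIR1 pixel.
red = [[0.12, 0.10], [0.08, 0.05]]
green = [[0.11, 0.10], [0.09, 0.06]]
blue = [[0.10, 0.09], [0.07, 0.04]]
nir = [[0.20, 0.20], [0.35, 0.40]]
swir1_20m = [[0.30]]

swir1_10m = resample_nearest_2x(swir1_20m)
ndbi = normalized_difference(swir1_10m, nir)
mndwi = normalized_difference(green, swir1_10m)

dataset_1 = stack(red, green, blue, nir)               # bands only
dataset_2 = stack(red, green, blue, nir, ndbi, mndwi)  # bands fused with indices
```

In the study itself Dataset-2 holds sixteen layers (four bands plus twelve indices); the sketch stops at six to keep the example short.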
To evaluate the performance of the ML classifiers, several accuracy measures have been used. Overall accuracy and the kappa coefficient are computed as overall measures for each classifier, whereas precision and recall are computed for a specific LULC class for each classifier. These values are calculated from the numbers of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). The kappa coefficient is a popular measure in remote sensing applications. It expresses the difference between the actual agreement and the agreement expected by chance, and typically varies from 0 to 1 (negative values can occur when agreement is worse than chance). A kappa value of 1 indicates that the classified image and the reference image are identical; therefore, a higher value of kappa is desirable. The accuracy measures discussed above are calculated using Eqs. (1)-(4), respectively.
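The equations referred to above are not reproduced in this text. As a hedged sketch, the standard binary-case definitions of the four measures can be computed from the confusion-matrix counts as shown below; the function name and the counts in the example are hypothetical, and it is an assumption that Eqs. (1)-(4) follow these standard forms.

```python
def accuracy_metrics(tp, tn, fp, fn):
    """Overall accuracy, Cohen's kappa, precision and recall from binary counts."""
    total = tp + tn + fp + fn
    overall_accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical counts for the built-up class (not results from this study).
metrics = accuracy_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is (0.85 − 0.50) / (1 − 0.50) = 0.70, noticeably lower than the overall accuracy.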

Spectral discrimination index
Mapping built-up and impervious surfaces poses a great challenge for several reasons. For example, built-up surfaces are made of various materials, such as concrete, cement, gravel, brick, coal tar, and metal, and their spectral signatures can be nearly identical to those of other materials, such as bare/fallow land and silt. This study therefore examines the separability between the two classes using the SDI, which measures the degree of difference between land cover classes [88]. The SDI can be calculated using Eq. (5), where μ1 and μ2 represent the mean index values of classes 1 and 2, respectively, and σ1 and σ2 represent the standard deviations of classes 1 and 2, respectively. If the SDI value is less than 1, there is spectral homogeneity between the classes and the ability to differentiate them is poor. If the SDI value lies between 1 and 3, the histogram means can be well distinguished and the classes can be moderately separated. If the SDI value is greater than 3, there is no overlap between the spectral signatures, and the classes can be separated perfectly. Table 2 and Figure 3 show the SDI values obtained for built-up surfaces and bare/fallow land.
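Eq. (5) is not reproduced in this text. A commonly used form that is consistent with the symbols defined above is SDI = |μ1 − μ2| / (σ1 + σ2), and the sketch below assumes that form. The function names, the use of the population standard deviation, and the sample index values are all assumptions for illustration.

```python
import statistics


def sdi(class1_values, class2_values):
    """Separability of two classes: |mean1 - mean2| / (std1 + std2)."""
    mu1 = statistics.mean(class1_values)
    mu2 = statistics.mean(class2_values)
    sigma1 = statistics.pstdev(class1_values)  # population std (an assumption)
    sigma2 = statistics.pstdev(class2_values)
    return abs(mu1 - mu2) / (sigma1 + sigma2)


def interpret(sdi_value):
    """Map an SDI value onto the three separability ranges described in the text."""
    if sdi_value < 1:
        return "poor"
    if sdi_value <= 3:
        return "moderate"
    return "good"


# Hypothetical index values sampled from two classes.
built_up = [0.10, 0.12, 0.14]
bare_land = [0.30, 0.32, 0.34]

value = sdi(built_up, bare_land)
category = interpret(value)
```

Because the denominator is the sum of the two spreads, an index separates the classes well only when the gap between the class means is large relative to the within-class variation.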

Spectral Curve Analysis for different land cover classes
Figure 4 shows that most land cover classes can be easily separated from each other based on their spectral signatures. However, it was challenging to separate bare/fallow land from built-up areas, as they share a similar spectral signature, and hence it became necessary to investigate their separability using a combination of different spectral indices. The spectral signature curve also shows that the spectral profiles of fallow land and built-up areas are almost identical for the RE1, RE2, RE3, NIR, and Vegetation Red Edge (8A) bands, indicating only partial separability between the two, whereas separability was better for the Blue, Green, SWIR1, and SWIR2 bands. This also highlights the fact that most of the proposed built-up indices exploit the characteristics of the Blue (10 m), Green (10 m), Red (10 m), NIR (10 m), SWIR1 (20 m), and SWIR2 (20 m) bands.

RESULT AND DISCUSSION
This study aimed to map the built-up land cover class using Sentinel-2 imagery and ML models. Separability was evaluated using histograms and the Spectral Discrimination Index (SDI). The datasets comprised the spectral bands alone and the spectral bands integrated with spectral indices. The shortwave infrared bands were resampled to 10 m spatial resolution using nearest-neighbor resampling; down-sampling was used for better classification accuracy and preservation of spectral information [76,77]. A stratified random sampling procedure with 70% training and 30% testing samples was employed, with a total of 4503 sample pixels. The R programming language was used for the implementation. All the ML models were trained with the default values of their tuning parameters. The study enhances understanding of ML classifier performance in built-up feature extraction from Sentinel-2 imagery, with implications for urban mapping and land cover studies. To obtain the classification maps, the first three classes were merged and referred to as "pervious surfaces", while the remaining class is referred to as "built-up surfaces". Table 3 shows the accuracy metrics attained for RF, SVM, KNN, and XGBoost. Table 4 shows the feature importance of the spectral bands and spectral indices for the RF, SVM, KNN, and XGBoost classifiers. All twelve spectral indices were calculated, and the results are shown in Figure 5.
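The stratified 70/30 split mentioned above can be sketched as follows. The study was implemented in R; this Python sketch only illustrates the idea of sampling each class separately so that class proportions are preserved. The function name, the random seed, and the per-class sample counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # random order within the class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Hypothetical class sizes (not the 4503-pixel reference set of the study).
labels = (["water"] * 10 + ["vegetation"] * 20 +
          ["fallow"] * 30 + ["built-up"] * 40)

train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
```

Splitting within each class keeps rare classes represented in both subsets, which a purely random split does not guarantee.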

Classification results
• Four classes were identified: water body, vegetation, fallow land, and built-up areas. Pervious surfaces included water, vegetation, and fallow land, while built-up surfaces comprised roads, parks, and other impervious structures.
• Tables 5 and 6 show the accuracy measures for the RF, SVM, KNN, and XGBoost classifiers on the 4-band and 16-band datasets, respectively.
• To assess classification performance, various accuracy metrics (overall accuracy (OA), Kappa score (k), precision, and recall) were calculated from the confusion matrices of the respective classifiers.
• The OA obtained for Dataset-

Variable importance analysis
• Feature importance analysis revealed the significant contribution of spectral indices, along with spectral bands, to classification. No model-specific method exists for calculating variable importance for KNN and SVM; for both, it is calculated using a loss-squared variable importance measure, which is identical for the two models. Figure 7 displays a comparative analysis of the variable importance of the different ML models with respect to the spectral bands and the twelve spectral indices.

Model limitations
• The study acknowledges the challenge of misclassification between built-up areas and bare/fallow land due to spectral similarity, and highlights the effectiveness of spectral indices in mitigating this issue.
• Further insights into the limitations of the approach, such as potential challenges under specific environmental conditions, could provide a more comprehensive understanding.

Conclusion and efficacy of approach
• The study successfully demonstrated the efficacy of integrating spectral indices with spectral bands in improving built-up surface extraction accuracy.
• XGBoost emerged as the top-performing classifier, showcasing the potential of the proposed approach for mapping built-up and impervious surfaces.
• The misclassification of built-up areas with bare land was reduced when the dataset was integrated with spectral indices, underscoring the practical utility of this enhancement.
Figure 8 displays the classified maps obtained with KNN, RF, SVM, and XGBoost for Datasets 1 and 2.

CONCLUSION
The major findings of this study are as follows:

Key findings and highlights
• Integration of spectral indices with spectral bands resulted in a notable increase in classification accuracy.
• Results demonstrated a significant rise of 4.81% (RF), 3.99% (SVM), 3.33% (KNN), and 5.4% (XGBoost) in overall accuracy (OA) after integrating selected spectral indices.
• The outcome of this study suggests that ENDISI and MNDWI are very useful spectral indices for built-up extraction, offering a higher degree of separability between built-up and bare/fallow land.

Quantified improvements in accuracy metrics
• All machine learning classifiers demonstrated good accuracy, ranging from roughly 86% to 89% for the 4-band dataset and 89% to 94% for the 16-band dataset.
• The extreme gradient boosting model exhibited the best overall performance with accuracy values of 88.90% and 94.30% for the 4-band and 16-band datasets, respectively.

Real world applications and impact
• The study's findings have direct implications for accurately mapping built-up surfaces, crucial for urban planning, infrastructure development, and land-use management.
• Highlighting the effectiveness of specific spectral indices like BSI, ENDISI, NDBI, MNDWI, and UI in separating built-up and bare/fallow land classes enhances the utility of remote sensing data in land cover analysis.

Future work and applications
• The study suggests future investigations into the impact of seasonal variation on the performance of spectral indices, acknowledging their susceptibility to variations in seasonal patterns.
• Limitations include the need for further research to address potential challenges and refine the proposed approach for broader applicability. Future studies could extend the proposed approach to different geographical areas to evaluate its generalizability and robustness.

Figure 1. (a) Study area located in Maharashtra, India; (b) Maharashtra state district-wise division; (c) Study area boundary; (d) Study area in false color composite; (e) Study area shown as the true color composite of surface reflectance images of 12th January 2021

Figure 2. Proposed classification methodology used in the study

Figure 3. SDI values between built-up and bare/fallow land for different indices

Figure 4. Spectral signature curve for the selected LULC classes for Sentinel-2 bands

Figure 6. Histograms of different indices showing degrees of overlap between built-up class and bare/fallow land class

Figure 7. Comparison of the variable importance of classifiers concerning spectral bands and spectral indices

Table 1. Various indices and their description

Table 2. Values of SDI between built-up and bare/fallow class

Table 3. Accuracy metrics for the Dataset-1 and the Dataset-2

Table 5. Accuracy measures obtained by RF, SVM, KNN and XGBoost for Dataset-1

Table 6. Accuracy measures obtained by RF, SVM, KNN and XGBoost for Dataset-2