Indonesia, a nation of over 17,000 islands spanning 1.9 million square kilometers, faces a critical challenge in creating detailed maps to support its development goals.
With only 3% of the country covered by large-scale topographic maps (1:5000 scale), traditional methods like manual stereo-plotting and field surveys are too slow to meet urgent needs for urban planning, disaster management, and environmental conservation.
A groundbreaking study published in Remote Sensing in 2025 offers a solution: a deep learning framework that automates land cover classification using very-high-resolution satellite imagery.
The Challenge of Mapping Indonesia’s Topography
Indonesia’s size and complexity make mapping a monumental task. The Geospatial Information Agency (BIG), responsible for national mapping, currently produces 13,000 square kilometers of topographic maps annually.
At this rate, mapping the entire country would take over a century. Even if forested areas—which cover nearly half of Indonesia—are excluded, completing the remaining terrain would still require 60 years.
This slow progress clashes with national priorities like the One Map Policy, introduced in 2016 to standardize maps across sectors and avoid conflicts in land use. Scaling this policy to 1:5000 maps is essential but far behind schedule.
Topographic maps are detailed representations of natural and human-made features on Earth’s surface, including elevation (hills, valleys), water bodies, roads, buildings, and vegetation.
They serve as foundational tools for infrastructure planning, disaster response, and environmental monitoring. For Indonesia, creating these maps at a 1:5000 scale (where 1 cm on the map equals 50 meters on the ground) is critical for precision in projects like road construction or flood modeling.

Land cover data, a core layer of topographic maps, describes the physical material on Earth’s surface, such as forests, urban areas, or water. Unlike land use (which describes how humans utilize the land, e.g., residential or industrial zones), land cover focuses on observable physical features.
Accurate land cover maps help governments track deforestation, monitor urban sprawl, or assess agricultural productivity. Traditionally, analysts manually label these features pixel by pixel using aerial photos or satellite images, a process that is both time-consuming and prone to human error.
For example, identifying roads or small buildings in dense urban areas can take days of meticulous work. The 2025 study addresses this bottleneck by replacing manual efforts with artificial intelligence, specifically deep learning, to automate land cover classification.
AI-Driven Satellite Imagery Analysis
The research focused on Mataram City, a small but diverse urban area on Lombok Island, as a test case. The team used Pleiades satellite imagery from 2015, which included high-resolution panchromatic (0.5 meters) and multispectral (2 meters) data.
Panchromatic images capture fine spatial details in grayscale, while multispectral images provide color and infrared information across specific wavelength ranges (e.g., red, green, blue, near-infrared).
To combine these strengths, the researchers applied a technique called pan-sharpening, which merges the high-resolution grayscale data with lower-resolution color imagery. This process produced crisp, detailed images with a resolution of 0.5 meters, ideal for detecting small features like roads or individual buildings.
Pan-sharpening is vital because it retains the rich spectral information of multispectral data while enhancing spatial clarity, ensuring that colors align accurately with physical features.
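The article does not specify which pan-sharpening algorithm the study used; one common, simple approach is the Brovey transform, sketched below in NumPy. The `brovey_pansharpen` helper and the toy arrays are illustrative assumptions, and the multispectral bands are assumed to be already resampled onto the panchromatic grid.

```python
import numpy as np

def brovey_pansharpen(ms, pan):
    """Brovey-transform pan-sharpening: scale each multispectral band by the
    ratio of the panchromatic band to the bands' mean intensity.
    ms:  (bands, H, W) multispectral array, resampled to pan's grid
    pan: (H, W) panchromatic array
    """
    intensity = ms.mean(axis=0)
    ratio = pan / (intensity + 1e-6)  # epsilon avoids division by zero
    return ms * ratio                 # broadcasts (bands, H, W) * (H, W)

# Toy 2x2 scene with two spectral bands
ms = np.array([[[0.2, 0.4], [0.6, 0.8]],
               [[0.4, 0.4], [0.2, 0.4]]])
pan = np.array([[0.6, 0.8], [0.8, 0.6]])
sharp = brovey_pansharpen(ms, pan)
```

A useful property of this transform is that the per-pixel mean of the sharpened bands matches the panchromatic intensity, which is what "colors align accurately with physical features" amounts to in practice.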
Next, the team extracted additional information from the images to improve classification accuracy. They calculated the Normalized Difference Vegetation Index (NDVI), a measure of plant health derived from near-infrared (NIR) and red light reflection.
Healthy vegetation reflects more near-infrared light and absorbs more red light due to chlorophyll activity. The formula NDVI=(NIR−Red)/(NIR+Red) produces values between -1 and 1, where higher values indicate denser, healthier vegetation.
NDVI is invaluable for distinguishing forests, farmland, and urban green spaces. For instance, in this study, NDVI helped differentiate between lush plantations and bare soil.
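The NDVI formula translates directly into code. A minimal NumPy sketch, with invented toy reflectance values for three pixel types:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), values in [-1, 1]."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + 1e-9)  # epsilon guards zero denominators

# Toy reflectances: dense vegetation, bare soil, water
nir = np.array([0.50, 0.30, 0.02])
red = np.array([0.08, 0.25, 0.05])
values = ndvi(nir, red)  # high for vegetation, near zero for soil, negative for water
```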
Texture analysis was another key step. Using a statistical method called the Gray-Level Co-occurrence Matrix (GLCM), the researchers quantified patterns in the imagery, such as the roughness of agricultural fields versus the smoothness of paved roads.
GLCM works by analyzing how often pairs of pixels with specific values and spatial relationships (e.g., horizontally adjacent) occur in an image. From this matrix, metrics like homogeneity (uniformity of pixel values), contrast (local intensity variations), and entropy (randomness of pixel distribution) are calculated.
These texture metrics helped the AI model differentiate between similar-looking land cover types—for example, distinguishing between asphalt roads and dark soil patches.
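A toy NumPy sketch of how a GLCM and these metrics can be computed for horizontally adjacent pixels. The `glcm` and `texture_metrics` helpers and the 3×3 example patches are illustrative assumptions, not the study's implementation:

```python
import numpy as np

def glcm(img, levels, offset=(0, 1)):
    """Gray-Level Co-occurrence Matrix for one spatial offset
    (default: horizontally adjacent pixels), normalized to probabilities."""
    dr, dc = offset
    m = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            m[img[r, c], img[r + dr, c + dc]] += 1
    return m / m.sum()

def texture_metrics(p):
    """Homogeneity, contrast, and entropy from a normalized GLCM."""
    i, j = np.indices(p.shape)
    homogeneity = (p / (1.0 + (i - j) ** 2)).sum()
    contrast = (p * (i - j) ** 2).sum()
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
    return homogeneity, contrast, entropy

smooth = np.array([[0, 0, 0], [0, 0, 0], [1, 1, 1]])  # road-like: uniform rows
rough  = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # field-like: alternating
smooth_metrics = texture_metrics(glcm(smooth, levels=2))
rough_metrics = texture_metrics(glcm(rough, levels=2))
```

The smooth patch scores higher homogeneity and lower contrast than the rough one, which is exactly the signal that lets a classifier separate pavement from a ploughed field.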
To simplify the data, the team applied Principal Component Analysis (PCA), a technique that identifies the most significant patterns in a dataset. PCA reduces redundancy by transforming correlated variables (e.g., multiple texture bands) into a smaller set of uncorrelated components.
In this study, PCA condensed five texture bands into two principal components while retaining 95% of the original information. This streamlined the input for the deep learning model, improving both accuracy and computational efficiency.
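As a rough illustration of this dimensionality reduction, the sketch below builds five correlated "texture bands" from two latent factors and shows that two principal components retain nearly all the variance. The synthetic data and eigendecomposition-based PCA are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "texture bands": 5 correlated features per pixel, 1000 pixels,
# generated from 2 underlying factors plus a little noise
base = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 5))
X = base @ mixing + 0.05 * rng.normal(size=(1000, 5))

# PCA via eigendecomposition of the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
components = Xc @ eigvecs[:, :2]           # keep the first two components
```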
U-Net Deep Learning for Land Cover Topography
The heart of the study was a deep learning model based on the U-Net architecture, a type of convolutional neural network (CNN) widely used in image segmentation tasks.
Named for its U-shaped design, U-Net consists of two main parts: an encoder that analyzes the image to extract hierarchical features (e.g., edges, textures) and a decoder that reconstructs the image with pixel-wise labels.
The encoder uses convolutional layers and pooling to downsample the image, capturing broad patterns, while the decoder upsamples the data to restore spatial resolution. Skip connections between encoder and decoder layers preserve fine details, enabling precise boundary detection—a critical feature for mapping narrow roads or irregularly shaped buildings.
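The resolution flow described above can be sketched with simple arithmetic for a 512×512 input tile, assuming an encoder that halves the spatial size four times (the level count is an illustrative assumption, not the paper's exact configuration):

```python
# Spatial sizes through a U-Net for a 512x512 tile: the encoder halves the
# resolution at each level, the decoder doubles it back, and each skip
# connection joins same-size encoder and decoder feature maps.
size = 512
encoder = [size // (2 ** level) for level in range(5)]   # 512 -> 32
decoder = encoder[::-1]                                  # 32 -> 512
skips = list(zip(encoder[-2::-1], decoder[1:]))          # matched pairs
print("encoder sizes:", encoder)
print("decoder sizes:", decoder)
print("skip connections join:", skips)
```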

The model used a ResNet34 backbone—a pre-trained network renowned for its depth and efficiency. ResNet34 belongs to the residual network family, which introduces “shortcut connections” to bypass layers, mitigating the vanishing gradient problem (where deep networks struggle to learn due to diminishing updates during training).
By leveraging ResNet34’s ability to recognize complex patterns from ImageNet (a massive image database), the model required less training data and time to adapt to satellite imagery.
Training the model required 1,440 image tiles, each 512×512 pixels, covering six land cover classes: buildings, roads, agricultural land, bare land, plantations, and water bodies.
The dataset had inherent imbalances; roads and water bodies made up just 3.7% and 4.2% of the samples, respectively, while buildings and agricultural land accounted for over 25% each. Despite this challenge, the model was trained over 200 epochs—a balance between accuracy and computational cost—with a batch size of 2 due to memory constraints.
An epoch refers to one complete pass of the training data through the model, while batch size determines how many samples are processed before updating the model’s parameters. Smaller batch sizes reduce memory usage but may slow training.
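The figures reported above imply a concrete amount of optimization work per training run:

```python
tiles = 1440        # training tiles in the study
batch_size = 2      # samples processed per parameter update
epochs = 200        # full passes over the training data

updates_per_epoch = tiles // batch_size
total_updates = updates_per_epoch * epochs
print(updates_per_epoch)  # 720 parameter updates per epoch
print(total_updates)      # 144000 updates over the full run
```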
Enhancing Maps with Morphological Processing
Even the best AI models produce errors, such as misclassifying isolated pixels or creating jagged edges around features. To address this, the researchers applied morphological processing, a technique that smooths out imperfections using operations like erosion and dilation.
Erosion removes thin layers of pixels from object boundaries, eliminating tiny misclassified patches, while dilation adds pixels to expand object boundaries, filling gaps in linear features like roads.
These operations rely on a structuring element (a small matrix) that slides over the image to modify pixel values. The optimal kernel size for these operations (5×5 pixels) was determined through semi-variance analysis, a geostatistical method that quantified spatial patterns in the imagery.
Semi-variance measures how much pixel values differ at varying distances, helping identify the scale at which texture features (e.g., building clusters) are most distinct.
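A pure-NumPy sketch of binary dilation and erosion, combined into the standard "closing" (dilate then erode, to fill gaps) and "opening" (erode then dilate, to remove speckle) operations. The toy road mask and the 3×3 kernel are illustrative; the study itself found a 5×5 kernel optimal:

```python
import numpy as np

def dilate(mask, k):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad)
    out = np.zeros_like(mask)
    for r in range(mask.shape[0]):
        for c in range(mask.shape[1]):
            out[r, c] = padded[r:r + k, c:c + k].any()
    return out

def erode(mask, k):
    """Binary erosion: a pixel survives only if its whole k x k window is set."""
    pad = k // 2
    padded = np.pad(mask, pad)
    out = np.zeros_like(mask)
    for r in range(mask.shape[0]):
        for c in range(mask.shape[1]):
            out[r, c] = padded[r:r + k, c:c + k].all()
    return out

# A "road" mask with a one-pixel gap and an isolated misclassified pixel
road = np.zeros((9, 9), dtype=bool)
road[3:6, :] = True        # three-pixel-wide road
road[3:6, 4] = False       # gap across the road
road[0, 0] = True          # speckle noise far from the road
closed = erode(dilate(road, 3), 3)   # closing fills the gap
opened = dilate(erode(road, 3), 3)   # opening removes the speckle
```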
AI Boosts Mapping Speed and Accuracy
The model achieved an initial accuracy of 84% (kappa score = 0.79), which rose to 86% (kappa = 0.81) after post-processing. The kappa score (Cohen’s kappa) measures agreement between predicted and actual classifications, adjusting for random chance.
A score of 0.81 indicates “almost perfect” agreement, surpassing the 0.61–0.80 range considered “substantial.” Water bodies and plantations were classified with near-perfect accuracy (97% and 96%, respectively), while roads—challenged by their thin, linear shape and shadows—reached 85%.
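Cohen's kappa can be computed directly from a confusion matrix. A small sketch with an invented two-class example (90 agreements out of 100 samples):

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a confusion matrix (rows: actual, cols: predicted)."""
    confusion = confusion.astype(float)
    total = confusion.sum()
    p_observed = np.trace(confusion) / total
    # chance agreement: product of marginal proportions, summed over classes
    p_chance = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)

cm = np.array([[50, 5],
               [5, 40]])
print(round(cohens_kappa(cm), 2))  # prints 0.8
```

Note how 90% raw agreement shrinks to a kappa of about 0.8 once chance agreement between the imbalanced classes is discounted.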

Buildings and agricultural land also performed well, with F1-scores of 88% and 83%. The F1-score, a harmonic mean of precision and recall, balances false positives and false negatives, making it ideal for evaluating imbalanced datasets.
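A minimal sketch of the F1 computation, using invented counts for a building class:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # how many detections were real buildings
    recall = tp / (tp + fn)      # how many real buildings were detected
    return 2 * precision * recall / (precision + recall)

# Toy counts: 88 correct detections, 12 false alarms, 12 misses
print(round(f1_score(tp=88, fp=12, fn=12), 2))  # → 0.88
```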
The efficiency gains were even more striking. Traditional stereo-plotting, which involves manually labeling features in 3D aerial imagery, takes nine days per map sheet (5.29 km²) for buildings and vegetation.
The AI-driven approach reduced this to 43 minutes per sheet—a 250-fold improvement. Training the model required 17 hours initially, but once trained, it could classify vast areas with minimal human intervention. Scaling this system could allow Indonesia to map 9,000 km² annually, cutting the projected completion time from over a century to just 15 years.
AI Mapping Advances Global Sustainability
The implications extend far beyond Indonesia. Automated land cover classification supports global efforts like the UN Sustainable Development Goals (SDGs). For instance, tracking deforestation (SDG 15) or urban expansion (SDG 11) becomes faster and more precise.
In disaster-prone regions, such as flood-prone areas, up-to-date maps can identify vulnerable communities and plan evacuation routes.
Farmers also benefit; accurate land cover data enables precision agriculture, optimizing water use and crop yields by monitoring soil health and vegetation stress via NDVI.
However, challenges remain. The model’s performance on underrepresented classes like roads highlights the need for balanced training data. Future work could incorporate transfer learning, a technique where a model pre-trained on one task (e.g., general image recognition) is fine-tuned for a specific application (e.g., road detection in satellite imagery).
This reduces the need for massive labeled datasets, which are costly to create. Testing advanced architectures like U-Net3+, which enhances feature aggregation across scales, or transformer-based models (which excel at capturing long-range dependencies in images) could further improve accuracy.
Integrating Lidar (Light Detection and Ranging) or radar data could also enhance results, especially in cloudy regions where optical satellites struggle.
Conclusion: A New Era for Geospatial Science
This study marks a turning point in topographic mapping. By automating land cover classification, countries can produce accurate maps faster and cheaper than ever before. For Indonesia, this technology is not just a convenience—it’s a necessity to manage its rapid urbanization, protect its forests, and prepare for climate-related disasters.
As AI and satellite technology advance, the vision of real-time, high-resolution mapping is within reach, empowering governments and communities to build a more sustainable future.
Reference: Hakim, Y.F.; Tsai, F. Deep Learning-Based Land Cover Extraction from Very-High-Resolution Satellite Imagery for Assisting Large-Scale Topographic Map Production. Remote Sens. 2025, 17, 473. https://doi.org/10.3390/rs17030473