Benchmarking Computational Intelligence Techniques for Accurate Land Use and Land Cover Classification Using Sentinel-2 Imagery: A Comparative Analysis of CNNs, Vision Transformers, and Random Forests
Keywords:
Land Use and Land Cover (LULC) Classification, Sentinel-2 Imagery, Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), Random Forest
Abstract
Recent advances in computational intelligence have transformed remote sensing, particularly land use and land cover (LULC) classification. This study benchmarks three prominent approaches, Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Random Forests, on high-resolution Sentinel-2 imagery for classifying heterogeneous land cover types with improved precision and interpretability. A robust methodological pipeline was designed, comprising preprocessing, model training, validation, and spatial visualization. Accuracy, precision, recall, and F1-score were computed to compare model effectiveness. Results show that ViTs outperformed both CNNs and Random Forests, generalizing better across spectrally complex classes such as medium and dense residential areas. CNNs excelled at local spatial feature extraction, while Random Forests delivered fast classification but with reduced accuracy in mixed-use zones. The study further employed Grad-CAM and attention-map visualizations for explainability, highlighting the image regions that drove each model's decisions. Our findings support the growing role of deep learning and transformer-based models in LULC mapping and suggest that hybrid or ensemble strategies may yield further gains. The outcomes provide valuable insights for urban planning, environmental monitoring, and geospatial policy-making.
