Machine Learning Achieves High Accuracy with 99% Variance Retention in Hyperspectral Imaging

Hyperspectral imaging offers detailed spectral information for analysing environments and materials, but the sheer volume of data and strong correlations between features often hinder machine learning applications, particularly when limited ground-truth data exists. Parisa Parand from Allameh Tabataba’i University and Mahmoud Samadpour from K. N. Toosi University of Technology, along with their colleagues, tackled this challenge by investigating how Principal Component Analysis (PCA) can improve machine learning performance with hyperspectral data. Their work demonstrates that reducing the initial 125 spectral bands to just two principal components retains over 99% of the data’s variance, reduces data complexity, and supports high predictive accuracy. By training a machine learning model on this reduced dataset, the team achieved a coefficient of determination (R²) of 0.947, or 94.7%, establishing PCA as a powerful technique for efficient and accurate analysis in hyperspectral imaging workflows.
Hyperspectral Imaging and Soil Moisture Estimation with Dimensionality Reduction
High dimensionality and strong feature correlation pose significant challenges for machine learning models, especially when ground-truth datasets are limited. This study investigates a hyperspectral dataset comprising 125 spectral bands with soil moisture as the target variable. The optimal number of principal components was determined to be two, retaining more than 99% of the total variance, as supported by analysis of the covariance matrix and eigenvalue distribution. Projecting the data onto these components improved visualization and interpretability compared to the original high-dimensional space, revealing a clearer separation of target values and decreasing data complexity.
A Random Forest regression model trained on the PCA-transformed data achieved a coefficient of determination (R²) of 94.7%, demonstrating that PCA-based feature reduction can enhance computational efficiency while preserving strong predictive capability in hyperspectral machine learning workflows.
Optical imaging systems increasingly capture large volumes of high-dimensional data, enabling detailed characterization of materials and environmental conditions. Hyperspectral optical imaging provides dense spectral information by recording reflectance or radiance values across dozens to hundreds of contiguous wavelength bands. These multidimensional datasets offer rich feature representations but present computational and analytical challenges due to their size, redundancy, and strong band-to-band correlations. Machine learning (ML) methods have become essential tools for extracting meaningful information from high-dimensional optical datasets, supporting tasks such as material identification, biomedical tissue analysis, food quality monitoring, environmental assessment, and precision agriculture.
While research has focused on hyperspectral classification, fewer studies have explored hyperspectral regression, where the goal is to estimate continuous physical or chemical parameters. Hyperspectral regression problems are often impacted by the curse of dimensionality, requiring large datasets for reliable generalization. Because hyperspectral bands exhibit strong correlations and redundant information, the intrinsic dimensionality is frequently lower than the spectral resolution. Dimensionality reduction is therefore essential for efficient and robust hyperspectral machine learning pipelines. Among available techniques, PCA remains widely used due to its simplicity, interpretability, and computational efficiency. PCA orthogonally transforms the data into a new coordinate system where the principal components capture the maximum variance.
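To make the variance-retention idea concrete, here is a minimal sketch of how a component count might be chosen from the eigenvalues of the covariance matrix. It runs on synthetic data rather than the study's dataset. The helper names (`explained_variance_ratio`, `components_needed`) and the two-factor synthetic spectra are illustrative assumptions, not the authors' code.

```python
import numpy as np


def explained_variance_ratio(X):
    """Return covariance eigenvalues as fractions of total variance, largest first."""
    cov = np.cov(X, rowvar=False)            # (n_bands, n_bands) covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh sorts ascending, so reverse
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative round-off
    return eigvals / eigvals.sum()


def components_needed(X, threshold=0.99):
    """Smallest number of components whose cumulative variance reaches the threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic stand-in (an assumption): 679 samples, 125 bands, two hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(679, 2))
loadings = rng.normal(size=(2, 125))
X = latent @ loadings + 0.01 * rng.normal(size=(679, 125))

k = components_needed(X, threshold=0.99)
print(f"components needed for 99% variance: {k}")
```

Because the synthetic bands are driven by only two hidden factors, nearly all of the variance concentrates in the first two eigenvalues. This is the same pattern that a scree plot of strongly correlated spectral bands would be expected to show.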
This study demonstrates that PCA-based dimensionality reduction significantly enhances machine learning performance in hyperspectral optical imaging. By compressing spectral bands into two principal components while retaining over 99% of the variance, PCA reduced redundancy, improved computational efficiency, and enabled clearer separation of spectral patterns relevant to soil moisture. The results confirm that PCA is an effective strategy for simplifying high-dimensional hyperspectral datasets without compromising predictive accuracy, supporting the development of more interpretable and scalable machine learning models for optical imaging applications. Future work should evaluate alternative feature extraction methods and test the framework across diverse imaging scenarios and sensor platforms.
Hyperspectral Soil Moisture Prediction with Machine Learning
This study presents a methodology for enhancing machine learning performance with hyperspectral imaging data, specifically addressing the challenges posed by high dimensionality and feature correlation. The researchers used a dataset of 679 samples, each characterized by a continuous soil moisture value and 125 spectral bands ranging from 450 nm to 950 nm, acquired using a Cubert UHD 285 hyperspectral snapshot camera during a field campaign in Germany. The system captured 50×50 pixel images with an approximate spectral resolution of 4 nm, while reference soil moisture values were obtained using a TRIME-PICO Time-Domain Reflectometry sensor.
To ensure reproducibility, the research team implemented a controlled experimental framework, fixing the random state in a Python-based workflow. Data preparation involved importing the dataset using the pandas library and decomposing it into a feature matrix and a target vector. Prior to PCA, all spectral features were standardized so that each band contributed equally to the analysis. The researchers then trained a Random Forest regression model on the PCA-transformed data to evaluate the impact of dimensionality reduction on predictive performance.
This approach achieved a coefficient of determination (R²) of 94.7%, indicating that PCA-based feature reduction preserves strong predictive capability while enhancing efficiency in hyperspectral machine learning workflows. The 125 spectral bands were reduced to just two principal components while retaining over 99% of the original variance. Analysis of the covariance matrix, eigenvalue distribution, and a scree plot validated this dimensionality reduction, confirming that additional components contribute only marginally to the overall data representation.
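The workflow described here (standardize, project onto two principal components, fit a Random Forest with a fixed random state) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, and the random state value, the 80/20 split, the number of trees, and the helper names `make_synthetic_spectra` and `fit_and_score` are all hypothetical choices that the article does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42  # assumed value; the article only says a fixed random state was used


def make_synthetic_spectra(n_samples=679, n_bands=125, seed=RANDOM_STATE):
    """Hypothetical stand-in for the field dataset: two hidden factors drive all bands."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 2))
    loadings = rng.normal(size=(2, n_bands))
    X = latent @ loadings + 0.01 * rng.normal(size=(n_samples, n_bands))
    # Made-up target centred near 32, loosely echoing the reported moisture range.
    y = 32.0 + 3.0 * latent[:, 0] + 0.3 * rng.normal(size=n_samples)
    return X, y


def fit_and_score(X, y):
    """Standardize bands, project onto two PCs, fit a Random Forest, return test R²."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=RANDOM_STATE  # split ratio is an assumption
    )
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=2),
        RandomForestRegressor(n_estimators=100, random_state=RANDOM_STATE),
    )
    model.fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


X, y = make_synthetic_spectra()
model, r2 = fit_and_score(X, y)
print(f"R² on held-out synthetic data: {r2:.3f}")
```

Wrapping the scaler and PCA in one pipeline means both are fitted on the training split only, so the held-out score is not influenced by statistics from the test samples. The score printed here describes the synthetic data and says nothing about the value reported in the study.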
Measured soil moisture ranged from approximately 25% to 43%, with the majority of samples centered around 32%, indicating a moderately narrow distribution across the measurement period. A correlation heatmap revealed strong inter-band correlations, particularly at 642 nm and 742 nm, indicating spectral redundancy within the measured wavelengths. This redundancy supports the use of dimensionality reduction techniques to lower computational complexity while preserving essential information.
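As a rough illustration of how inter-band redundancy can be quantified, the sketch below computes the band-to-band Pearson correlation matrix that a heatmap would display, along with a single summary number. The spectra are synthetic, and the evenly spaced wavelength grid, the reflectance curve, and the helper names are assumptions made for this example only.

```python
import numpy as np


def band_correlation_matrix(X):
    """Pearson correlation between every pair of bands (columns of X)."""
    return np.corrcoef(X, rowvar=False)


def mean_abs_offdiag(corr):
    """Average absolute correlation over all distinct band pairs."""
    upper = np.triu_indices_from(corr, k=1)  # indices strictly above the diagonal
    return float(np.abs(corr[upper]).mean())


# Synthetic spectra (an assumption): one shared brightness factor scales a smooth curve.
rng = np.random.default_rng(1)
wavelengths = np.linspace(450.0, 950.0, 125)   # nm; evenly spaced for illustration
curve = 0.2 + 0.001 * (wavelengths - 450.0)    # made-up reflectance shape
brightness = rng.normal(1.0, 0.2, size=(679, 1))
X = brightness * curve + 0.005 * rng.normal(size=(679, 125))

corr = band_correlation_matrix(X)
print(f"mean |r| between bands: {mean_abs_offdiag(corr):.3f}")
```

The resulting `corr` array could be passed to a plotting routine such as matplotlib's `imshow` to draw the heatmap. A mean absolute correlation close to 1 suggests that most bands carry overlapping information.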
After Principal Component Analysis (PCA) was applied, a Random Forest regression model trained on the transformed data achieved a coefficient of determination (R²) of 0.947, meaning that approximately 94.7% of the variation in soil moisture is explained by the model's predictions from the hyperspectral features. Projection of the data onto the first two principal components showed clear clustering, effectively separating samples based on soil moisture levels. These findings confirm that PCA is an effective approach for simplifying high-dimensional hyperspectral datasets without compromising predictive performance. This supports the development of more interpretable and scalable machine learning models for optical imaging applications. Future work will explore alternative feature extraction methods and test the framework across diverse imaging scenarios.
👉 More information
🗞 Assessing the Effect of PCA-Based Dimensionality Reduction on Machine Learning Performance in Hyperspectral Optical Imaging
🧠 ArXiv: https://arxiv.org/abs/2512.15544
