Soil property-driven fertilization in slash pine orchards: a stacking framework with PLSR and neural networks

Jiaming Sun; Cong Xu; Haotian Zhao; Xianyin Ding; Qifu Luan

doi:10.48130/smartfor-0025-0002

Accurate assessment of soil nutrients in slash pine (Pinus elliottii Engelm.) seed orchards is crucial for precision fertilization; however, conventional chemical and stoichiometric methods struggle to provide rapid, joint diagnostics for multiple nutrients. To address this gap, a Stacked Single Target (SST) framework was developed that combines Partial Least Squares Regression (PLSR) and neural networks in a stacked generalization architecture to predict soil nutrients from near-infrared (NIR) spectra. In a slash pine seed orchard, 115 tree-based soil sampling points were established, from which laboratory measurements of soil nutrients and NIR spectral data were obtained to train and validate the model. The SST framework first uses clusters of PLSR models to extract features related to each target variable and then applies a neural-network meta-learner to these outputs to perform Multi-target Regression (MTR). For soil organic matter, the SST framework achieved a coefficient of determination (R²) of 0.88 and a root-mean-square error (RMSE) of 0.093, an increase in R² of approximately 0.25 and a substantial reduction in RMSE relative to a standalone PLSR model. For macronutrient grade classification, the average classification accuracy reached 93%. Even with a limited sample size, the SST model maintained stable performance, alleviating the overfitting that commonly occurs in small, heterogeneous sites. Overall, by combining traditional PLSR with neural networks in a stacked generalization framework, this study provides a scalable and rapid diagnostic tool for soil nutrient assessment that can support precision fertilization decisions in constrained environments, helping to balance productivity gains with the mitigation of fertilizer pollution.

Introduction

Slash pine (Pinus elliottii Engelm.) seed orchards are critical for supplying genetically superior germplasm to support afforestation and carbon sequestration programs in southern China^[1,2]. For established orchards in their prime production phase, effective nutrient management through fertilization is one of the most practical levers for enhancing seed yield^[3], with studies in conifer species demonstrating over 18% gains in seed and kernel weight from optimized fertilization^[4].

Options for improving seed orchard yields are limited. Before establishment, cultivation materials can be selected, and in the early establishment phase the orchard can be thinned and topped. For the many seed orchards already in their prime production period, however, tending and fertilization remain the most effective ways to increase yields^[3,5]. Traditional fertilization, which is often based on experience, is unsustainable^[6], and precision fertilization is urgently needed to prevent fertilizer pollution during large-scale afforestation^[7]. Precision fertilization has emerged as a scientific and environmentally friendly strategy for addressing both under- and over-fertilization^[8,9].

Because soil texture varies across regions, a refined soil condition assessment method should be established specifically for seed orchards. Traditional approaches to characterizing forest soil nutrients involve collecting soil samples and analyzing them chemically in the laboratory. The specialized methods required for different elements lead to long detection times and high costs, which hinder the widespread adoption of precision fertilization^[10]. Although machine learning is increasingly applied to soil composition prediction^[11,12], prediction models developed for seed orchards remain scarce.

The primary challenge in developing precision fertilization models for seed orchards stems from their characteristically small scale, which reflects their purpose of seed production and scientific research. These orchards, typically under 10 hectares in size, exhibit high spatial heterogeneity but yield few soil samples, often fewer than 200. This data scarcity is a critical constraint, as conventional machine learning models generally require larger datasets of over 500 samples to mitigate overfitting^[13]. Thus, despite the proven efficacy of near-infrared (NIR) spectroscopy for rapid soil assessment in broader agriculture^[14], data-efficient modeling frameworks tailored to the specific context of seed orchards are still lacking.

Previous efforts to predict soil properties from spectroscopic data have relied heavily on traditional Partial Least Squares Regression (PLSR), which has been widely used to estimate soil organic matter as well as macronutrients such as nitrogen, phosphorus, and potassium, owing to its robustness under high-dimensional, collinear conditions^[15,16]. However, the linear nature of PLSR limits its ability to capture complex non-linear relationships and interactions in soil–spectra responses. To improve predictive performance, many studies have explored alternative machine learning models, including random forests, support vector machines, gradient boosting algorithms and artificial neural networks, often in combination with different spectral and feature-analysis techniques such as visible–near-infrared (VIS–NIR)^[17,18], mid-infrared (MIR), and hyperspectral imaging, together with principal component analysis, variable importance in projection (VIP) or band-selection methods^[19,20]. While these approaches have enhanced soil nutrient prediction in various agricultural settings, they typically require relatively large sample sizes to avoid overfitting, and their applicability and stability remain uncertain in small, highly heterogeneous seed orchards.

To address this challenge, this study turns to Multi-target Regression (MTR), an inductive transfer method that leverages the domain-specific information embedded in the training signals of related tasks. We hypothesize that by exploiting the inherent correlations between different soil nutrients, MTR can implicitly expand the effective data volume and improve generalization, thereby mitigating the risk of overfitting in data-scarce contexts^[21,22]. Implementing an effective MTR framework, however, requires a model structure capable of coordinating multiple tasks and learning representations that generalize across them. Neural networks are a favorable choice for this purpose, as their architecture has been shown to support robust generalization across various tasks^[23−25].
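One way to make the MTR idea concrete is the following formulation. It is a sketch under stated assumptions rather than the objective used in this study: the shared representation with one head per target and the squared-error loss are both assumed, and all symbols are introduced here for illustration.

```latex
% Assumed notation (not from the source):
%   x_i          NIR spectrum of soil sample i,  i = 1, ..., N
%   y_{i,t}      laboratory value of target t,   t = 1, ..., T
%   h_\theta     representation shared by all targets
%   g_{\phi_t}   prediction head specific to target t
\hat{y}_{i,t} = g_{\phi_t}\bigl(h_\theta(x_i)\bigr),
\qquad
\mathcal{L}(\theta, \phi_1, \dots, \phi_T)
  = \frac{1}{N T} \sum_{t=1}^{T} \sum_{i=1}^{N}
    \bigl(y_{i,t} - \hat{y}_{i,t}\bigr)^{2}
```

Under this formulation the shared parameters are updated by the errors of all T targets, so each soil sample contributes T training signals to the shared part of the model. That is the sense in which correlated targets can enlarge the effective data volume.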

Within a stacked generalization architecture, the choice of base model is critical. While various machine learning models can serve this role^[26], we specifically opted for PLSR, a well-regarded algorithm in soil spectroscopy studies^[27] that is prized for its robustness and strong performance with high-dimensional data and limited samples^[28,29]. PLSR's inherent limitation in modeling complex, non-linear relationships is precisely where the neural network meta-learner complements it, capturing intricate patterns from the PLSR outputs for the final prediction.

This integrated modeling approach is supported by rapid and cost-effective data acquisition using near-infrared (NIR) spectroscopy. NIR radiation interacts with fundamental chemical bonds (e.g., C-H, O-H), enabling the analysis of key soil constituents such as organic matter and macronutrients^[14,29] and making NIR spectroscopy an ideal partner for machine learning-based soil diagnostics. Therefore, by coupling NIR spectroscopy with a novel stacked generalization framework that combines PLSR base learners and a neural network meta-learner, we aim to create a hybrid model for comprehensive soil nutrient prediction in slash pine seed orchards. Our goals are:

(1) To use hybrid models to address the limited-data problem in small-scale site modeling by diversifying data types.

(2) To investigate the predictive ability of hybrid models for different soil elements and the ranking ability of the optimal model.

[1]	Li Y, Sun H, Tomasetto F, Jiang J, Luan Q. 2022. Spectrometric prediction of nitrogen content in different tissues of slash pine trees. Plant Phenomics 2022:9892728 doi: 10.34133/2022/9892728
[2]	Zhang Y, Diao S, Ding X, Sun J, Luan Q, et al. 2023. Transcriptional regulation modulates terpenoid biosynthesis of Pinus elliottii under drought stress. Industrial Crops and Products 202:116975 doi: 10.1016/j.indcrop.2023.116975
[3]	Liesebach H, Liepe K, Bäucker C. 2021. Towards new seed orchard designs in Germany - A review. Silvae Genetica 70(1):84−98 doi: 10.2478/sg-2021-0007
[4]	Loewe-Muñoz V, del Río R, Delard C, Balzarini M. 2023. Effect of fertilization on Pinus pinea cone to seed and kernel yields. Forest Ecology and Management 545:121249 doi: 10.1016/j.foreco.2023.121249
[5]	Alan M, Sabuncu R, Ezen T, Kaplan S. 2018. The effects of top pruning on growth and production of conelets and cones in ten seed orchards of different ages. Sumarski List 142(5-6):269−82 doi: 10.31298/sl.142.5-6.1
[6]	Zhu YG, Meharg AA. 2015. Protecting global soil resources for ecosystem services. Ecosystem Health and Sustainability 1(3):1−4 doi: 10.1890/ehs15-0010.1
[7]	Hou ZM, Xiong Y, Luo JS, Fang YL, Haris M, et al. 2023. International experience of carbon neutrality and prospects of key technologies: lessons for China. Petroleum Science 20(2):893−909 doi: 10.1016/j.petsci.2023.02.018
[8]	Bacenetti J, Paleari L, Tartarini S, Vesely FM, Foi M, et al. 2020. May smart technologies reduce the environmental impact of nitrogen fertilization? A case study for paddy rice. Science of The Total Environment 715:136956 doi: 10.1016/j.scitotenv.2020.136956
[9]	Bai Q, Zhang X, Luo H, Li G. 2021. Control system for auto-targeting precision variable-rate fertilization of fruit trees in a greenhouse orchard. Transactions of the Chinese Society of Agricultural Engineering 37(12):28−35 doi: 10.11975/j.issn.1002-6819.2021.12.004
[10]	Termin D, Linker R, Baram S, Raveh E, Ohana-Levi N, et al. 2023. Dynamic delineation of management zones for site-specific nitrogen fertilization in a citrus orchard. Precision Agriculture 24(4):1570−92 doi: 10.1007/s11119-023-10008-w
[11]	Wang Y, Chen S, Hong Y, Hu B, Peng J, et al. 2023. A comparison of multiple deep learning methods for predicting soil organic carbon in Southern Xinjiang, China. Computers and Electronics in Agriculture 212:108067 doi: 10.1016/j.compag.2023.108067
[12]	Liu K, Wang Y, Wang X, Sun Z, Song Y, et al. 2023. Characteristic bands extraction method and prediction of soil nutrient contents based on an analytic hierarchy process. Measurement 220:113408 doi: 10.1016/j.measurement.2023.113408
[13]	Wang F, Zhang S, Zhu P, Chen L, Zhu Y, et al. 2023. The effects of fertility and synchronization variation on seed production in two Chinese fir clonal seed orchards. Scientific Reports 13(1):627 doi: 10.1038/s41598-022-27151-5
[14]	Zhang Y, Luan Q, Jiang J, Li Y. 2021. Prediction and Utilization of Malondialdehyde in Exotic Pine Under Drought Stress Using Near-Infrared Spectroscopy. Frontiers in Plant Science 12:735275 doi: 10.3389/fpls.2021.735275
[15]	Trontelj ml J, Chambers O. 2021. Machine learning strategy for soil nutrients prediction using spectroscopic method. Sensors 21(12):4208 doi: 10.3390/s21124208
[16]	Salehi-Varnousfaderani B, Honarbakhsh A, Tahmoures M, Akbari M. 2022. Soil erodibility prediction by vis-NIR spectra and environmental covariates coupled with GIS, regression and PLSR in a watershed scale, Iran. Geoderma Regional 28:e00470 doi: 10.1016/j.geodrs.2021.e00470
[17]	Singha C, Swain KC, Sahoo S, Govind A. 2023. Prediction of soil nutrients through PLSR and SVMR models by VIs-NIR reflectance spectroscopy. The Egyptian Journal of Remote Sensing and Space Sciences 26(4):901−18 doi: 10.1016/j.ejrs.2023.10.005
[18]	Wang Z, Chen S, Lu R, Zhang X, Ma Y, et al. 2024. Non-linear memory-based learning for predicting soil properties using a regional vis-NIR spectral library. Geoderma 441:116752 doi: 10.1016/j.geoderma.2023.116752
[19]	Dangal SRS, Sanderman J, Wills S, Ramirez-Lopez L. 2019. Accurate and precise prediction of soil properties from a large mid-infrared spectral library. Soil Systems 3:11 doi: 10.3390/soilsystems3010011
[20]	Tao C, Jia M, Wang G, Zhang Y, Zhang Q, et al. 2024. Time-sensitive prediction of NO₂ concentration in China using an ensemble machine learning model from multi-source data. Journal of Environmental Sciences 137:30−40 doi: 10.1016/j.jes.2023.02.026
[21]	Caruana R. 1997. Multitask learning. Machine Learning 28(1):41−75 doi: 10.1023/A:1007379606734
[22]	Tedesco D, de Almeida Moreira BR, Barbosa MR Jr, Papa JP, da Silva RP. 2021. Predicting on multi-target regression for the yield of sweet potato by the market class of its roots upon vegetation indices. Computers and Electronics in Agriculture 191:106544 doi: 10.1016/j.compag.2021.106544
[23]	Zeng Z, Liang N, Yang X, Hoi S. 2018. Multi-target deep neural networks: theoretical analysis and implementation. Neurocomputing 273:634−42 doi: 10.1016/j.neucom.2017.08.044
[24]	Rodríguez-Pérez R, Bajorath J. 2021. Evaluation of multi-target deep neural network models for compound potency prediction under increasingly challenging test conditions. Journal of Computer-Aided Molecular Design 35(3):285−295 doi: 10.1007/s10822-021-00376-8
[25]	Sun X, Han D, Liu Y, Feng F, Sui X. 2017. Responses of soil physicochemical properties and soil microorganism characteristics regarding as carbon metabolism in original Korean pine forest. Journal of Nanjing Forestry University 41(5):18−26 doi: 10.3969/j.issn.1000-2006.201609042
[26]	Santana EJ, Rodrigues dos Santos F, Mastelini SM, Melquiades FL, Barbon S Jr. 2021. Improved prediction of soil properties with multi-target stacked generalisation on EDXRF spectra. Chemometrics and Intelligent Laboratory Systems 209:104231 doi: 10.1016/j.chemolab.2020.104231
[27]	Askari MS, O'Rourke SM, Holden NM. 2015. Evaluation of soil quality for agricultural production using visible-near-infrared spectroscopy. Geoderma 243:80−91 doi: 10.1016/j.geoderma.2014.12.012
[28]	Oliveira DLB, de Souza Pereira LH, Schneider MP, Silva YJAB, Nascimento CWA, et al. 2021. Bio-inspired algorithm for variable selection in i-PLSR to determine physical properties, thorium and rare earth elements in soils from Brazilian semiarid region. Microchemical Journal 160:105640 doi: 10.1016/j.microc.2020.105640
[29]	Wang Z, Miao Z, Yu X, He F. 2023. Vis-NIR spectroscopy coupled with PLSR and multivariate regression models to predict soil salinity under different types of land use. Infrared Physics & Technology 133:104826 doi: 10.1016/j.infrared.2023.104826
[30]	Gillon D, Houssard C, Joffre R. 1999. Using near-infrared reflectance spectroscopy to predict carbon, nitrogen and phosphorus content in heterogeneous plant material. Oecologia 118(2):173−82 doi: 10.1007/s004420050716
[31]	Min M, Lee WS, Kim YH, Bucklin RA. 2006. Nondestructive detection of nitrogen in Chinese cabbage leaves using VIS-NIR spectroscopy. HortScience 41(1):162−66 doi: 10.21273/HORTSCI.41.1.162
[32]	Paszke A, Gross S, Massa F, Lerer A, Bradbury J, et al. 2019. Pytorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, Canada, 8−14 Dec., 2019. USA: NeurIPS. https://neurips.cc/Conferences/2019
[33]	Hunter JD. 2007. Matplotlib: a 2D graphics environment. Computing in Science & Engineering 9(3):90−95 doi: 10.1109/MCSE.2007.55
[34]	Waskom ML. 2021. Seaborn: statistical data visualization. Journal of Open Source Software 6(60):3021 doi: 10.21105/joss.03021
[35]	Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12(85):2825−30
[36]	Spyromitros-Xioufis E, Tsoumakas G, Groves W, Vlahavas I. 2016. Multi-target regression via input space expansion: treating targets as inputs. Machine Learning 104(1):55−98 doi: 10.1007/s10994-016-5546-z
[37]	Wegelin JA, et al. 2000. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical report 371. USA: University of Washington. https://stat.uw.edu/research/tech-reports/survey-partial-least-squares-pls-methods-emphasis-two-block-case
[38]	Liu R, Li Y, Tao L, Liang D, Zheng HT. 2022. Are we ready for a new paradigm shift? A survey on visual deep MLP. Patterns 3(7):100520 doi: 10.1016/j.patter.2022.100520
[39]	Yang Y, Liu B, Wang J, Chen Y, Ren Y. 2023. An improved multi-objective brainstorming algorithm with the application of rapeseed germination characteristics optimization. Computers and Electronics in Agriculture 209:107865 doi: 10.1016/j.compag.2023.107865
[40]	Mnih V, Heess NMO, Graves A, Kavukcuoglu K. 2014. Recurrent models of visual attention. Advances in Neural Information Processing Systems 27
[41]	Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, et al. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4−9 Dec., 2017. https://neurips.cc/Conferences/2017
[42]	Akiba T, Sano S, Yanase T, Ohta T, Koyama M. 2019. Optuna: a next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Anchorage AK USA: ACM. pp. 2623−31 doi: 10.1145/3292500.3330701
[43]	Ozaki Y, Tanigaki Y, Watanabe S, Nomura M, Onishi M. 2022. Multiobjective tree-structured parzen estimator. Journal of Artificial Intelligence Research 73:1209−50 doi: 10.1613/jair.1.13188
[44]	Mack J, Hatten J, Sucre E, Roberts S, Leggett Z, et al. 2014. The effect of organic matter manipulations on site productivity, soil nutrients, and soil carbon on a southern loblolly pine plantation. Forest Ecology and Management 326:25−35 doi: 10.1016/j.foreco.2014.04.008
[45]	Hussain A, Jamil MA, Abid K, Chen L, Khan K, et al. 2023. Variations in soil phosphorus fractionations in different water-stable aggregates under litter and inorganic fertilizer treatment in Korean pine plantation and its natural forest. Heliyon 9(6):e17261 doi: 10.1016/j.heliyon.2023.e17261
[46]	Rutkowska B, Szulc W, Łabętowicz J. 2009. Influence of soil fertilization on concentration of microelements in soil solution of sandy soil. Journal of Elementology 14(2):349−55 doi: 10.5601/jelem.2009.14.2.15
[47]	Seligman NG, van Keulen H, Spitters CJT. 1992. Weather, soil-conditions and the interannual variability of herbage production and nutrient-uptake on annual Mediterranean grasslands. Agricultural and Forest Meteorology 57(4):265−79 doi: 10.1016/0168-1923(92)90123-L
[48]	Borchani H, Varando G, Bielza C, Larrañaga P. 2015. A survey on multi-output regression. WIREs Data Mining and Knowledge Discovery 5(5):216−233 doi: 10.1002/widm.1157
[49]	Melki G, Cano A, Kecman V, Ventura S. 2017. Multi-target support vector regression via correlation regressor chains. Information Sciences 415:53−69 doi: 10.1016/j.ins.2017.06.017
[50]	Kawamura K, Nishigaki T, Andriamananjara A, Rakotonindrina H, Tsujimoto Y, et al. 2021. Using a one-dimensional convolutional neural network on visible and near-infrared spectroscopy to improve soil phosphorus prediction in Madagascar. Remote Sensing 13(8):1519 doi: 10.3390/rs13081519
[51]	Reda R, Saffaj T, Ilham B, Saidi O, Issam K, et al. 2019. A comparative study between a new method and other machine learning algorithms for soil organic carbon and total nitrogen prediction using near infrared spectroscopy. Chemometrics and Intelligent Laboratory Systems 195:103873 doi: 10.1016/j.chemolab.2019.103873
[52]	Shi J, Song G. 2016. Soil type database of China: a nationwide soil dataset based on the Second National Soil Survey. China Scientific Data 1(2):1−12 doi: 10.11922/csdata.170.2015.0033
[53]	Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, et al. 2020. Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion 58:82−115 doi: 10.1016/j.inffus.2019.12.012

Soil element	Testing method
Organic Matter (OM)	Loss on Ignition (LOI)
Organic Carbon (OC)	Dry Combustion-Infrared Spectrometry
Total Nitrogen (N)	Semi-micro Kjeldahl
Total Phosphorus (P)	Molybdenum blue colorimetry
Total Potassium (K)	Hydrofluoric Acid
Available Boron (B)	Azomethine-H colorimetric
pH value	Potentiometer

Model	Hyper-parameter	Search range	Selected value
FC	Plsr_fc	50−1,000	130
	Original_fc	50−1,000	500
	Combined	50−1,000	630
	Output	Fixed	7
	Batch_size	16−64	32
	Initial_lr	1.00E-05 to 1.00E-02	1.50E-04
	Epoch	100−1,000	700
MTL	Shared_layers	50−500	185
	Task_layers	200−1,000	625*7
	Output	Fixed	1*7
	Batch_size	16−64	32
	Initial_lr	1.00E-04 to 1.00E-02	1.00E-03
	Epoch	500−5,000	3,000
TSM	Input_embedding	128−512	320
	nhead	4−64	32
	d_model	128−512	320
	Encoder_num	2−12	6
	Task_layers	200−1,000	320*7
	Output	Fixed	1*7
	Batch_size	16−64	32
	Initial_lr	1.00E-05 to 1.00E-03	1.00E-04
	Epoch	500−2,000	1,300
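To show how search ranges of this kind translate into candidate configurations, the sketch below draws random settings for the FC model from the ranges in the table. It is an illustration only: the reference list cites Optuna, but the search procedure is not described in this section, so plain random sampling is used as a stand-in. The function name `sample_fc_config`, the lowercase keys, and the log-uniform sampling of the learning rate are assumptions.

```python
import math
import random

# Integer search ranges for the FC model, copied from the table above.
FC_INT_RANGES = {
    "plsr_fc": (50, 1000),
    "original_fc": (50, 1000),
    "combined": (50, 1000),
    "batch_size": (16, 64),
    "epoch": (100, 1000),
}
FC_LR_RANGE = (1e-5, 1e-2)  # Initial_lr range from the table
FC_OUTPUT = 7               # fixed output size from the table


def sample_fc_config(rng: random.Random) -> dict:
    """Draw one candidate FC configuration (hypothetical helper)."""
    config = {name: rng.randint(lo, hi) for name, (lo, hi) in FC_INT_RANGES.items()}
    # Assumption: learning rates are sampled log-uniformly, a common choice
    # when a range spans several orders of magnitude.
    log_lo, log_hi = (math.log10(v) for v in FC_LR_RANGE)
    config["initial_lr"] = 10 ** rng.uniform(log_lo, log_hi)
    config["output"] = FC_OUTPUT
    return config


rng = random.Random(0)
trials = [sample_fc_config(rng) for _ in range(5)]
for trial in trials:
    print(trial)
```

In a real search each sampled configuration would be trained and scored on validation data, and the best-scoring one kept; that step is omitted here because it depends on the model code.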

Statistic	OC (g/kg)	OM (g/kg)	N (g/kg)	P (g/kg)	B (mg/kg)	K (g/kg)	pH
Maximum value (Max)	6.9491	11.7562	0.8383	0.5521	0.6615	14.4560	5.4118
Minimum value (Min)	4.3854	7.7722	0.3456	0.3300	0.5169	11.3024	4.5962
Mean value (Mean)	5.8620	10.0869	0.6043	0.4628	0.6070	13.3284	4.9814
Median value (50%)	5.8561	10.0706	0.5951	0.4638	0.6161	13.4462	4.9723
Standard deviation (STD)	0.4249	0.6617	0.0927	0.0395	0.0311	0.6191	0.1642

Nutrient	Grade 1 threshold	Grade 3 threshold	Fertilizer recommendation	Critical application period
P	> 0.5 g/kg	< 0.4 g/kg	40 g P₂O₅/tree	Spring bud differentiation
K	> 14 g/kg	< 12 g/kg	35 g KCl/tree	Early rainy season
N	> 0.7 g/kg	< 0.5 g/kg	Reduce 30% N fertilizer	Avoid summer application
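The thresholds in this table can be applied mechanically to a measured or predicted nutrient value. The sketch below is one possible reading, not the study's procedure: the function name `nutrient_grade` is hypothetical, and it assumes that any value that is neither above the Grade 1 threshold nor below the Grade 3 threshold (including values exactly on a threshold) is Grade 2. The example inputs are the mean values from the descriptive statistics table.

```python
# Thresholds in g/kg, copied from the table above:
# (Grade 1 if value is above, Grade 3 if value is below).
THRESHOLDS = {
    "P": (0.5, 0.4),
    "K": (14.0, 12.0),
    "N": (0.7, 0.5),
}


def nutrient_grade(nutrient: str, value_g_per_kg: float) -> int:
    """Return grade 1, 2, or 3 for one nutrient value (hypothetical helper)."""
    grade1_above, grade3_below = THRESHOLDS[nutrient]
    if value_g_per_kg > grade1_above:
        return 1
    if value_g_per_kg < grade3_below:
        return 3
    # Assumption: everything between the two thresholds is Grade 2.
    return 2


# Mean values from the descriptive statistics table (g/kg).
mean_values = {"P": 0.4628, "K": 13.3284, "N": 0.6043}
grades = {name: nutrient_grade(name, value) for name, value in mean_values.items()}
print(grades)  # {'P': 2, 'K': 2, 'N': 2}
```

The table does not state which grade triggers each fertilizer recommendation, so the sketch stops at grading and leaves that mapping out.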
