[1]

Barbu A, She Y, Ding L, Gramajo G. 2017. Feature selection with annealing for computer vision and big data learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(2):272−286

doi: 10.1109/TPAMI.2016.2544315
[2]

She Y, Shen J, Barbu A. 2023. Slow kill for big data learning. IEEE Transactions on Information Theory 69(9):5936−5955

doi: 10.1109/TIT.2023.3273179
[3]

Tibshirani R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Methodological 58(1):267−288

doi: 10.1111/j.2517-6161.1996.tb02080.x
[4]

Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology 67(2):301−320

doi: 10.1111/j.1467-9868.2005.00503.x
[5]

Zou H. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476):1418−1429

doi: 10.1198/016214506000000735
[6]

Fan J, Li R. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348−1360

doi: 10.1198/016214501753382273
[7]

Zhang CH. 2010. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2):894−942

doi: 10.1214/09-AOS729
[8]

Yuan M, Lin Y. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B: Statistical Methodology 68(1):49−67

doi: 10.1111/j.1467-9868.2005.00532.x
[9]

Wang M, Tian GL. 2019. Adaptive group lasso for high-dimensional generalized linear models. Statistical Papers 60(5):1469−1486

doi: 10.1007/s00362-017-0882-z
[10]

Wei F, Huang J, Li H. 2011. Variable selection and estimation in high-dimensional varying-coefficient models. Statistica Sinica 21(4):1515−1540

doi: 10.5705/ss.2009.316
[11]

Ravikumar P, Lafferty J, Liu H, Wasserman L. 2009. Sparse additive models. Journal of the Royal Statistical Society Series B: Statistical Methodology 71(5):1009−1030

doi: 10.1111/j.1467-9868.2009.00718.x
[12]

Schmidt-Hieber J. 2020. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics 48(4):1875−1897

doi: 10.1214/19-aos1875
[13]

Nakada R, Imaizumi M. 2020. Adaptive approximation and generalization of deep neural network with intrinsic dimensionality. Journal of Machine Learning Research 21(174):1−38

[14]

Kohler M, Langer S. 2021. On the rate of convergence of fully connected deep neural network regression estimates. The Annals of Statistics 49(4):2231−2249

doi: 10.1214/20-aos2034
[15]

Jiao Y, Shen G, Lin Y, Huang J. 2023. Deep nonparametric regression on approximate manifolds: nonasymptotic error bounds with polynomial prefactors. The Annals of Statistics 51(2):691−716

doi: 10.1214/23-aos2266
[16]

Siegel JW. 2023. Optimal approximation rates for deep ReLU neural networks on Sobolev and Besov spaces. Journal of Machine Learning Research 24(357):1−52

[17]

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929−1958

[18]

Liang F, Li Q, Zhou L. 2018. Bayesian neural networks for selection of drug sensitive genes. Journal of the American Statistical Association 113(523):955−972

doi: 10.1080/01621459.2017.1409122
[19]

Ghosh S, Yao J, Doshi-Velez F. 2019. Model selection in Bayesian neural networks via horseshoe priors. Journal of Machine Learning Research 20(182):1−46

[20]

Sun Y, Song Q, Liang F. 2022. Consistent sparse deep learning: theory and computation. Journal of the American Statistical Association 117(540):1981−1995

doi: 10.1080/01621459.2021.1895175
[21]

Sun Y, Song Q, Liang F. 2022. Learning sparse deep neural networks with a spike-and-slab prior. Statistics & Probability Letters 180:109246

doi: 10.1016/j.spl.2021.109246
[22]

Wen W, Wu C, Wang Y, Chen Y, Li H. 2016. Learning structured sparsity in deep neural networks. NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. Vol. 29. Red Hook, NY, USA: Curran Associates, Inc. pp. 2082–2090 https://proceedings.neurips.cc/paper_files/paper/2016/file/41bfd20a38bb1b0bec75acf0845530a7-Paper.pdf (Accessed March 20, 2026)

[23]

Scardapane S, Comminiello D, Hussain A, Uncini A. 2017. Group sparse regularization for deep neural networks. Neurocomputing 241:81−89

doi: 10.1016/j.neucom.2017.02.029
[24]

Bungert L, Roith T, Tenbrinck D, Burger M. 2022. A Bregman learning framework for sparse neural networks. Journal of Machine Learning Research 23(192):1−43

[25]

Li G, Wang G, Ding J. 2023. Provable identifiability of two-layer ReLU neural networks via LASSO regularization. IEEE Transactions on Information Theory 69(9):5921−5935

doi: 10.1109/tit.2023.3274152
[26]

Guo Y, She Y, Barbu A. 2021. Network pruning via annealing and direct sparsity control. 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18−22 July 2021. New Jersey: IEEE. pp. 1−8

doi: 10.1109/ijcnn52387.2021.9533741

[27]

Jantre S, Bhattacharya S, Maiti T. 2025. Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks. IEEE Transactions on Neural Networks and Learning Systems 36(6):11176−11188

doi: 10.1109/tnnls.2024.3485529
[28]

Dinh VC, Ho LS. 2020. Consistent feature selection for analytic deep neural networks. NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020, eds. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H. Vol. 33. Red Hook, NY, USA: Curran Associates, Inc. pp. 2420−2431 https://proceedings.neurips.cc/paper_files/paper/2020/file/1959eb9d5a0f7ebc58ebde81d5df400d-Paper.pdf (Accessed March 20, 2026)

[29]

Chen Y, Gao Q, Liang F, Wang X. 2021. Nonlinear variable selection via deep neural networks. Journal of Computational and Graphical Statistics 30(2):484−492

doi: 10.1080/10618600.2020.1814305
[30]

Lemhadri I, Ruan F, Abraham L, Tibshirani R. 2021. LassoNet: a neural network with feature sparsity. Journal of Machine Learning Research 22(127):1−29

[31]

Yang Z, Zheng S, Tang N. 2026. Supervised predictive modeling of high-dimensional data with group ℓ0-norm constrained neural networks. Journal of Computational and Graphical Statistics 00:1−14

doi: 10.1080/10618600.2025.2581774
[32]

Yuan XT, Li P, Zhang T. 2018. Gradient hard thresholding pursuit. Journal of Machine Learning Research 18(166):1−43

[33]

Yang R, Song Y. 2024. Nonparametric expectile regression meets deep neural networks: a robust nonlinear variable selection method. Statistical Analysis and Data Mining: The ASA Data Science Journal 17(6):e70002

doi: 10.1002/sam.70002
[34]

Zhao P, Yu B. 2006. On model selection consistency of lasso. Journal of Machine Learning Research 7(90):2541−2563

[35]

She Y. 2009. Thresholding-based iterative selection procedures for model selection and shrinkage. Electronic Journal of Statistics 3:384−415

doi: 10.1214/08-ejs348
[36]

Agarwal A, Negahban SN, Wainwright MJ. 2012. Stochastic optimization and sparse statistical recovery: optimal algorithms for high dimensions. Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA. Red Hook, NY, USA: Curran Associates, Inc. pp. 1547–1555 https://proceedings.neurips.cc/paper_files/paper/2012/file/5751ec3e9a4feab575962e78e006250d-Paper.pdf (Accessed March 20, 2026)

[37]

McInerney A, Burke K. 2025. A statistical modelling approach to feedforward neural network model selection. Statistical Modelling 25(4):323−342

doi: 10.1177/1471082x241258261
[38]

Nguyen N, Needell D, Woolf T. 2017. Linear convergence of stochastic iterative greedy algorithms with sparse constraints. IEEE Transactions on Information Theory 63(11):6869−6895

doi: 10.1109/tit.2017.2749330
[39]

Sun L, Barbu A. 2025. Stochastic feature selection with annealing and its applications to streaming data. Journal of Nonparametric Statistics 37(3):580−597

doi: 10.1080/10485252.2025.2456767
[40]

Zou H, Hastie T, Tibshirani R. 2007. On the "degrees of freedom" of the lasso. The Annals of Statistics 35(5):2173−2192

doi: 10.1214/009053607000000127
[41]

Du J, Li Z, Gu Z, Feng L. 2025. A nonparametric statistics approach to feature selection in deep neural networks with theoretical guarantees. arXiv 2512.13565

doi: 10.48550/arXiv.2512.13565
[42]

Guo Y, Wu YN, Barbu A. 2021. A study of local optima for learning feature interactions using neural networks. 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18−22 July 2021. New Jersey: IEEE. pp. 1−8

doi: 10.1109/ijcnn52387.2021.9533833

[43]

He K, Zhang X, Ren S, Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7−13 December 2015. New Jersey: IEEE. pp. 1026−1034

doi: 10.1109/iccv.2015.123

[44]

Liang F, Xue J, Jia B. 2022. Markov neighborhood regression for high-dimensional inference. Journal of the American Statistical Association 117(539):1200−1214

doi: 10.1080/01621459.2020.1841646
[45]

Sun L, Liang F. 2022. Markov neighborhood regression for statistical inference of high-dimensional generalized linear models. Statistics in Medicine 41(20):4057−4078

doi: 10.1002/sim.9493
[46]

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12(85):2825−2830

[47]

Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, et al. 2012. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483:603−607

doi: 10.1038/nature11003
[48]

Hadley KE, Hendricks DT. 2014. Use of NQO1 status as a selective biomarker for oesophageal squamous cell carcinomas with greater sensitivity to 17-AAG. BMC Cancer 14:334

doi: 10.1186/1471-2407-14-334
[49]

Guyon I, Gunn S, Ben-Hur A, Dror G. 2004. Result analysis of the NIPS 2003 feature selection challenge. Advances in Neural Information Processing Systems 17 (NIPS 2004), eds. Saul L, Weiss Y, Bottou L. Cambridge, MA: MIT Press. pp. 545–552 https://proceedings.neurips.cc/paper_files/paper/2004/file/5e751896e527c862bf67251a474b3819-Paper.pdf (Accessed March 20, 2026)