| [1] |
Barbu A, She Y, Ding L, Gramajo G. 2017. Feature selection with annealing for computer vision and big data learning. |
| [2] |
She Y, Shen J, Barbu A. 2023. Slow kill for big data learning. |
| [3] |
Tibshirani R. 1996. Regression shrinkage and selection via the lasso. |
| [4] |
Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. |
| [5] |
Zou H. 2006. The adaptive lasso and its oracle properties. |
| [6] |
Fan J, Li R. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. |
| [7] |
Zhang CH. 2010. Nearly unbiased variable selection under minimax concave penalty. |
| [8] |
Yuan M, Lin Y. 2006. Model selection and estimation in regression with grouped variables. |
| [9] |
Wang M, Tian GL. 2019. Adaptive group lasso for high-dimensional generalized linear models. |
| [10] |
Wei F, Huang J, Li H. 2011. Variable selection and estimation in high-dimensional varying-coefficient models. |
| [11] |
Ravikumar P, Lafferty J, Liu H, Wasserman L. 2009. Sparse additive models. |
| [12] |
Schmidt-Hieber J. 2020. Nonparametric regression using deep neural networks with ReLU activation function. |
| [13] |
Nakada R, Imaizumi M. 2020. Adaptive approximation and generalization of deep neural network with intrinsic dimensionality. Journal of Machine Learning Research 21(174):1−38 |
| [14] |
Kohler M, Langer S. 2021. On the rate of convergence of fully connected deep neural network regression estimates. |
| [15] |
Jiao Y, Shen G, Lin Y, Huang J. 2023. Deep nonparametric regression on approximate manifolds: nonasymptotic error bounds with polynomial prefactors. |
| [16] |
Siegel JW. 2023. Optimal approximation rates for deep ReLU neural networks on Sobolev and Besov spaces. Journal of Machine Learning Research 24(357):1−52 |
| [17] |
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929−1958 |
| [18] |
Liang F, Li Q, Zhou L. 2018. Bayesian neural networks for selection of drug sensitive genes. |
| [19] |
Ghosh S, Yao J, Doshi-Velez F. 2019. Model selection in Bayesian neural networks via horseshoe priors. Journal of Machine Learning Research 20(182):1−46 |
| [20] |
Sun Y, Song Q, Liang F. 2022. Consistent sparse deep learning: theory and computation. |
| [21] |
Sun Y, Song Q, Liang F. 2022. Learning sparse deep neural networks with a spike-and-slab prior. |
| [22] |
Wen W, Wu C, Wang Y, Chen Y, Li H. 2016. Learning structured sparsity in deep neural networks. NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. Vol. 29. Red Hook, NY, USA: Curran Associates, Inc. pp. 2082–2090 https://proceedings.neurips.cc/paper_files/paper/2016/file/41bfd20a38bb1b0bec75acf0845530a7-Paper.pdf (Accessed March 20, 2026) |
| [23] |
Scardapane S, Comminiello D, Hussain A, Uncini A. 2017. Group sparse regularization for deep neural networks. |
| [24] |
Bungert L, Roith T, Tenbrinck D, Burger M. 2022. A Bregman learning framework for sparse neural networks. Journal of Machine Learning Research 23(192):1−43 |
| [25] |
Li G, Wang G, Ding J. 2023. Provable identifiability of two-layer ReLU neural networks via LASSO regularization. |
| [26] |
Guo Y, She Y, Barbu A. 2021. Network pruning via annealing and direct sparsity control. 2021 International Joint Conference on Neural Networks (IJCNN). Shenzhen, China, 18−22 July 2021. New Jersey: IEEE. pp. 1−8 doi: 10.1109/ijcnn52387.2021.9533741 |
| [27] |
Jantre S, Bhattacharya S, Maiti T. 2025. Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks. |
| [28] |
Dinh VC, Ho LS. 2020. Consistent feature selection for analytic deep neural networks. NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020, eds. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H. Vol. 33. Red Hook, NY, USA: Curran Associates, Inc. pp. 2420−2431 https://proceedings.neurips.cc/paper_files/paper/2020/file/1959eb9d5a0f7ebc58ebde81d5df400d-Paper.pdf (Accessed March 20, 2026) |
| [29] |
Chen Y, Gao Q, Liang F, Wang X. 2021. Nonlinear variable selection via deep neural networks. |
| [30] |
Lemhadri I, Ruan F, Abraham L, Tibshirani R. 2021. LassoNet: a neural network with feature sparsity. Journal of Machine Learning Research 22(127):1−29 |
| [31] |
Yang Z, Zheng S, Tang N. 2026. Supervised predictive modeling of high-dimensional data with group ℓ0-norm constrained neural networks. |
| [32] |
Yuan XT, Li P, Zhang T. 2018. Gradient hard thresholding pursuit. Journal of Machine Learning Research 18(166):1−43 |
| [33] |
Yang R, Song Y. 2024. Nonparametric expectile regression meets deep neural networks: a robust nonlinear variable selection method. |
| [34] |
Zhao P, Yu B. 2006. On model selection consistency of lasso. Journal of Machine Learning Research 7(90):2541−2563 |
| [35] |
She Y. 2009. Thresholding-based iterative selection procedures for model selection and shrinkage. |
| [36] |
Agarwal A, Negahban SN, Wainwright MJ. 2012. Stochastic optimization and sparse statistical recovery: optimal algorithms for high dimensions. Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA. Red Hook, NY, USA: Curran Associates, Inc. pp. 1547–1555 https://proceedings.neurips.cc/paper_files/paper/2012/file/5751ec3e9a4feab575962e78e006250d-Paper.pdf (Accessed March 20, 2026) |
| [37] |
McInerney A, Burke K. 2025. A statistical modelling approach to feedforward neural network model selection. |
| [38] |
Nguyen N, Needell D, Woolf T. 2017. Linear convergence of stochastic iterative greedy algorithms with sparse constraints. |
| [39] |
Sun L, Barbu A. 2025. Stochastic feature selection with annealing and its applications to streaming data. |
| [40] |
Zou H, Hastie T, Tibshirani R. 2007. On the "degrees of freedom" of the lasso. |
| [41] |
Du J, Li Z, Gu Z, Feng L. 2025. A nonparametric statistics approach to feature selection in deep neural networks with theoretical guarantees. |
| [42] |
Guo Y, Wu YN, Barbu A. 2021. A study of local optima for learning feature interactions using neural networks. 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18−22 July 2021. New Jersey: IEEE. pp. 1−8 doi: 10.1109/ijcnn52387.2021.9533833 |
| [43] |
He K, Zhang X, Ren S, Sun J. 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7−13 December 2015. New Jersey: IEEE. pp. 1026−1034 doi: 10.1109/iccv.2015.123 |
| [44] |
Liang F, Xue J, Jia B. 2022. Markov neighborhood regression for high-dimensional inference. |
| [45] |
Sun L, Liang F. 2022. Markov neighborhood regression for statistical inference of high-dimensional generalized linear models. |
| [46] |
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, et al. 2011. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12(85):2825−2830 |
| [47] |
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, et al. 2012. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. |
| [48] |
Hadley KE, Hendricks DT. 2014. Use of NQO1 status as a selective biomarker for oesophageal squamous cell carcinomas with greater sensitivity to 17-AAG. |
| [49] |
Guyon I, Gunn S, Ben-Hur A, Dror G. 2004. Result analysis of the NIPS 2003 feature selection challenge. Advances in Neural Information Processing Systems 17 (NIPS 2004), eds. Saul L, Weiss Y, Bottou L. Cambridge, MA: MIT Press. pp. 545–552 https://proceedings.neurips.cc/paper_files/paper/2004/file/5e751896e527c862bf67251a474b3819-Paper.pdf (Accessed March 20, 2026) |