Adaptive Elastic Net on High-Dimensional Sparse Data with Multicollinearity: Application to Lipomatous Tumor Classification

Authors

  • Narumol Sudjai, Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand. ORCID: https://orcid.org/0009-0008-5016-4827
  • Monthira Duangsaphon, Department of Mathematics and Statistics, Faculty of Science and Technology, Thammasat University, Pathum Thani 12120, Thailand. ORCID: https://orcid.org/0000-0003-2360-6323
  • Chandhanarat Chandhanayingyong, Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand. ORCID: https://orcid.org/0000-0002-9473-8836

DOI:

https://doi.org/10.6000/1929-6029.2024.13.04

Keywords:

Diagnostic classification, High-dimensional sparse data, Machine learning, Multicollinearity, Penalized logistic regression, Penalty function

Abstract

Predictive models can become unstable when high-dimensional sparse data are combined with multicollinearity. The adaptive Least Absolute Shrinkage and Selection Operator (adaptive Lasso) and the adaptive elastic net were developed by placing adaptive weights on the penalty term. These adaptive weights are computed from a power of initial coefficient estimators, so we focus on the power of the adaptive weight in these penalty functions. This study aimed to compare the performance of different powers of the adaptive weight in the adaptive Lasso and adaptive elastic net methods on high-dimensional sparse data with multicollinearity. We also compared the ridge, Lasso, elastic net, adaptive Lasso, and adaptive elastic net methods, using the mean of the predicted mean squared error (MPMSE) in a simulation study and the classification accuracy in a real-data application. In both the simulation study and the real-data application, the adaptive elastic net using the square-root power of the adaptive weight performed best on high-dimensional sparse data with multicollinearity.
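The abstract describes adaptive weights computed from a power of an initial coefficient estimate. As a minimal sketch (not the authors' implementation), the Python code below fits an adaptive elastic-net penalized logistic regression by proximal gradient descent. It assumes the objective (1/n)·(negative log-likelihood) + (λ2/2)·Σ βj² + λ1·Σ wj|βj| with weights wj = 1/(|β̃j| + ε)^γ, where β̃ is an initial ridge-penalized estimate. The function names, the ridge initial estimator, the ε offset, the step-size rule, the simulated data, and all tuning values (λ1, λ2, γ) are illustrative assumptions; γ = 0.5 corresponds to the square-root power, and setting λ2 = 0 in the second stage gives an adaptive Lasso fit.

```python
import numpy as np


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))


def penalized_logistic(X, y, lam1, lam2, weights=None, n_iter=2000):
    """Proximal-gradient fit of the (assumed) objective
        (1/n) * negative log-likelihood + (lam2 / 2) * ||beta||^2
        + lam1 * sum_j weights_j * |beta_j|,
    with an unpenalized intercept b0. Returns (b0, beta)."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part's gradient.
    Xa = np.column_stack([np.ones(n), X])
    step = 1.0 / (np.linalg.norm(Xa, 2) ** 2 / (4.0 * n) + lam2)
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        r = sigmoid(b0 + X @ beta) - y          # fitted probability minus outcome
        b0 -= step * r.mean()                   # plain gradient step for the intercept
        z = beta - step * (X.T @ r / n + lam2 * beta)
        # Weighted soft-thresholding: the proximal operator of the adaptive L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam1 * w, 0.0)
    return b0, beta


def adaptive_elastic_net(X, y, lam1, lam2, gamma=0.5, lam_init=0.1, eps=1e-6):
    """Hypothetical two-stage sketch: ridge initial estimate, then weighted elastic net."""
    # Stage 1: a ridge-penalized logistic fit gives the initial estimate beta_init.
    _, beta_init = penalized_logistic(X, y, lam1=0.0, lam2=lam_init)
    # Adaptive weights w_j = 1 / (|beta_init_j| + eps)^gamma; gamma = 0.5 is the square-root power.
    weights = 1.0 / (np.abs(beta_init) + eps) ** gamma
    # Stage 2: elastic net with adaptive L1 weights (lam2 = 0 gives an adaptive Lasso fit).
    return penalized_logistic(X, y, lam1, lam2, weights)


# Hypothetical simulated data: n < p, every pair of predictors correlated at about 0.81.
rng = np.random.default_rng(0)
n, p = 100, 200
shared = rng.standard_normal((n, 1))
X = 0.9 * shared + np.sqrt(0.19) * rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

b0, beta = adaptive_elastic_net(X, y, lam1=0.05, lam2=0.1, gamma=0.5)
accuracy = np.mean((sigmoid(b0 + X @ beta) >= 0.5) == y)
print("nonzero coefficients:", np.count_nonzero(beta), "training accuracy:", accuracy)
```

The weighted soft-thresholding step is what makes the L1 part adaptive: coefficients with small initial estimates receive large weights and are set to exactly zero more readily, while the ridge term keeps the fit stable when predictors are strongly correlated. In practice λ1, λ2, and γ would be tuned, for example by cross-validation, and accuracy would be assessed on held-out data rather than the training set.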

Published

2024-03-29

How to Cite

Sudjai, N., Duangsaphon, M., & Chandhanayingyong, C. (2024). Adaptive Elastic Net on High-Dimensional Sparse Data with Multicollinearity: Application to Lipomatous Tumor Classification. International Journal of Statistics in Medical Research, 13, 30–40. https://doi.org/10.6000/1929-6029.2024.13.04

Section

General Articles