Integration of Stacking Ensemble and Explainable AI for Taxpayer Compliance Risk Profiling
DOI: https://doi.org/10.59261/jequi.v8i2.301
Keywords: Tax Non-compliance, Ensemble Learning, Hybrid Resampling, Stacking Classifier, Explainable AI
Abstract
Background: Corporate taxpayer non-compliance is one of the biggest challenges facing tax authorities, especially because no tangible data is available on corporate tax avoidance. As a result, tax evasion microdata sets exhibit the extreme class imbalance that is well known from real-world tax data.
Objective: To develop an accurate and transparent tax non-compliance prediction model based on Ensemble Learning that incorporates Hybrid Resampling methods and Explainable Artificial Intelligence (XAI).
Methods: The dataset, which consists of 49,159 observations, was extracted from the administrative records of the Directorate General of Taxes and has an imbalance ratio of about 18.81:1. In the first stage, three hybrid resampling techniques (SMOTE-Tomek, SMOTEENN, Borderline-SMOTE Tomek) were integrated with tree-based classifiers (Random Forest, XGBoost, LightGBM) to serve as base learners. These were then combined using two ensemble architectures, a Stacking Classifier and a Voting Classifier, to exploit their respective predictive strengths. SHAP and LIME were used to open up the black-box models and interpret their predictions.
Results: The Stacking Classifier achieved the best classification performance, with an accuracy of 97.03% and a minority-class F1-score of 0.7309. The Voting Classifier showed the strongest probabilistic discrimination, with an ROC-AUC of 0.9859. The XAI analysis indicated that pure financial ratios play only a secondary role in predicting non-compliance risk. Predictions are instead dominated by absolute financial scale indicators (e.g., Tax Payment Amount, Total Assets) and administrative profile characteristics (e.g., MSME Taxpayer Status, Non-Effective Status).
Conclusion: Combining Ensemble Learning with hybrid resampling and explainability (XAI) yields an analytically sound and interpretable early warning system that can support risk-based tax audits.
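To put the reported accuracy in context, the short calculation below estimates what a trivial classifier would score on data with this imbalance. It is a sketch under stated assumptions: the class counts are derived from the rounded 18.81:1 ratio and are therefore approximate, and the helper names are hypothetical. A model that always predicts the majority class would already reach roughly 95% accuracy with a minority-class F1-score of zero, which is why the minority-class F1-score and ROC-AUC are reported alongside accuracy.

```python
def split_by_ratio(n_total, ratio):
    """Estimate (majority, minority) counts from a majority:minority ratio."""
    minority = round(n_total / (ratio + 1))
    return n_total - minority, minority


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


N_TOTAL = 49159          # observations, from the abstract
IMBALANCE_RATIO = 18.81  # majority:minority, from the abstract (rounded)
REPORTED_ACCURACY = 0.9703

majority, minority = split_by_ratio(N_TOTAL, IMBALANCE_RATIO)

# Baseline: always predict the majority class.
baseline_accuracy = majority / N_TOTAL
baseline_f1 = f1_from_counts(tp=0, fp=0, fn=minority)

# How far the reported accuracy sits above that trivial baseline.
accuracy_gain = REPORTED_ACCURACY - baseline_accuracy

print(f"approx. majority/minority counts: {majority}/{minority}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"majority-class baseline minority F1: {baseline_f1:.4f}")
print(f"reported accuracy gain over baseline: {accuracy_gain:.4f}")
```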
License
Copyright (c) 2026 Heru Pratama Agung, Suharjito

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.



