Integration of Stacking Ensemble and Explainable AI for Taxpayer Compliance Risk Profiling

Heru Pratama Agung; Suharjito Suharjito

doi:10.59261/jequi.v8i2.301

Authors

Heru Pratama Agung, Universitas Bina Nusantara
Suharjito Suharjito, Universitas Bina Nusantara

DOI:

https://doi.org/10.59261/jequi.v8i2.301

Keywords:

Tax Non-compliance, Ensemble Learning, Hybrid Resampling, Stacking Classifier, Explainable AI

Abstract

Background: Non-compliance by corporate taxpayers is one of the biggest challenges for tax authorities, especially because no tangible data is available on corporate tax avoidance. As a result, tax evasion microdata sets exhibit the extreme class imbalance that is well known from real-world tax data.

Objective: To develop an accurate and transparent tax non-compliance prediction model based on Ensemble Learning, incorporating Hybrid Resampling methods and Explainable Artificial Intelligence (XAI).

Methods: The dataset, consisting of 49,159 observations, was extracted from the administrative records of the Directorate General of Taxes and has an imbalance ratio of about 18.81:1. First, three hybrid resampling techniques (SMOTE-Tomek, SMOTEENN, Borderline-SMOTE Tomek) were combined with tree-based classifiers (Random Forest, XGBoost, LightGBM) to act as base learners. These base learners were then combined using two ensemble architectures, a Stacking Classifier and a Voting Classifier, to exploit their respective predictive capabilities. SHAP and LIME were used to open up the black-box nature of the models and interpret their predictive decisions.
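To make the hybrid resampling idea concrete, the following is a minimal, standard-library sketch of the SMOTE-Tomek principle: oversample the minority class by interpolating between neighbouring minority points, then remove Tomek links (opposite-class pairs that are each other's nearest neighbour). It is an illustration under stated assumptions, not the authors' implementation: the toy data, the function names, the choice to remove both members of each link, and the parameters `k` and `seed` are all hypothetical, and a real study would more likely rely on a dedicated resampling library.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        order = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        other = minority[rng.choice(order[:k])]
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: two points of opposite
    classes that are each other's nearest neighbour."""
    n = len(X)
    nn = [
        min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    drop = {i for i in range(n) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=2, seed=0):
    """Oversample the minority class up to parity, then clean Tomek links."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, k, rng)
    X_all = list(X) + new_points
    y_all = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_all, y_all)


# Hypothetical toy data: 8 majority points (label 0), 3 minority points (label 1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.4, 0.2),
     (0.3, 0.5), (0.5, 0.4), (0.6, 0.1), (0.2, 0.6),
     (1.0, 1.0), (1.2, 0.9), (0.9, 1.2)]
y = [0] * 8 + [1] * 3

X_res, y_res = smote_tomek(X, y)
print("class counts after resampling:", y_res.count(0), y_res.count(1))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.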

Results: Experimental results revealed that the best classification was achieved with the Stacking Classifier, yielding an Accuracy of 97.03% along with a minority-class F1-Score of 0.7309. In turn, the strongest probabilistic discrimination was found for the Voting Classifier, with an ROC-AUC of 0.9859. The XAI analysis showed that pure financial ratios are secondary in predicting non-compliance risk, and that the predictions are overwhelmingly dominated by absolute financial scale indicators (e.g. Tax Payment Amount, Total Assets) and administrative profile characteristics (e.g. MSME Taxpayer Status, Non-Effective Status).
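The reason a minority-class F1-Score is reported next to Accuracy can be shown with a short calculation. The sketch below assumes that the majority class corresponds to compliant taxpayers (the abstract does not state this explicitly) and uses only the reported 18.81:1 imbalance ratio. The helper `f1_score` is a hypothetical name introduced for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 for the positive (minority) class from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


IMBALANCE_RATIO = 18.81  # majority : minority, as reported in the abstract

# A trivial model that assigns every taxpayer to the majority class gets all
# majority cases right and all minority cases wrong.
baseline_accuracy = IMBALANCE_RATIO / (IMBALANCE_RATIO + 1)
baseline_f1 = f1_score(tp=0, fp=0, fn=1)

print(f"baseline accuracy:    {baseline_accuracy:.4f}")
print(f"baseline minority F1: {baseline_f1:.4f}")
```

Under this assumption the trivial baseline already reaches an accuracy of about 94.95% while its minority-class F1 is 0, so Accuracy alone says little about how well non-compliant taxpayers are detected.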

Conclusion: Ensemble Learning, combined with hybrid resampling and XAI, provides an analytically sound and interpretable early warning system for tax audits that can support real-world risk-based auditing.

