XGBoost-Based Classification of Academic Documents for Mapping the Sustainable Development Goals (SDGs) at Universitas Lampung

Authors

  • Rahman Taufik, Universitas Lampung
  • Arsitoteles
  • Candra Wijaya
  • Rakhmat Herlambang

DOI:

https://doi.org/10.23960/komputasi.v13i2.329

Keywords:

academic documents, classification, SDGs, XGBoost

Abstract

Mapping the contribution of higher education institutions to the Sustainable Development Goals (SDGs) is a crucial challenge for global accountability and for attaining World Class University (WCU) status. Whereas more sophisticated models are prone to overfitting and demand substantial computational resources on imbalanced data, this study explores the XGBoost algorithm as an efficient solution for classifying university academic documents by SDG. The study uses a dataset of 148,136 documents, processed with TF-IDF, and optimizes the model through hyperparameter tuning and class sample weighting to mitigate class imbalance. Evaluation shows a stable model with accuracy 0.92, precision 0.92, recall 0.89, and F1-score 0.90 on the test set. Although aggregate performance is high, analysis of the log loss and the confusion matrix indicates local overfitting on minority categories, which leads to low recall for those classes. Overall, the XGBoost model proves to be a valid and effective instrument for mapping the university's contribution to the SDGs, while also providing data-driven strategic guidance for identifying gaps and promoting balanced WCU achievement.
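
The abstract names two ideas that a small worked example can make concrete: class sample weighting, and the gap between aggregate accuracy and recall on a minority class. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes the common "balanced" weighting formula, n_samples / (n_classes × class count); the abstract does not state which formula was used. The function names and the toy labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n_samples / (n_classes * count of its class).

    Assumption: this is the common "balanced" heuristic, not necessarily
    the weighting scheme used in the paper.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions of a class / true members of it."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical toy labels: 8 documents in a majority class, 2 in a minority class.
y_true = ["SDG3"] * 8 + ["SDG14"] * 2
# One of the two minority documents is misclassified as the majority class.
y_pred = ["SDG3"] * 8 + ["SDG3", "SDG14"]

weights = balanced_sample_weights(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(weights[0], weights[-1])  # majority weight, minority weight
print(accuracy, recall)
```

In this toy case the minority documents receive four times the weight of the majority documents, and overall accuracy is 0.9 even though recall on the minority class is only 0.5. With the XGBoost scikit-learn interface, such weights would typically be passed through the `sample_weight` argument of `XGBClassifier.fit`; whether the study did this in exactly that way is not stated in the abstract.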

Published

2025-10-30

Section

Articles