Research Article

Improving classification performance for an imbalanced educational dataset example using SMOTE

Year 2019, Special Issue 2019, 485 - 489, 31.10.2019
https://doi.org/10.31590/ejosat.638608

Abstract

With advances in technology, large amounts of data are generated in digital environments. Education is one of the data-intensive areas. By analyzing educational data sets, students' situations can be predicted in advance. In this way, students can be assisted by anticipating situations such as drop-out due to failure. Educational institutions can take measures to reduce such drop-outs, and financial losses for both students and institutions can thus be prevented. In this study, data from students enrolled in five separate associate degree programs at Amasya University Distance Education Center in 2016-2017 were used. The programs are child development, medical documentation and secretarial, electricity, mechatronics, and internet and network technologies. Whether students could graduate at the end of the fourth semester was predicted from their first- and second-semester course grades. The data were analyzed with the k-nearest neighbor (K-NN) and KStar algorithms. Some of the data obtained from the distance education center were imbalanced due to the low number of students. In educational data mining, researchers usually overlook the balance of the class distribution in a dataset. Imbalanced data can seriously affect classification success. The synthetic minority oversampling technique (SMOTE) was applied to these imbalanced data, and its effect on classification success was examined. First, the raw data were analyzed with the K-NN and KStar classifiers. The analysis results for the five programs are presented comparatively in tables. The study found that the SMOTE oversampling method increases classification success. In areas where imbalanced data may exist, such as educational data mining, higher classification accuracy can be achieved with the help of different oversampling methods.
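The core idea of SMOTE mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the abstract does not state which tool was used); it is a plain-Python illustration that assumes numeric features and Euclidean distance. The function name `smote`, its parameters, and the toy grade values are all hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [semester I average, semester II average]
graduated = [[70, 75], [80, 78], [65, 72], [90, 85], [75, 80], [85, 88]]
not_graduated = [[30, 35], [40, 32]]

# Oversample the minority class until both classes have the same size.
new_points = smote(not_graduated, len(graduated) - len(not_graduated), k=1)
balanced_minority = not_graduated + new_points
```

Each synthetic point lies on the line segment between two real minority samples, so the minority class grows without exact duplicates. In an evaluation, oversampling would normally be applied to the training split only, so that the test data contain real records alone.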

References

  • Aydemir, E. (2019). Ders Geçme Notlarının Veri Madenciliği Yöntemleriyle Tahmin Edilmesi. Avrupa Bilim ve Teknoloji Dergisi, (15), 70-76.
  • Bunkhumpornpat, C., Sinapiromsaran, K., & Lursinsap, C. (2009). Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. Pacific-Asia conference on knowledge discovery and data mining, Berlin, Germany.
  • Çölkesen, İ., & Kavzoğlu, T. (2011). Örnek tabanlı k-star algoritması ile uzaktan algılanmış görüntülerin sınıflandırılması. UFUAB VI. Teknik Sempozyumu, Belek, Antalya.
  • Ge, Y., Yue, D., & Chen, L. (2017). Prediction of wind turbine blades icing based on MBK-SMOTE and random forest in imbalanced data set. IEEE Conference on Energy Internet and Energy System Integration (EI2), Changsha, China.
  • Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457(7232), 1012-1014.
  • Güldal, H., & Çakıcı, Y. (2017). Eğitsel Veri Madenciliği. 12th International Balkan Education and Science Congress, Nessebar, Bulgaria.
  • Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. International conference on intelligent computing, Berlin, Germany.
  • Kalıpsız, O., & Cihan, P. (2015). Öğrenci Proje Anketlerini Sınıflandırmada En İyi Algoritmanın Belirlenmesi. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, 8(1), 41-49.
  • Öztürk, A. (2018). Açık ve uzaktan öğrenme ortamlarında eğitsel veri madenciliği. Açıköğretim Uygulamaları ve Araştırmaları Dergisi, 4(2), 10-13.
  • Peña-Ayala, A. (Ed.). (2013). Educational data mining: applications and trends (Vol. 524). Springer.
  • Pristyanto, Y., Pratama, I., & Nugraha, A. F. (2018). Data level approach for imbalanced class handling on educational data mining multiclass classification. International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia.
  • Sultana, M., Haider, A., & Uddin, M. S. (2016). Analysis of data mining techniques for heart disease prediction. 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka, Bangladesh.
  • Tallo, T. E., & Musdholifah, A. (2018). The Implementation of Genetic Algorithm in Smote (Synthetic Minority Oversampling Technique) for Handling Imbalanced Dataset Problem. 4th International Conference on Science and Technology (ICST), Yogyakarta, Indonesia.
  • Zeng, M., Zou, B., Wei, F., Liu, X., & Wang, L. (2016). Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data. 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS), Chongqing, China.

There are 14 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Yavuz Ünal

Ahmet Sağlam (ORCID: 0000-0002-2616-8253)

Osman Kayhan

Publication Date October 31, 2019
Published in Issue Year 2019 Special Issue 2019

Cite

APA Ünal, Y., Sağlam, A., & Kayhan, O. (2019). Improving classification performance for an imbalanced educational dataset example using SMOTE. Avrupa Bilim ve Teknoloji Dergisi, (Special Issue 2019), 485-489. https://doi.org/10.31590/ejosat.638608