A homogeneous distribution of students within a class is widely regarded as a key factor for overall success in primary education. A class of students with similar attributes tends to achieve higher academic success, whereas general academic success may be lower in classes where students differ widely in intelligence and academic level. In this study, a class distribution model is proposed that applies data science algorithms to a small student dataset. Using unsupervised and semi-supervised learning methods from machine learning and data mining, a group of students is distributed evenly among classes according to a set of criteria. The model divides the group into clusters by considering the students' qualitative and quantitative characteristics. A pilot study is carried out to estimate the effectiveness and efficiency of the presented approaches. The study also covers process elements such as the quantitative and qualitative characteristics of a student, the method of data acquisition, the digitization of attributes, and the creation of future predictions. Satisfactory and promising experimental results are obtained by applying a set of algorithms to datasets collected for classroom scenarios. As expected, a clear and concrete comparison between balanced and unbalanced class distributions cannot be performed, since the two scenarios cannot be applied to the same group of students at the same time.

Keywords: Unsupervised and semi-supervised methods, class distribution, classroom homogeneity, ability grouping, similar academic performance.
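
The clustering step summarized in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the student records, the numeric encoding of the qualitative attribute (`READING_LEVELS`), the choice of plain k-means with a farthest-first initialization, and every function name are assumptions made for illustration only. The sketch digitizes a qualitative attribute, rescales all attributes to a common range, and groups similar students. It does not enforce equal group sizes, which a full class distribution model would also have to handle.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical numeric encoding of a qualitative attribute (not from the study).
READING_LEVELS: Dict[str, float] = {"none": 0.0, "partial": 0.5, "fluent": 1.0}

# Hypothetical records: (name, test score out of 100, reading level).
STUDENTS: List[Tuple[str, float, str]] = [
    ("Ada", 92, "fluent"),
    ("Ben", 88, "fluent"),
    ("Cem", 35, "none"),
    ("Dila", 40, "partial"),
    ("Efe", 90, "partial"),
    ("Filiz", 30, "none"),
]


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1] so no single attribute dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[List[float]], k: int, max_iter: int = 50) -> List[int]:
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_students(students, k: int) -> Dict[int, List[str]]:
    """Encode, scale, and cluster students; return names per cluster."""
    rows = [[float(score), READING_LEVELS[level]] for _, score, level in students]
    labels = k_means(min_max_scale(rows), k)
    groups: Dict[int, List[str]] = {c: [] for c in range(k)}
    for (name, _, _), lab in zip(students, labels):
        groups[lab].append(name)
    return groups


classes = group_students(STUDENTS, k=2)
print(classes)  # {0: ['Ada', 'Ben', 'Efe'], 1: ['Cem', 'Dila', 'Filiz']}
```

With this toy data the two groups happen to be the same size. On real data, a balancing step (for example, reassigning students nearest a cluster boundary) would be needed to reach evenly sized classes.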








Research Article