Identifying slow learners in an e-learning environment using k-means clustering approach

Currently, the majority of e-learning lessons created and disseminated advocate a “one-size-fits-all” teaching philosophy. The e-learning environment, however, includes a noticeable share of slow learners, just as traditional classroom settings do. Researchers have considered learning analytics of educational data from a learning management system (LMS) as a potential means of identifying slow e-learners and of supporting, contesting, and altering present educational practices in e-learning. We used the students’ rates of learning and grade points along with the total learning time, which is calculated from the time series log data, to cluster the learners. The rate at which a student learns determines whether he or she is a slow learner, an average learner, or a gifted learner. For classifying learners, we followed a step-by-step procedure that included instructional design to create a dataset, learning analytics of the dataset, and a machine learning strategy to cluster e-learners. The system has been adequately integrated with the methods for measuring student learning. A strategy based on the revised Bloom’s Taxonomy is offered for the assessment of learners. The K-Means clustering approach is used to group learners who have similar performance without collecting a learner’s previous academic records or demographic information. In the experimental evaluation, 7.7% of e-learners are grouped as slow learners, while advanced learners make up 61.3% of the student body and average learners make up 31%. According to the study, there is a correlation between learning rate and academic success, with fast learners having a lower learning rate, that is, needing less time to reach the mastery level.


Introduction
Today, e-learning outperforms traditional learning as a means of instruction and study. This creative method takes into account how students can study anytime, anywhere, and employs a variety of educational teaching strategies in a rich and varied setting. Recommender systems have recently been used to support individual learning in an online learning environment. Unquestionably, personalised learning occurs when e-learning environments make careful attempts to plan, build, and implement educational experiences that correspond to their learners' requirements, goals, abilities, and interests (Bourkoukou & El Bachari, 2018). Asynchronous e-learning may be the best option for creating a novel learning environment because asynchronous environments regard students as autonomous learners. There are no time constraints for learning, and students can work on online tasks whenever they want. The ability to categorise people as fast, average, or slow learners provides a framework for evaluating learning requirements and assessing each person's potential (Munje et al., 2021). The objective of the research was to identify slow e-learners and propose an intelligent tutoring system that could provide a better education platform and improve learning efficiency.
A slow learner is someone who learns at a slower rate than the average learner; these are children whose academic performance falls below the average for their age group. According to Kirk, the rate of learning determines whether a child is a slow learner, an average learner, or a gifted learner, and the slow-learning child is not mentally retarded because he is capable of achieving a moderate level of academic success, albeit at a slower rate than the average child (Kirk, 1962). With more time and assistance, the slow learner can achieve a moderate level of academic success. If such children's needs are not met, they will fail and drop out of school. Slow learners benefit from a methodical, step-by-step approach as well as extra time and assistance (Vasudevan, 2017).
The purpose of this research is to advance a concept that can be used to identify slow learners in e-learning based on their rate of learning, enhance their performance, and guide them toward becoming better learners. Remedial education is most beneficial to slow learners because a remedial programme focuses on meeting a learner where he is and guiding him to greater achievement from there (Vasudevan, 2017). A technology-enhanced learning environment is needed to support these learners, boost their cognitive abilities, and enable them to plan, oversee, and assess their learning. In this e-learning environment, slow learners will benefit from recommendations and feedback that will help them perform at their highest levels, and an intelligent instruction system will guarantee them the highest possible level of education. It is worth mentioning that this work concentrates on grouping learners, leaving the recommendation phase for future work because it falls outside the purview of this study.

Review of literature
A review of existing works that implemented the identification of slow learners in e-learning was carried out. The studies that dealt with classifying learners as slow, average, or fast were also considered. However, it was discovered that only a limited amount of research on the classification of learners in digital learning exists in the literature today; this area of study has not been thoroughly explored, possibly as a result of the widely used in-person learning format that was generally available until recently. E-learning has grown in popularity as a result of the spread of technology around the globe and the increase in access to information, since it enables people to learn new skills without a physical mentor instructing them. Given that more students are accessing educational content online and creating more data flows, it is clear that the field of e-learning may make a substantial contribution to the idea of big data (Moubayed et al., 2020).

The focus of e-learning research
The majority of e-learning research relied on demographic information, past academic achievement, online learning activities, and in-class study performance to predict student outcomes and suggest e-learning materials. Student achievement prediction was the focus of these studies rather than learner categorization. However, the detection of slow learners in a classroom setting has already been investigated and is now better understood. In institutes of higher learning, predictions of student performance are vital; final grades are typically used by India's higher education institutions to assess students' performance. To forecast student success based on learning activity logs, researchers employed real-time datasets from e-learning platforms (Mahboob et al., 2017). The use of multimedia tools to construct visually appealing activities improves the learning experience for students. Educators can provide students with a wide range of resources that they are unable to exemplify in the classroom due to time constraints (Widodo et al., 2022).
The most common way to suggest e-learning content to users is through recommender systems. To address the issue of recommending the appropriate e-learning content to the user, an intelligent framework utilising a Machine Learning (ML) algorithm with a Random Forest classifier is proposed. This framework categorises the e-learning content based on its levels of difficulty and offers the learner the most appropriate content based on their level of knowledge (Thomas & Chandra, 2020). The study by Geetha et al. (2021) revealed that it can be beneficial for administrators, educators, and students to predict a student's performance before the final exam so that decisions can be made in time to prevent students from failing. Additionally, the application of sentiment analysis can provide information to enhance the student's performance in the upcoming term.
A group of researchers looked at methods currently used to provide e-learning content based on the prior knowledge of learners. Existing tools for generating e-learning content were examined to create individualised learning content reflecting the prior competencies of the learners. They described a step-by-step procedure for developing and disseminating personalised information in the paper. The methodology intends to develop and deliver personalised online learning based on the integration of an analytical complex including assessment tools, a database of necessary competencies, training materials, and the learners' past knowledge and skills (Blagoev et al., 2021).
The study conducts a thorough assessment of the multicriteria decision-making (MCDM) methods used in e-learning. Understanding the importance of utilising MCDM to evaluate e-learning is the key goal. The Information System Success Model (D&M model), put forth by Delone and McLean in 1992, was embraced by the bulk of earlier studies. One of the findings was that the original D&M model may be improved upon by incorporating the traits of learners, instructors, user interfaces, and learning communities (Hii et al., 2022).
The interest was focused on suggesting a system that can offer a better educational environment to increase learning effectiveness, since slow learners in a conventional classroom would need more time and assistance. An intelligent teaching system is built on the foundation of the student's learning model. Three elements make up the learning model's overall structure: the student's basic characteristics, his or her personal interests, particularly those related to learning, and the student's individual success in terms of learning.
Because each student absorbs knowledge in a unique way, e-learning must be given the capacity to adapt to varied student preferences. It is also critical to put significant effort into the implementation of e-learning, notably in delivering more comprehensive material with an adequate amount of graphically based information, such as images, videos, and innovative games. The study's goal is to assess the usability of a personalised adaptive e-learning system designed based on students' learning styles and basic knowledge levels. According to the research findings, the usability of the adaptive e-learning system for students was well acknowledged in all categories of usability (Hariyanto et al., 2020).

ML approaches in e-learning
Learning analytics and educational data mining technologies have been used to analyse learners' behaviours in an LMS. The performance of learners is predicted by applying classification methods to the revised numeric dataset produced from the server logs. A Moodle-based LMS allows for interactive activities that combine simulations, short videos, virtual experiments, and games for both curricular and extracurricular teaching, improving constructivist-based interactive learning for both learners, particularly slow learners, and instructors to develop skills for intelligent information and technological communication (Arumugam et al., 2019). In the study by Mohammad and Mahmoud (2014), Expectation Maximisation and K-Means, two ML clustering techniques, were used to identify the best learning pattern for slow learners in elementary school. The development stages of the suggested integrated e-learning and mining system were explained.
To gauge the level of student involvement, unsupervised learning algorithms have been used to cluster students according to their online activity and interactions. To provide a more complete picture of students' engagement, the measures taken into consideration combine interaction-related and effort-related indicators. With this grouping or clustering, course instructors would find it easier to identify disengaged students and the factors that promote engagement (Moubayed et al., 2020). Another study suggested two ML methods to forecast students' learning performance and explain the projected outcomes to help students identify areas for improvement. The work focused on connecting the e-learning aspects of students to their performance outcomes (Wang et al., 2019).
The purpose of the research was to determine whether it is possible to anticipate the challenges that students will face during a subsequent session of a digital design course. ML techniques were used to analyse the data logged by a design suite and technology-enhanced learning system (Hussain et al., 2018). The study demonstrated the potential for employing proxy variables to enhance teaching strategies. Proxy variables can be used as indicators for interventions to support students and act as predictors of students' online behaviours. Instructors can decide what they need to do to improve their teaching and students' learning with the aid of proxy variables that are built on reliable evidence (Kim et al., 2016).
Decision trees, Naive Bayes classifiers, and artificial neural networks are among the most prevalent data mining approaches used to predict and classify students' variables, according to findings (Abu Saa et al., 2019). Some researchers used smart card records for student performance prediction (Ma et al., 2020). According to Swamydoss et al. (2019), users are considered slow learners if they are very sluggish in learning the material, have recently registered, haven't engaged with the material, or are unable to comprehend and complete the assessment. An intelligent system-based e-learning model has been proposed in that study to categorise learner characteristics and pick the right course materials for the right learner characteristics.

Student classification based on learning rate
Slow learners will inevitably need more time than average and talented learners to reach the desired competence level (Reddy, 1997). The article "Time and Learning" states that learners' learning rates vary by a ratio of nearly 5:1. In that example, the 5% of students who learn the slowest take around five times longer than the 5% of students who learn the fastest to meet the criterion (Bloom, 1974). The learning rate, which is the number of topics learned each hour, was the main variable of interest in the experiment. The rate was calculated by dividing the total number of items that were correct by the total amount of time in seconds, which was then multiplied by 3,600 to get the rate in items per hour; the rate was calculated for every chapter (Arlin & Webster, 1983).
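The items-per-hour calculation described by Arlin and Webster can be sketched in a few lines. The function name and the sample chapter figures below are hypothetical and only illustrate the arithmetic; they are not taken from the cited experiment.

```python
def items_per_hour(correct_items: int, total_seconds: float) -> float:
    """Learning rate as correct items per hour, with time recorded in seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    # Multiplying by 3,600 converts items per second into items per hour.
    return correct_items * 3600 / total_seconds


# Hypothetical chapter: 18 correct items in 45 minutes (2,700 seconds).
chapter_rate = items_per_hour(18, 2700)
print(chapter_rate)  # 24.0 items per hour
```

Under this definition a higher value means faster learning, which is the opposite direction to the time-to-mastery measure used later in this paper.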
Bloom suggested that rather than giving every student the same amount of teaching time and allowing learning to vary, perhaps we might mandate that every student or almost every student attain specific levels of achievement by permitting time to vary. In other words, we should provide learners with the time and instruction required to get them all to a reasonable level of learning. The following method can be used to determine the learning rate if an 80% mastery level is expected: 20 to 25 objective-type test items might be created based on the subject matter after advising the students to conduct a comprehensive study. The teacher should then record how long it takes each student to achieve an overall score of 80%. Each student's rate of learning will be determined by how long it takes him to reach an 80% mastery level (Bloom, 1968).
When it comes to e-learning, data mining of log files can be used to identify slow learners as people who have a high learning rate, i.e., who take too long to acquire an 80% competence level. In our study, the learning rate was calculated using server logs of student activity from the Moodle LMS, as specified by Bloom. It is the time taken by a student to achieve an overall score of 80%, so a larger value indicates a slower learner. Student clusters were identified using the learning rate together with the other features of the dataset. The study aims to investigate and perform research on an intelligent teaching system for slow e-learners that can provide a better learning environment and boost learning outcomes. The intelligent instruction system can use e-learning tools and standards to address each learner's learning necessities and preferences.

Project background and research questions
Students' learning is individualised to meet their needs, empowering them to make the best decisions at any given time. Personalising the e-learning experience and maintaining students' interest and motivation rank as two of the primary issues facing e-learning.
Clustering of e-learners will help us provide proper assistance to improve their performance. We adopted a methodical process when categorising learners. The procedures comprised instructional design to construct a dataset using an LMS, learning analytics of the dataset, and machine learning to cluster e-learners. An LMS is a robust, integrated system that helps teachers and students engage in a range of activities throughout the digital learning process. Teachers can communicate with students, generate web-based quizzes and course notes, and monitor and evaluate their progress by using an LMS. Students use it for education, collaboration, and communication (Widodo et al., 2022). The log data from an LMS is a key resource for acquiring knowledge about students' learning behaviours in a digital learning environment.

Dataset
The proposed system uses a primary data source to create the dataset. By producing e-content and uploading it to the LMS for undergraduate students, the researchers amassed significant amounts of log data. The students were enrolled and registered for the e-learning course. The dataset included the preprocessed data of server log files, a grade sheet of 17 test scores, a summative assessment score, and the time taken to complete these attempts by 155 undergraduate students from the Moodle LMS. The system applies the K-Means cluster-predict methodology to the dataset. A step-by-step process of clustering learners, which includes instructional design and learning analytics of the educational data, was implemented. This study demonstrates how a behavioural strategy can efficiently support subject mastery using the revised Bloom's Taxonomy. For the purpose of assessing students' performance on all of the course's topics, the question papers had a standard structure. According to the cognitive dimension of Bloom's taxonomy, the marks for the questions were assigned based on the degree of skill complexity.
The total time taken to learn is an important factor in grouping the learners in this work. Total time is calculated from the time series data in log files. The types of activities that exist in the e-learning course are shown in Table 1. The time taken by each student to attain an 80% mastery level indicates his or her rate of learning. Naturally, slow learners will take more time to attain the specified mastery level than average and gifted students (Reddy, 1997). The mastery level needed to achieve an overall score of 80% for each student was evaluated using the results of 17 formative assessments and a summative assessment. We assessed mastery level using a calculation based on the overall score percentage attained in the first attempt, even if it was less than 80%. Each course participant's learning rate was calculated as (total time in minutes / overall score percentage) × 80, which scales the observed learning time to an estimate of the time needed to reach an 80% score. Python 3.8 was used for data preparation and preprocessing.
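The learning-rate formula above can be written as a short function. This is a minimal sketch of the stated arithmetic only; the function name, the parameter names, and the sample students are hypothetical and do not come from the study's dataset.

```python
def learning_rate(total_minutes: float, score_percent: float,
                  mastery_percent: float = 80.0) -> float:
    """(total time in minutes / overall score percentage) x 80.

    A larger value means more time is needed to reach the mastery level,
    i.e. a slower learner.
    """
    if score_percent <= 0:
        raise ValueError("score_percent must be positive")
    return total_minutes / score_percent * mastery_percent


# Hypothetical students: (total learning time in minutes, overall score in %).
students = {"A": (600, 80), "B": (600, 75), "C": (900, 60)}
rates = {name: learning_rate(t, s) for name, (t, s) in students.items()}
print(rates)  # {'A': 600.0, 'B': 640.0, 'C': 1200.0}
```

Student A scored exactly 80%, so the rate equals the observed time; students B and C scored below 80% on the first attempt, so their observed times are scaled up proportionally.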

Determining optimal clusters
There are two major approaches to finding the optimal number of clusters: domain knowledge and a data-driven approach.
Domain knowledge. Domain knowledge may provide some insight into determining the number of clusters. For example, in the case of clustering student data sets, if we have prior knowledge of performance (fast, average, slow), then K = 3. K values driven by domain knowledge give more relevant insights.
Data-driven approach. If domain knowledge is unavailable, mathematical methods can assist in determining the appropriate number of clusters. Within-cluster variance is a measure of the compactness of the cluster. The lower the value of within-cluster variance, the higher the compactness of the cluster formed (Shi et al., 2021). The most popular method for determining the optimal number of clusters to use is the Elbow method, which takes advantage of within-cluster variance.
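The Elbow method can be illustrated with a small, dependency-free sketch: run K-Means for several values of k, record the within-cluster sum of squares (WCSS), and look for the k after which the decrease flattens. The K-Means below is a plain textbook version with a deterministic farthest-point initialisation, chosen here only for reproducibility; it is not the implementation used in the study, and the nine (learning rate, score) pairs are hypothetical.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Plain K-Means; returns the final centroids as tuples."""
    # Deterministic farthest-point initialisation (an assumption made so the
    # sketch is reproducible; the paper does not state its initialisation).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def wcss(points, centroids):
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)


# Hypothetical (learning rate in minutes, overall score in %) pairs.
learners = [
    (300, 90), (320, 88), (310, 92),
    (600, 75), (620, 72), (610, 78),
    (1500, 55), (1550, 50), (1480, 58),
]
wcss_by_k = {k: wcss(learners, kmeans(learners, k)) for k in range(1, 6)}
for k, value in wcss_by_k.items():
    print(k, round(value, 1))
```

On this toy data the WCSS falls sharply up to k = 3 and only slightly afterwards, which is the "elbow". In practice the features would usually be standardised first so that the time-based feature does not dominate the distance.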

Evaluation metrics
The process of assessing the goodness of clustering algorithm results is referred to as cluster validation. Generally, cluster validity measures are categorized into three classes (Koutroumbas & Theodoridis, 2009).

• Internal cluster validation. The clustering result is evaluated based on the data clustered itself (internal information) without reference to external information.
• External cluster validation. Clustering results are evaluated based on some externally known results, such as externally provided class labels.
• Relative cluster validation. The clustering results are evaluated by varying different parameters for the same algorithm (e.g., changing the number of clusters).

Clustering quality
There are several ways to measure the quality of a clustering result. Any assessment metric reduces the available data to a single value for comparing clustering outcomes. There are two major types of measures to assess clustering performance: extrinsic measures that require ground truth labels and intrinsic measures that do not require ground truth labels. Some clustering performance measures are the Silhouette Coefficient, the Calinski-Harabasz (CH) Index, and the Davies-Bouldin (DB) Index.

Silhouette analysis
The Silhouette score is used to evaluate the quality of clusters created using clustering algorithms in terms of how well samples are clustered with other samples that are similar to each other. Silhouette analysis can be used to determine the degree of separation between clusters. The coefficient can take values in the interval [-1, 1]. We want the coefficients to be as big as possible and close to 1 to have good clusters (Shi et al., 2021).
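For illustration, the Silhouette Coefficient can be computed directly from its standard definition: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the sample's score is (b - a) / max(a, b). The sketch below assumes Euclidean distance and at least two clusters; the function name and the four one-dimensional points are hypothetical.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest neighbouring cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # tight, well-separated groups
bad = silhouette_score(points, [0, 1, 0, 1])   # groups mixed together
print(round(good, 3), round(bad, 3))
```

The well-separated labelling scores close to 1, while the mixed labelling scores below zero, matching the interpretation given above.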

CH index
The CH index is an internal cluster validation index. The CH index, also known as the Variance Ratio Criterion, is the ratio of the between-cluster dispersion to the within-cluster dispersion, each summed over all clusters. The higher the score, the better the performance. When clusters are dense and well-separated, the score rises, which relates to the standard definition of a cluster. The score can be calculated quickly.
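As a sketch of the standard definition, the CH index for n samples in k clusters is [B / (k - 1)] / [W / (n - k)], where B is the size-weighted sum of squared distances from each cluster centroid to the overall centroid and W is the sum of squared distances from each sample to its own centroid. The function names and the toy data below are hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def calinski_harabasz(points, labels):
    """Variance Ratio Criterion; requires 2 <= k < n."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    # Between-cluster dispersion, weighted by cluster size.
    between = sum(
        len(members) * dist(centroid(members), overall) ** 2
        for members in clusters.values()
    )
    # Within-cluster dispersion around each cluster's own centroid.
    within = sum(
        dist(p, centroid(members)) ** 2
        for members in clusters.values()
        for p in members
    )
    return (between / (k - 1)) / (within / (n - k))


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
ch = calinski_harabasz(points, [0, 0, 1, 1])
print(ch)  # B = 100, W = 1, so (100 / 1) / (1 / 2) = 200.0
```

The index has no fixed upper bound, so it is mainly useful for comparing alternative clusterings of the same data.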

DB index
The DB index is an internal evaluation scheme, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset. This index signifies the average "similarity" between clusters, where the similarity is a measure that compares the distance between clusters with the size of the clusters themselves. A lower DB index relates to a model with better separation between the clusters.
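The DB index can likewise be sketched from its standard definition: for each cluster i, take the worst-case ratio (s_i + s_j) / d_ij over all other clusters j, where s is the mean distance of a cluster's members to its centroid and d_ij is the distance between the two centroids, then average these worst-case ratios. The function names and the toy data are hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def davies_bouldin(points, labels):
    """Average worst-case cluster similarity; lower is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centres = {lab: centroid(m) for lab, m in clusters.items()}
    # Scatter: mean distance of members to their own centroid.
    scatter = {
        lab: sum(dist(p, centres[lab]) for p in m) / len(m)
        for lab, m in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centres[i], centres[j])
            for j in clusters if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
db = davies_bouldin(points, [0, 0, 1, 1])
print(db)  # (0.5 + 0.5) / 10 for both clusters
```

Compact clusters that are far apart give a small ratio, which is why lower values indicate better separation.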

Experimental evaluation
Our study focuses on identifying slow learners in an e-learning environment by evaluating the rate of learning and time spent on an LMS for learning. As a result of our study, we have proposed a conceptual model for the identification of slow learners in an e-learning environment. Educators can better comprehend the varied degrees of cognitive demand by conducting a taxonomy study of learning behaviours. Both instructors and learners will benefit from the study's revised Bloom's Taxonomy analysis, which was used to analyse the cognitive process dimension, the knowledge dimension, and the learning rate. A systematic methodology is applied to find the advanced, average, and slow e-learners in a course based on the revised Bloom's Taxonomy levels using the K-Means clustering approach.
The GPA and total time taken to learn were used to assess each student's mastery level, and the learners were classified as "advanced", "average", or "slow" using the K-Means clustering method. The main strengths of K-Means clustering are its scalability, efficiency, and simplicity. Additionally, it can manage extensive data sets with ease. This approach will help identify an individual's learning requirements. A widely used method for choosing the cluster count is the Elbow method, which uses within-cluster variance. In our study, the optimal value of k was determined to be 3 using the Elbow method, which is shown in Fig. 1. The value of k will change if most of the learners fall into the average or fast categories.
The K-Means clustering algorithm analysed the input data, which comprised 59 features for each student (mark, time taken to complete test, grade of 18 assessment tests, average mark, CGPA, total time spent learning, learning rate, and student ID), and discovered natural groups or clusters in feature space, such as clusters 0 (advanced), 1 (average), and 2 (slow). The classification of learners aids in the progression of the learning process and the completion of the course with special learning efforts. Fig. 2 shows a graphical depiction of the clustering results. Of the students, 61.3% are classified as advanced learners and 31% as average learners. In the experimental evaluation, twelve out of 155 students, or 7.7% of the e-learners, were clustered as slow learners. In accordance with the findings of the experimental research, slow learners require additional time and resources, exactly like in a classroom setting, combined with techniques to improve their knowledge and cognitive process dimensions. Additionally, it has been discovered that there is a correlation between academic achievement and learning rate, with slow learners having a greater learning rate and advanced learners having a lower learning rate. According to Bloom, learning rates vary across students by a factor of around 5:1 (Bloom, 1974). As per our research, the 5% of students who learn the least quickly need almost seven times longer to complete the requirement than the 5% of students who learn the most quickly.
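A comparison between the slowest and fastest 5% of learners, such as the one reported above, can be sketched as a ratio of mean learning rates in the two tails. The paper does not give the exact procedure, so the function below is an assumption about how such a ratio might be computed, and the twenty learning rates are hypothetical values rather than the study's data.

```python
def tail_ratio(learning_rates, fraction=0.05):
    """Mean learning rate of the slowest tail divided by that of the fastest.

    Learning rate here is time to mastery, so larger values are slower.
    """
    ordered = sorted(learning_rates)
    n = max(1, round(len(ordered) * fraction))  # size of each tail
    fastest = ordered[:n]   # least time to mastery
    slowest = ordered[-n:]  # most time to mastery
    return (sum(slowest) / n) / (sum(fastest) / n)


# Hypothetical learning rates (minutes to 80% mastery) for 20 students.
sample_rates = [100 + 30 * i for i in range(20)]
ratio = tail_ratio(sample_rates)
print(round(ratio, 2))  # 6.7: the slowest tail needs 6.7 times longer
```

With 155 students, each 5% tail would contain eight learners under this rounding rule.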
Once clustering is done, how well the clustering has performed can be quantified by a number of metrics. Ideal clustering is characterised by a minimal intra-cluster distance and a maximal inter-cluster distance. The intrinsic measures that do not require ground truth labels, such as the Silhouette Coefficient, the CH Index, and the DB Index, were used in the study to measure the goodness of split since we had no prior knowledge about the labels. The results of the analysis are as follows:
• Silhouette analysis was used to determine the degree of separation between clusters, and the value of the coefficient was 0.21567. Since the value is greater than zero, samples lie, on average, closer to their own cluster than to the neighbouring clusters, although a value this far below 1 indicates that the separation is modest.
• When the CH Index is measured, the score is 45.77305. The score shows that the clusters are dense and separated.
• The DB Index is 1.66364. A lower DB index relates to a model with better separation between the clusters.

Results and benefits
The cognitive ability of each knowledge item is examined using Bloom's taxonomy through the student's test performance, and the overall cognitive ability of the student is formed in a comprehensive manner. The results of the experiment demonstrate a relationship between learning pace and academic accomplishment, with fast learners having a lower rate of learning and slow learners having a greater rate.
• The method measures the learning rate, which will be the basis for identifying slow learners in e-learning using formative and summative assessment approaches and computing the total time taken to learn.
• A strategy based on revised Bloom's Taxonomy is used for analyzing the cognitive process and knowledge dimensions of learners in an e-learning course.
• The K-Means clustering is used to categorise the learners as clusters 0 (advanced), 1 (average), and 2 (slow) based on their learning performance and rate of learning.
• According to our research, the 5% of students who learn the least rapidly in e-learning require roughly seven times longer to accomplish the requirement than the 5% of students who learn the most quickly.
• This study will help faculty members with student classification. Mentoring slow learners will be easier and more specific in this scenario through an intelligent tutoring system, which will help improve the program's outcome.
Experimental result analysis shows that among the time-related and grade-related metrics, the marks scored, the time taken to answer the review questions, and the duration to complete the course are the most representative of the student's achievement level. The adopted methodology is beneficial because it can serve as groundwork to identify student clusters based on their digital behavioural cues, even though this study does not provide a determining factor for recognising the degree of participation in e-learning environments.

Conclusion
Slow e-learners need to be given more time and resources, just like in a classroom context, along with strategies to enhance their cognitive process and knowledge aspects. A comprehensive and systematic strategy based on the revised Bloom's Taxonomy was used as an assessment method for determining the level of learning, and the multiple-choice questions acted as a formative assessment tool at the end of each topic, helping to reinforce learning in students. K-Means clustering, a widely used technique for data cluster analysis, was used to group learners in the e-learning environment. The three clusters are identified as clusters 0 (advanced), 1 (average), and 2 (slow). The experiment results show a relationship between learning pace and academic achievement, with fast learners having a lower learning rate (less time needed to reach mastery) and slow learners having a higher learning rate. Identification of learners in such a way has a pedagogical basis and is a standard way to measure the cognitive and knowledge dimensions of a learner. We can thus identify slow e-learners using this approach with the help of log files and the K-Means clustering algorithm to improve learning effectiveness.
The proposed system has some limitations, such as using the overall score percentage obtained in the initial attempt to calculate mastery level, even when it is less than 80%. Instead of giving students extra chances and time to get 80%, we employed the certificate course method to determine how much time was spent learning for the e-learning course. When using a remedial technique, it is possible to introduce the amount of time required for each student to get an overall score of 80%. This study does not provide a criterion for measuring the level of involvement in e-learning environments. Remedial education aims to improve the competencies of low-achieving students, and in the next step, we hope to improve the performance of such slow learners with the help of an intelligent instruction system.