Analysing student behaviour in a learning management system using a process mining approach

Online learning implementation has been growing year by year across countries, including Indonesia. Many higher education institutions use a Learning Management System (LMS) to facilitate online learning. Unfortunately, many issues arise during online learning implementation, such as a lack of student behaviour monitoring. This study adopts an educational process mining technique to conduct weekly assessments of student behaviour during one semester. The study was undertaken in the following steps: problem identification, literature review, design of study context, log data collection from LMS, log data filtering, event data grouping, conversion of LMS logs to event logs, clustering, and process model discovery. The following findings were revealed in this research: the most frequently accessed features were course material, assignments, and forums; students accessed the LMS most frequently on lecture days; the number of student activities decreased in line with fewer instructions from lecturers; students who attained the best grades most frequently accessed the LMS, and vice versa; and high-achieving students had a more complex process model than other students. Therefore, this research suggests that systematic teaching strategies have a broader impact on student engagement and performance.


Introduction
Online learning has been successfully developed in many countries for several decades. In Indonesia, several online learning strategic initiatives have emerged at a national level: Sistem Pembelajaran Daring Indonesia (SPADA), the Indonesia Online Learning System, initiated by the Ministry of Research and Higher Education (Pannen, 2021), and Rumah Belajar were developed by the Centre of Information of the Communication Ministry of Education and Culture; Universitas Terbuka (the Open University) has conducted online distance education programmes since 1984 (Pannen, 2021); Indonesia-X was implemented in 2014 (Pannen, 2021); and in 2005, the Student-Centered e-Learning Environment (SCeLE), initiated at the Faculty of Computer Science, was implemented in Universitas Indonesia (Hasibuan & Santoso, 2005). These initiatives have received wide attention. However, many obstacles arose during the implementation, including a lack of Information and Communication Technology (ICT) skills and online learning knowledge, low proficiency in the English language, inadequate infrastructure, a lack of technical support, little financial aid, a lack of senior management policy, insufficient training in online learning, poor instructional design, and a lack of motivation and student behaviour monitoring (Naveed et al., 2017). Moreover, student preparedness was another barrier to online learning (Kasiyah et al., 2017).
Several experts have studied behaviour in education; for example, Liang et al.'s (2014) study focused on the relationship between learners' perceived learning experience, learning behaviours and learning outcomes in a Massive Open Online Course. The studies conducted by Fauzi et al. (2018) and Punniyamoorthy and Asumptha (2019) on knowledge sharing between academics in India applied the theory of planned behaviour and the social capital theory to determine factors associated with Malaysian higher education academics' knowledge sharing intentions. In an online learning environment, teachers observe student behaviour through a student dashboard, which is typically used to provide information related to the students' frequency of access to the Learning Management System (LMS), assignment grade book, quizzes, exams, and the number of forum posts.
The LMS uses descriptive statistics to summarise data such as student behaviour, average use, and student grade tables. Unfortunately, the previous research did not use LMS data to assess student behaviour from week to week over time. Detection of each student's behaviour is required to reveal learning phenomena to improve educational outcomes (Romero et al., 2014) and as part of the student learning evaluation method (Cairns et al., 2015). Therefore, this study investigated student behaviour in the LMS using a process mining approach with the aim of improving the teaching instruction and strategy in online settings. The current research answers the three research questions below.

Research question 1:
What behaviour do students display in a learning management system for a course? 'Behaviour' is defined as the way humans act, especially towards others, in responding to a particular situation or stimulus (Ipurangi, 2021). In learning, behaviour is reflected in students' words and actions (Levy, 2021). By understanding student behaviour, teachers can direct students to follow the learning path and avoid hindrances, observe student responses to teaching strategies and prepare support plans if those strategies fail, understand how the behaviour of successful and less successful students differs, and predict student performance in the future. Therefore, this study aims to reveal student behaviour using an LMS.

Research question 2:
What is the students' process model in a course based on their behaviour in a learning management system? Students' learning behaviour is complex. Therefore, using a chart to analyse and present data about the student behaviour exhibited in an LMS is not adequate, and a method for sequencing the process is required. Educational analytics enable learning behaviour to be visualised. For example, the process model describes the workflow and activities in a graphic format. Teachers can easily see the learning process by using the process model to evaluate whether students follow the designed learning path.

Research question 3:
How do teaching strategies affect student behaviour in a learning management system? Stones and Morris (1972) define teaching strategy as a comprehensive plan for lessons that includes instructional objectives and an outline of the tactics to fulfil the learning objective. Teaching strategies develop alongside technological advances. Blended learning is not limited to face-to-face teaching, so teaching strategies must also ensure asynchronous student engagement. Teaching strategies can motivate students, direct students' focus on the learning path, monitor learning progress, organise various learning activities, and evaluate teaching and learning activities. In this way, a teaching strategy orchestrates learning resources with students. Therefore, more detailed research on the impact of teaching strategies on student behaviour is needed.
This paper comprises five sections. In the first section, the researchers explain the research background and objective. The second section illustrates several areas related to the research, such as educational process mining and the LMS. The research methodology is discussed in the third section. The fourth section explains the results of the experiment and presents the discussion. The last section provides the conclusions and suggestions for future research.

Literature review
By using process mining, teachers can more easily understand student activities in the LMS. In this decade, process mining has been used in many educational contexts: applying heuristic mining techniques to study online chat data from teams working on complex tasks (Reimann et al., 2009); using fuzzy mining techniques to examine the relationship between students' self-reported strategies and progress in self-regulated learning (Beheshitha et al., 2015); using the process mining approach and machine learning to make predictions to improve students' learning experience in extensive open online courses (Umer et al., 2017); using the process mining approach to explore students' behaviour and interaction patterns in different types of online quiz-based activities; and improving personalised learning by providing insights into the learning processes of students with diverse learning backgrounds (Intayoad et al., 2018). In addition, Bogarín et al. (2018) proposed applying a newer algorithm, Inductive Miner, to educational data.

Educational process mining (EPM)
In the educational world, there are many actors, processes, systems, and other connected entities. For example, Fig. 1 shows several roles in academic institutions, such as students, teachers, senior management, and administrative staff (Cairns et al., 2015). They are supported by information systems, for example, the LMS to help teachers conduct the learning process, the Human Resources Information System to manage employees, and the Applicant Information System to manage student candidates. Typically, information systems store user activity as log data. These logs can be used to discover process models and improve existing processes; process mining is used in this context. Process mining is a process-centric technique used in educational data mining that can extract knowledge from the event logs commonly available in current information systems (Romero et al., 2016). It can be defined as a new method that builds on process model-driven approaches and data mining and provides an exhaustive toolkit to produce fact-based insights and promote process enhancements (Van der Aalst, 2011). There are three steps in process mining: discovering an accurate process model; performing a conformance check to reconcile the event logs to a prescribed process model; and enhancing and expanding the model (Cairns et al., 2015).
The top five research topics on process mining are algorithm discovery, conformance checking, process mining application, architecture, and tools and methods for process mining projects (Dos Santos Garcia et al., 2019). In addition, the top three process mining algorithms are Heuristic Miner, Alpha and its variations, and Evolutionary-based algorithms (Dos Santos Garcia et al., 2019). Several commercial process mining tools have been developed, such as Disco, ARIS Process Performance Manager, Celonis Process Mining, ProcessAnalyzer, Interstage Process Discovery, Discovery Analyst, and XMAnalyzer. ProM is a complete process mining environment and the most frequently used tool in EPM, appearing in around 84% of cases (Ghazal et al., 2017). Process mining has been implemented in several domains, such as healthcare, manufacturing, education, and finance (Dos Santos Garcia et al., 2019).
In the educational context, EPM involves discovering and analysing processes and flows in event logs generated by educational environments (Romero et al., 2016). EPM aims to build complete and compact educational process models to make tacit knowledge explicit and better understand the education process (Trcka & Pechenizkiy, 2009).

Student-centered e-learning environment: A Moodle-based LMS
The student-centered e-learning environment (SCeLE) is a Moodle-based LMS initiated by the Faculty of Computer Science, Universitas Indonesia. SCeLE was developed by customising modules from the Moodle platform. It was introduced in 2005 and is still used by all students and lecturers to interact in learning and teaching activities. By default, SCeLE has a report plugin that provides a student activity log based on date, event, and name. Unfortunately, teachers can only see a list of activities using this plugin. Moreover, the report does not convey what the listed activities mean or what value they have for instruction and learning materials. Therefore, the researchers propose the use of process mining to detect student behaviour during one academic semester.

Context of the study
This study was conducted in the Computer-Assisted Instruction (CAI) course, a mandatory course at the Faculty of Computer Science, Universitas Indonesia (Fasilkom UI). CAI is the only course at Fasilkom UI that focuses on education and allows students to use their technical knowledge to promote technology as an enabler of education. In CAI, lecturers discuss how computers assist in learning and teaching. The course covers cognition, learning theories, metacognition, digital content, and LMS. In addition, lecturers explore current issues in online learning, such as MOOC, personalised learning, collaborative learning, and educational research. A course page in SCeLE is displayed in Fig. 2.

Fig. 2. Course page structure
As shown in Fig. 2, the LMS course page structure comprises a topic title for every week, learning objectives, class activities, a forum, learning materials, and weekly reflection. This course adopts blended learning, where students are expected to be actively involved in the whole learning process, both face-to-face and online. Lecture activities consist of interactive lectures (1 × 50 minutes), discussions (2 × 50 minutes), group presentations (2 × 50 minutes), and role playing (1 × 50 minutes). The assessment includes individual participation (5%), group participation (5%), individual assignment 1 (15%), individual assignment 2 (10%), group assignment 1 (10%), group assignment 2 (35%), and mid-test (20%). For example, in week one, students complete nine activities, such as reading the article 'Historical Overview of Learning & Technology', reading the article 'How People Learn', reading the article 'Learning: From Speculation to Science', reading the article 'Book Chapter - Learning: From Speculation to Science', watching a video with the title 'Fish is Fish' by Michael Competiello (2013), watching the video 'Brain Connectivity Study Reveals Striking Difference Between Men and Women', reading the 'Weekly Reflection Guideline', submitting weekly reflection #1, and discussing the 'Discussion Forum week #1'. The instructions to students for each week are listed in Table 1.

Research process
Logs have been widely used to analyse student behaviour. For example, Bousbia et al. (2010) analysed the relationship between learning styles and navigational behaviour, Tessier and Dalkir (2016) conducted a log analysis to ascertain how knowledge transfer is carried out, and Simcock et al. (2019) used logs to investigate whether learning might have happened. In this study, LMS logs are key to constructing the process model. As shown in Fig. 3, the researchers answer the research questions in several stages: collecting log data from SCeLE; pre-processing the log data to remove irrelevant activities; implementing event grouping; converting the SCeLE logs to event data logs; clustering the logs based on students' final grades; and using Disco to find process models.

Converting LMS logs to event data
The Moodle LMS has several attributes in the log table as its default:
• full name - attributed to the student's name
• time - attributed to the timestamp
• event's context - attributed to the context of an event, such as the title of learning materials, discussions, and topics
• component - attributed to a group of activities on Moodle, such as system, forum and file
• event's name - attributed to specific activities such as course viewed, and discussion viewed
• description - attributed to specific information from the event, which is usually in the form of a statement
• origin - attributed to devices used to access the LMS, such as web and mobile
• IP address - attributed to the internet protocol value.
An example of the Moodle logs can be seen in Table 3. This study used Disco to discover the process model. Disco requires an event log with a minimum of three elements as input: case ID, which defines the scope of the process; activity, which defines the steps in the process map and their granularity; and timestamp. Further elements can be added as needed. The Moodle attributes were converted as follows to simplify the input to the process mining tool:
• trace - this attribute was linked with the Moodle Log Component value.
• event - this attribute was linked with the Moodle Specific Activity value.
• time - this attribute was linked with the Timestamp value.
• resources - this attribute was linked with the Student Name value.
An example of the Disco logs is shown in Table 4.
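The attribute mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the dictionary keys, and the time format are hypothetical and are not taken from the study's data; only the four target attributes (trace, event, time, resources) follow the mapping given in the text.

```python
from datetime import datetime

# Hypothetical rows shaped like a Moodle log export. The keys and the time
# format are assumptions for illustration, not the study's actual data.
moodle_rows = [
    {"full_name": "Student A", "time": "05/02/19, 10:15",
     "component": "System", "event_name": "Course viewed", "origin": "web"},
    {"full_name": "Student A", "time": "05/02/19, 10:02",
     "component": "Forum", "event_name": "Discussion viewed", "origin": "web"},
    {"full_name": "Student B", "time": "06/02/19, 09:30",
     "component": "File", "event_name": "Course module viewed", "origin": "mobile"},
]


def to_event_log(rows, time_format="%d/%m/%y, %H:%M"):
    """Map Moodle log rows to the four event-log attributes named in the text."""
    events = []
    for row in rows:
        events.append({
            "trace": row["component"],       # Moodle Log Component value
            "event": row["event_name"],      # Moodle Specific Activity value
            "time": datetime.strptime(row["time"], time_format),
            "resources": row["full_name"],   # Student Name value
        })
    # Order events chronologically so that sequences can be read off directly.
    return sorted(events, key=lambda e: e["time"])


event_log = to_event_log(moodle_rows)
for e in event_log:
    print(e["time"].isoformat(), e["resources"], e["trace"], e["event"])
```

Attributes that are not needed for process discovery (description, origin, IP address) are simply dropped in this sketch.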

Manual clustering
Learning is a complex process; therefore, the researchers conducted clustering as preprocessing to improve and simplify EPM. First, clustering was applied to group students with similar characteristics, and then process mining was implemented to discover more specific student behaviour models. In this study, the researchers used manual clustering, grouping students by using only their final grades. There were three clusters of students in this course:
(1) cluster 1 - students with final grades greater than or equal to 80; 41 students were assigned to this cluster.
(2) cluster 2 - students with final grades between 65 and 79; 9 students were assigned to this cluster.
(3) cluster 3 - students with final grades between 0 and 64; 2 students were assigned to this cluster.
Clusters 1 and 2 included successful students, and cluster 3 included unsuccessful students.
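The grade thresholds above amount to a simple rule, sketched below. The student names and grades are hypothetical. The sketch also assumes a 0 to 100 scale and places any fractional grade below 80 in cluster 2 and below 65 in cluster 3, a boundary case the text does not specify.

```python
def assign_cluster(final_grade):
    """Return the cluster number (1, 2 or 3) for a final grade on a 0-100 scale."""
    if not 0 <= final_grade <= 100:
        raise ValueError(f"final grade out of range: {final_grade}")
    if final_grade >= 80:
        return 1  # cluster 1: final grade of 80 or above
    if final_grade >= 65:
        return 2  # cluster 2: final grade from 65 up to (but not including) 80
    return 3      # cluster 3: final grade below 65


# Hypothetical grades, for illustration only.
grades = {"Student A": 86.5, "Student B": 72, "Student C": 64, "Student D": 80}

clusters = {}
for student, grade in grades.items():
    clusters.setdefault(assign_cluster(grade), []).append(student)

print(clusters)
```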

Discovering a process model
The vital step in this research was to discover a process model. To find this model, the researchers used Disco as a process mining tool. Disco was selected since it is a popular tool for process mining that supports importing and exporting log files and provides extensive filtering, such as by timeframe, performance, event log, variation, and attribute. In addition, Disco has a user-friendly interface and reports in several formats, such as charts, animation, and graphs. The process model was discovered in four steps, as shown in Fig. 4: upload dataset; set up configuration, such as activity and path (50% activity and 25% path); view the process maps; and analyse each process map. As a result, the researchers discovered a process model for each cluster to gain an understanding of and explore students' behaviours.
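To give a feel for what a process map contains, the sketch below counts activities and directly-follows relations in an event log. This is a deliberately simple stand-in, not Disco's own mining algorithm, and it does not reproduce the activity and path settings mentioned above. The events are hypothetical, and using one student as one case is an assumption made for this illustration.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count activities and directly-follows pairs.

    events: iterable of (case_id, activity, timestamp) tuples.
    Returns (activity_counts, edge_counts), where edge_counts maps
    (activity_a, activity_b) to how often b directly followed a in a case.
    """
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    activity_counts = Counter()
    edge_counts = Counter()
    for case_events in by_case.values():
        # Sort each case chronologically before reading off the sequence.
        ordered = [activity for _, activity in sorted(case_events)]
        activity_counts.update(ordered)
        edge_counts.update(zip(ordered, ordered[1:]))
    return activity_counts, edge_counts


# Hypothetical event log: (case_id, activity, timestamp as an integer step).
events = [
    ("s1", "Course viewed", 1),
    ("s1", "Course module viewed", 2),
    ("s1", "Discussion viewed", 3),
    ("s1", "Submission form viewed", 4),
    ("s2", "Course viewed", 1),
    ("s2", "Course module viewed", 2),
    ("s2", "Submission form viewed", 3),
]

activity_counts, edge_counts = directly_follows(events)
for (source, target), count in edge_counts.most_common():
    print(f"{source} -> {target}: {count}")
```

The nodes of a process map correspond to the activity counts and the arrows to the edge counts; dedicated tools then filter out infrequent nodes and arrows to keep the map readable.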

Research Question 1: What behaviour do students display in a learning management system for a course?
The highest activity for the three clusters was observed on lecture days (Tuesday and Wednesday). This finding was consistent from the first week of lectures to the ninth week, as the teacher gave varied instructions from the first week to the mid-test period, such as reading material, watching videos, discussion forums, and the influential weekly reflection. After the mid-test, student activity mainly decreased, apart from several days when activity increased in relation to individual and group assignments. This finding reinforces Firat's (2016) finding that the students spent time on the LMS on the days of face-to-face classes.
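A daily usage profile of this kind can be obtained by counting log events per weekday. The sketch below is illustrative only; the timestamps are hypothetical and do not come from the study's logs.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def events_per_weekday(timestamps):
    """Count how many events fall on each day of the week."""
    return Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)


# Hypothetical event timestamps.
timestamps = [
    datetime(2019, 2, 5, 10, 0),   # a Tuesday
    datetime(2019, 2, 5, 14, 30),  # the same Tuesday
    datetime(2019, 2, 6, 9, 0),    # a Wednesday
    datetime(2019, 2, 8, 20, 15),  # a Friday
]

usage = events_per_weekday(timestamps)
print(usage.most_common())
```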

Fig. 5. Daily usage for each cluster
As shown in Fig. 6a, cluster 1 was very active in the first week, activity decreased in the following week, and it increased again in the fifth to seventh weeks (before the mid-test). Clusters 2 and 3 showed similar behaviour, where the first week was quite active, and activity increased significantly in the second week. Then, activity fell from the third week to the seventh week. After the mid-test, the three clusters showed similar behaviour, with activity decreasing from the ninth week to the last week.

Fig. 6. Weekly usage for each cluster
As shown in Fig. 6b, clusters 1 and 2 exhibited the same behaviour. In both clusters, 100% of students were active from week to week until the mid-test, whereas only 50% of students in cluster 3 were active for four weeks before the mid-test. From the mid-test until the end of the semester, the three clusters showed similar behaviour, where not all students actively used the LMS. Unfortunately, there were also four weeks when the cluster 3 students did not use the LMS.
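The share of active students per cluster and week can be computed by counting the distinct students who generated at least one event. The sketch below uses the cluster sizes reported in this study (41, 9 and 2 students), but the events and student identifiers are hypothetical.

```python
from collections import defaultdict

# Cluster sizes as reported in the manual clustering step.
CLUSTER_SIZES = {1: 41, 2: 9, 3: 2}


def weekly_active_share(events, cluster_sizes):
    """Return {(cluster, week): share of the cluster's students who were active}.

    events: iterable of (cluster, student_id, week) tuples, one per log event.
    Cluster-week pairs without any event are absent from the result
    (their share is implicitly zero).
    """
    active = defaultdict(set)
    for cluster, student_id, week in events:
        active[(cluster, week)].add(student_id)
    return {key: len(students) / cluster_sizes[key[0]]
            for key, students in active.items()}


# Hypothetical events for the two students of cluster 3.
events = [
    (3, "s51", 1), (3, "s51", 1),  # the same student twice in week 1
    (3, "s51", 2), (3, "s52", 2),  # both students in week 2
]

shares = weekly_active_share(events, CLUSTER_SIZES)
print(shares)
```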

Fig. 7. LMS top feature usage
Cluster 1 and cluster 2 showed similar behaviour in using features of the LMS (see Fig. 7). The frequently used elements included the system component (such as course views), discussion forums, assignments, and files. This usage profile started from the first week and lasted until the end of the semester. Cluster 3 used the assignment and file features more frequently, but not the online forums. This finding supports Firat's (2016) finding that discussion forums, assignments and content are the top features used by students. This finding implies that teachers need to design strategies that encourage students to remain actively involved in the LMS outside face-to-face and other synchronous activities. In addition, teachers need to increase the use of online discussion forums, since engagement with this feature is one of the differences between successful and unsuccessful students. Consequently, teachers must provide different trigger questions to encourage students to be more active in discussions. Moreover, this finding supports the use of log data to detect student performance early (Riestra-González et al., 2021).

Research Question 2: What is the students' process model in a course based on their behaviour in a learning management system?
This research used process mining to uncover student behaviour. Romero et al. (2016) and Juhaňák et al. (2019) also used process mining. As shown in Fig. 8, the process model in cluster 1 is relatively more complicated than the other process models. In cluster 1, the process starts when students are already enrolled in the course, visiting course viewed. The model then splits into different possible routes. One route continues via course module viewed and the other via discussion viewed. After course module viewed, the model divides to take another possible path, continuing via discussion viewed and submission form viewed. This process model indicated that students always viewed the learning module before visiting the assignment form since the teacher described the exercise through the learning module. In addition, students visited the discussion forum after seeing the course module, and vice versa. Therefore, it was concluded that the route for cluster 1 was to visit the course page, look at the modules on the course, observe and engage in discussions, and submit their assignments. The behaviour was repeated almost every week.

Fig. 8. Cluster 1 process model
As shown in Fig. 9, cluster 2's process model is more straightforward than that of cluster 1 but more complicated than that of cluster 3. In cluster 2, the process started when students were already enrolled in the course, visiting course viewed, and then the model splits into different possible routes. One path continues via course module viewed and the other via discussion viewed. After course module viewed, the model splits into another possible route. Again, one route continues via discussion viewed and the other via submission form viewed.
In contrast to cluster 1 students, cluster 2 students did not directly visit the discussion forum after seeing the learning module, and vice versa. It was concluded that the route taken by cluster 2 was to visit the course page, look at the course modules, and then observe and engage in discussions or submit their assignments. The behaviour was repeated almost every week. Therefore, cluster 1 and cluster 2 had relatively similar process models.

Fig. 9. Cluster 2 process model
As shown in Fig. 10, the process model in cluster 3 was simpler than the other process models. In cluster 3, the process started when the students were already enrolled in the course, then they visited the course page and looked at the course modules. However, cluster 3 rarely engaged in discussions or submitted assignments, indicating that cluster 3 differed significantly from clusters 1 and 2. These process maps revealed that successful students followed the learning path while less successful students did not. Therefore, teachers must design teaching strategies that provide early or real-time detection of students who do not follow the learning path. The technical implication is that the LMS requires a visualisation feature that provides process maps for instructors. This idea aligns with Cerezo et al.'s (2020) suggestion that visualisation is essential and would help to make appropriate, real-time decisions during the teaching-learning process.
Based on the LMS usage chart and process model, it was found that each cluster was highly active in the weeks before the mid-test and slowed after the mid-test until the end of the semester. The reason was that the teachers provided instructions from the first week to the seventh week. During this period, teachers consistently asked students to read learning materials, such as teacher presentations, eBooks, or external sources; watch videos; and answer trigger questions in online forums; and encouraged students to complete self-reflection each week. However, during the period after the mid-test, teacher instructions decreased, and student activities also slowed. In the weeks after the mid-test, students paid more attention to group projects where student interaction was conducted outside the LMS. In addition, the points that the teachers provided for student participation on the LMS affected student activity. The researchers used correlation analysis to identify the relationship between teaching instructions and student usage behaviour for each cluster (see Table 5).
Teaching instruction strongly correlated with successful students' (cluster 1 and cluster 2) behaviour, whereas it had a weak correlation with unsuccessful students' behaviour (cluster 3). This finding is consistent with the findings of other studies, such as those conducted by Jaggars and Xu (2016), Watson et al. (2017), and Yang (2017), in which they noted that instructional strategies have a positive impact on student engagement and successful student performance.
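For readers unfamiliar with the measure, the Pearson correlation coefficient between two series can be computed as sketched below. The weekly numbers are hypothetical and are not the values behind Table 5; they only illustrate how the number of instructions per week and the number of LMS events per week would be compared for one cluster.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical weekly data for one cluster.
instructions_per_week = [9, 8, 7, 6, 2, 1]
events_per_week = [90, 85, 70, 66, 20, 15]

r = pearson(instructions_per_week, events_per_week)
print(round(r, 3))
```

A value close to 1 indicates that weeks with more instructions also had more student activity, values near 0 indicate little linear relationship, and negative values indicate an inverse relationship.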

Conclusion
Online learning is growing continuously, and studies exploring how technology can help teachers analyse student behaviour during online learning will receive more attention. It is essential to analyse student behaviour since it exhibits many critical phenomena, such as student independence, learning material preferences, motivation level in learning, student interaction, and self-efficacy.
The current study adopted a process mining approach to gain an understanding of student behaviour in an LMS. Several findings were obtained in this study. First, it was found that the most active students achieved the best academic performance, and vice versa. Second, the process model of high-performing students was more complicated than that of other students. Third, systematic and mixed teaching strategies have a strong impact on student learning behaviour. Finally, student usage behaviour and online learning process models can be used to estimate student performance and course completion.
This research has several limitations, however. First, the findings of this study were generated from one course and based on observed activities in the LMS only. Therefore, the findings cannot be generalised to other subjects or beyond the LMS. Second, the process model is difficult to understand (Cerezo et al., 2020). Accordingly, high-level coding, such as a self-regulated learning strategy scheme, is needed to simplify it. In addition, there are many other research opportunities, such as exploring student interaction in online forums, participation behaviour, student motivation levels, activity completion patterns, self-regulated learning, personal goals, learning plans, time management, and student preparedness.

Table 1. Instructions to students

Table 5. Pearson correlation analysis result