The potential contributions of concept maps for learning website to assessment for learning practices

The purpose of this paper is to examine the promising contributions of the Concept Maps for Learning (CMfL) website to assessment for learning practices. The CMfL website generates concept maps from the degree of relatedness of concept pairs through the Pathfinder Scaling Algorithm. The website also conforms to the established principles of effective assessment for learning, for it is capable of automatically assessing students’ higher order knowledge, simultaneously identifying strengths and weaknesses, immediately providing useful feedback, and being user-friendly. According to the default assessment plan, students first create concept maps on a particular subject and are then given individualized visual feedback followed by associated instructional material (e.g., videos, website links, examples, problems, etc.) based on a comparison of their concept map and a subject matter expert’s map. After students have studied the feedback and instructional material, teachers can monitor their progress by having them create revised concept maps. Therefore, we claim that the CMfL website may reduce the workload of teachers as well as provide immediate and delayed feedback on the weaknesses of students in different forms, such as graphical and multimedia. In a follow-up study, we will examine whether these promising contributions to assessment for learning are valid in a variety of subjects.


Introduction
Assessment is one of the crucial components of education (Gikandi, Marrow, & Davis, 2011), and is required for three broad aims (Black, 1993): (1) the certification of individual student achievement; (2) the accountability of educational institutions via the comparison of results; and (3) direct assistance to learning through useful feedback.

Assessment for learning
Recently, there has been a shift from a primary focus on summative assessment to formative assessment (Irons, 2007; Greenstein, 2010), yet what formative assessment is remains an issue. Some claim that formative assessment refers to assessment instruments, whereas others believe that it is a process (Bennett, 2011). Defining formative assessment as only an instrument causes confusion when formative feedback is given on summative assessment (Irons, 2007). For instance, 3² + 2 × 4 is an item which assesses students' understanding of the order of operations in mathematics. If the item is asked at the end of a unit, it has summative value. If a student answers 44 rather than 17, it may be concluded that the student ignores the precedence of multiplication over addition. This information might have formative value if the teacher makes some instructional decisions to address this misconception (Good, 2011). Therefore, even though an assignment might be designed or planned as summative, its methodology, data analysis, and use of results determine whether it is formative or summative (Dunn & Mulvenon, 2009). Consequently, one might incorrectly conclude that teachers are able to infer students' specific misconceptions from every item of a summative assessment. Due to this issue, the supporters of the process view started using the term "Assessment for Learning" or "Formative Evaluation" instead of "Formative Assessment" in the literature (Bennett, 2011). In this regard, Cizek (2010) defines assessment for learning as: "the collaborative processes engaged in by educators and students for the purpose of understanding the students' learning and conceptual organization, identification of strengths, diagnosis of weaknesses, areas for improvement, and as a source of information that teachers can use in instructional planning and students can use in deepening their understandings and improving their achievement (pp. 6-7)."
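The order-of-operations item discussed above can be checked directly. The short sketch below evaluates the item correctly and then follows one hypothetical solution path (addition carried out before multiplication) that would produce the answer 44; the variable names are illustrative only.

```python
# The item from the text: 3 squared plus 2 times 4.
correct = 3**2 + 2 * 4      # exponent first, then multiplication, then addition
misread = (3**2 + 2) * 4    # hypothetical student path: addition done before multiplication

print(correct, misread)     # prints: 17 44
```

A single wrong answer is therefore diagnostic only when the teacher can trace it back to a specific solution path such as this one.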

The effectiveness of assessment for learning
Despite the challenges to assessment for learning, including resources and time (Cizek, 2010), several studies reveal the positive impact of assessment for learning on students' achievement. For example, after reviewing around 250 articles on assessment for learning, Black and Wiliam (1998) concluded that it increases students' performance. These studies are related to feedback, self-assessment, and peer assessment. Likewise, Nyquist (2003) came to a similar conclusion after conducting a meta-analysis on the use of feedback for formative assessment purposes. In addition, in an empirical study of 24 teachers who received six months of training to develop formative assessment practices, Wiliam, Lee, Harrison, and Black (2004) demonstrated the positive impact of using assessment for learning in classrooms.
However, the generalizability of these findings has been questioned by some. For instance, Bennett (2011) has argued that the mean effect size computed by Black and Wiliam (1998) is based on studies that are too diverse to be meaningfully combined, and that the meta-analysis conducted by Nyquist (2003) was too narrowly focused on the college-level population. Similarly, Dunn and Mulvenon (2009) have raised methodological concerns that may limit the conclusions drawn by Wiliam, Lee, Harrison, and Black (2004). Because of these issues, the specific principles of assessment for learning should be further addressed. MacDonald (2007), while studying the implementation of formative assessment in classrooms, suggests that although the concept of formative assessment may be attractive to teachers, their usual criticism is that it is challenging to use consistently in teaching practice. Teachers describe the need to customize teaching practice for individual students in large classes as well as to help the students whom they deem more challenging to work with. Their biggest test, however, is to meet the enormous requirements of a standardized curriculum in restricted time periods (Wissick & Gardner, 2008). Trumpower, Filiz, and Sarwar (2014) discuss the needs of teachers using formative assessment. They succinctly lay out the criteria (referred to as 'elements' in this paper) for effective formative assessment, namely that it must (1) assess higher order knowledge, (2) identify students' strengths and weaknesses, (3) provide effective feedback, and (4) be user-friendly. Further analysis of the literature indicates that another element should be added to their scheme, specifically teachers' ability to track student learning over time. This will be discussed as the fifth element.

Higher order knowledge
Involving students actively and cognitively in the learning process helps teachers take an interest in developing students' understanding of the subject matter rather than simple information intake. Bransford, Brown, and Cocking (1999) and, more recently, Hwang and Chang (2011) discuss at length the significance of this process, in which students also learn to discern how their own and their peers' conceptual awareness of the ideas compares with the criteria and learning outcomes. This emphasis on the process of learning stresses the need to develop students' higher order knowledge. Studies show that students who focus on the learning process, and not only on passing the test, accomplish better results and retain information longer (Wiliam, Lee, Harrison, & Black, 2004).

Students' strengths and weaknesses
For formative assessment to be effective, it must also be specific (Trumpower & Sarwar, 2010). A student's particular strengths and weaknesses must be identified in order for the teacher to help scaffold learning. The teacher and potentially the students themselves can improve learning based on the remedial instruction provided once the gap is identified between existing knowledge and the unit objectives. Teachers can also obtain further evidence of student learning using wide-ranging tasks, using a range of media, and applying new and existing skills in new situations (Yorke, 2003). This also provides students with an opportunity to explore new subjects from different perspectives and to strengthen their understanding and ability to transfer knowledge to different contexts. Needs based on different situations in the classroom and of students with different backgrounds can also be met with the use of varied methods of instruction. This enables teachers to select from a broad range of tools to help different students reach their learning goals.

Useful feedback
Wiliam, Lee, Harrison, and Black (2004) reported what has now become a popular idea: that feedback is most effective when given in a timely manner. The authors conducted a study whose results indicated that when teachers provided formative feedback within or between teaching units in their classrooms, students' progress rates almost doubled over the course of the year compared to the control classrooms. It is also essential to note that feedback should not be provided too rapidly, since students should be given an opportunity to think and reflect on the concepts before seeking help from teacher or peer feedback. Rushton (2005) further indicates that learning gains tend to be bigger when feedback is focused on the task, with specific suggestions for improvement, rather than on the student's personality, even if the feedback is positive. Effective feedback should refer to clear criteria, learning goals, and expectations for performance. It is also important to balance the amount of feedback based on the needs of the student.
Bransford, Brown, and Cocking (1999) report that formative assessment is not popular among many teachers, as they view it as a needless addition to their already unmanageable workloads. Therefore, it can be concluded that the entire formative assessment process must be user-friendly and easy to use if it is to become a viable option in the classroom (Trumpower & Sarwar, 2010). There are different perspectives on user-friendliness, however. From a teacher's perspective, the process, although rewarding, could involve tediously providing remedial feedback to each student, whereas students could reap substantial benefits from the assessment feedback but be motivated to use it only if they feel it is easy to use and access.

Tracking learning over time
After the identification of the learning goal, teachers assess the student's existing level of knowledge about the unit. Thereafter, progress toward the goal is measured in comparison to that level. Teachers' regular interaction with students is a key feature of classrooms using formative assessment, where evidence about the growing understanding of new concepts is collected over several intervals (Ballantyne, Hughes, & Mylonas, 2002). Teachers can use techniques such as sophisticated questioning and extended dialogues to explore students' thinking, to detect any misconceptions, and to help them correct their conceptual errors. Köller (2001) found positive effects on students' deep learning in several experiments and field studies where teachers gathered new evidence of student progress and understanding at successive stages.

Technology assisted assessment for learning in statistics education
Statistics educators face challenges in addressing their students' individual needs and providing students with specific feedback, because introductory statistics classes are usually large and their students have different academic backgrounds and various levels of preparation (Im & Yin, 2009). Therefore, Aliaga et al. (2005) recommend that technology be used not only for computing numbers but also for exploring conceptual ideas and boosting student learning.
To meet these needs in statistics education, a few computer applications have been developed, such as Koralle, Stats-mIQ, and Simulation Assisted Learning Statistics (SALS). To begin with, Koralle consists of worked examples which are methodically linked with problem-solving tasks. Students are first required to solve a problem, and then a worked example demonstrating the correct solution procedure is presented. Next, students can compare their own solutions to the example information (Krause, Stark, & Mandl, 2009). Therefore, while students are interacting with Koralle, they receive only corrective feedback on the problem solution procedure. However, the software fails to automatically identify students' errors in solving statistical problems, so effective feedback for addressing these errors cannot be given. Second, Stats-mIQ contains a series of multiple-choice quizzes. Each selected answer corresponds to extensive feedback. If the correct answer is selected, the feedback given is corrective. Otherwise, students receive metacognitive feedback for addressing misconceptions and biases (Kleitman & Costa, 2014). Although students are given corrective and metacognitive feedback in Stats-mIQ, whether students comprehend the knowledge received through metacognitive feedback cannot be measured, because no new problems associated with the feedback are provided. Finally, SALS consists of four phases: Externalization, Reflection, Construction, and Application. In the Externalization phase, students are posed a question whose context is related to daily life. Through this question, students realize their own ideas on the statistics concepts they are learning. In the Reflection phase, students are asked to express the ideas that were externalized in the Externalization phase. In the Construction phase, students are required to interact with the Dynamically Linked Multiple Representations (DLMRs) via clear learning guides revealing the definition of key concepts and manipulating procedures. Therefore, students can identify the relationships among different representations and then construct their own concepts. The Application phase is the last phase, allowing students to solve two problem situations in different contexts and with different solution paths. As a result, SALS lets students both build their own concepts by manipulating DLMRs and apply their conceptualization to new problems (Liu, 2010).
Even though these computer applications provide immediate feedback on students' performance and let students transfer their conceptualizations to new problems, none of them assesses students' prior knowledge. Several studies claim that identifying students' prior knowledge is crucial. For instance, Krause, Stark, and Mandl (2009) argue that students having little prior knowledge cannot effectively compare their own solutions with the worked examples in Koralle, so they might not benefit from the standardized feedback. In addition, Leppink, Broers, Imbos, van der Vleuten, and Berger (2012) and Kalyuga (2009) claim that using prior knowledge for structuring and integrating new information can help students develop the latter ability. Such knowledge integration can be done through graphic organizers such as concept maps (Yin, 2012). Thus, an ideal technology assisted assessment for learning tool in statistics education should explore each student's prior knowledge (strengths and weaknesses), provide immediate feedback on their weaknesses, and then identify whether the provided feedback works for every student.
We have been considering these needs and developing a concept mapping website (conceptmapsforlearning.com) in our research lab, based on the aforementioned principles of effective formative assessment (Filiz, Trumpower, & Atas, 2012). Within the website, students create concept maps on a particular subject and then receive individualized feedback and associated instructional material (e.g., videos, website links, examples, problems, etc.) based on a comparison of their concept map and a subject matter expert's map. After students have studied the feedback and instructional material, teachers can track their progress by having them create revised concept maps (Filiz, Trumpower, & Vanapalli, 2014).

Brief presentation of the concept maps for learning website
This section explores how the topics of concept maps are chosen and how students interact with the concept maps.

Choosing a topic
A topic is chosen if students either face difficulties in explaining the concept relations underlying a formula or have misconceptions about the topic.
Of particular relevance to this presentation, Analysis of Variance (ANOVA) is chosen. Castro Sotos, Vanhoof, Van den Noortgate, and Onghena (2007) reveal that although students can manipulate and carry out calculations (e.g., ANOVA) with statistical data, they might still have several misconceptions when results from inferential techniques are interpreted. For instance, students may only pay attention to the p-value and omit information related to interpreting effect size (Broers, 2009). Even though a few animated ANOVA tables illustrating some of their properties exist (Stirling, 2010; Van der Merwe & Wilkinson, 2011), the number of studies on the conceptualization of ANOVA calculations is limited (Van der Merwe, 2012). Thus, for identifying each student's strengths and weaknesses as well as providing feedback on their specific weaknesses via the CMfL website, the following concepts are selected: F-ratio, Number of groups, Between-groups mean square, Within-groups mean square, Sum of squares between groups, and Degree of freedom.
Some might object that the number of concepts in this set is smaller than it should be. For example, Cooke (1999) and Goldsmith, Johnson, and Acton (1991) claim that there should be at least 12 concepts in the set, whereas Casas-García and Luengo-González (2013) argue that a set should include no more than 12 concepts to accurately assess a domain. Therefore, it appears that there is a consensus about using around 12 concepts to examine the rigor of the domain. Some studies, however, have used fewer than 12 concepts to assess a domain but stress the need for the core concepts to be endorsed by domain experts. For instance, Casas-García and Luengo-González (2013) used a set of 11 concepts to generate students' knowledge maps about the concept of angles. After a sample of Primary (Grade 3P to 6P), Secondary (Grade 1E to 4E), Olympiad (2E), and Mathematics Undergraduate students completed pairwise ratings of the concepts, the representative knowledge map of each group was generated. Once the maps were compared to each other and similarity indexes between them were computed, a Kruskal-Wallis test was performed. Results showed that the similarity between the knowledge maps increased as the age and experience of the students increased (H = 15.252, 7 d.f., p = 0.0329). Similarly, Boring (2005) identified 10 core concepts of an introductory psychology course in order to compare students' essay scores (g), which were given by human graders (holistic and analytic) and a computerized grader, with the Pathfinder similarity index (C). For the holistic graders, g = 0.414C - 0.048, where R² = 0.001 and p > 0.05. Thus, the degree of Pathfinder network similarity did not prove to be a sound indicator of the grade given by the holistic graders. It was also found that the relationship between the Pathfinder network similarities and the score awarded by the analytic graders was not significant (R² = 0.012 and p > 0.05). In contrast, the degree of similarity between the computerized grader's Pathfinder network and the student essay writer's network was a comparatively good indicator of the grade awarded for that paper (R² = 0.247 and p = 0.06). These studies are examples of how researchers who used fewer than 12 concepts reported that the generated knowledge maps were valid measures of students' comprehension.
Thus, we decide to generate an expert map of the selected concepts. To do so, three experts are asked to perform pairwise ratings of these concepts on a scale of 1 (less related) to 5 (more related) (see Fig. 1). Afterwards, these ratings are converted to binary values through the Pathfinder scaling algorithm (see Filiz, Trumpower, & Vanapalli, 2014). Then, the expert map (see Fig. 1) is generated based on these computed values. The accuracy of this expert map is also confirmed against the formula used for computing the F-ratio.

Fig. 1. A pairwise rating task and the expert map of ANOVA
Thereafter, teachers using the website choose the topic "ANOVA" from among the available topics. Students are then provided with the concepts corresponding to the chosen topic and rate the degree of relationship between the concepts in order to generate their concept map. The website then compares each student's concept map with the expert concept map to generate individualized feedback for each student. This comparison is scored via the C (configural similarity) measure, which is computed by dividing the number of common links in the two knowledge maps by the total number of links in both knowledge maps (Clariana, 2010). Through this similarity index, teachers find out the degree of closeness between students' maps and the expert map. The value of this measure ranges from 0 to 1. On the assumption that this value is always smaller than 1, our basic research question is whether or not the various forms of individualized feedback on the CMfL website help bring this value closer to 1.
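The two computational steps described above, turning pairwise ratings into a binary link structure and scoring a student map against the expert map, can be sketched as follows. This is a minimal illustration, not the website's implementation. It assumes the common Pathfinder parameters r = infinity and q = n - 1, assumes a rating on the 1 to 5 scale is converted to a distance as 6 minus the rating, and reads "the total number of links in both knowledge maps" as the number of distinct links across the two maps. The function names, the concept labels A to D, and all ratings are made up for illustration.

```python
from itertools import combinations


def pathfinder_links(ratings, concepts, max_rating=5):
    """Return the links kept by a PFNET(r=inf, q=n-1) pruning (assumed parameters)."""
    inf = float("inf")
    # Assumption: a relatedness rating becomes a distance of max_rating + 1 - rating.
    dist = {a: {b: (0 if a == b else inf) for b in concepts} for a in concepts}
    for (a, b), rating in ratings.items():
        dist[a][b] = dist[b][a] = max_rating + 1 - rating

    # Minimax path distances: the length of a path is its longest single step.
    best = {a: dict(row) for a, row in dist.items()}
    for k in concepts:
        for i in concepts:
            for j in concepts:
                via_k = max(best[i][k], best[k][j])
                if via_k < best[i][j]:
                    best[i][j] = via_k

    # A direct link survives only if no indirect path is shorter.
    return {
        frozenset((a, b))
        for a, b in combinations(concepts, 2)
        if dist[a][b] != inf and dist[a][b] <= best[a][b]
    }


def configural_similarity(links_a, links_b):
    """Common links divided by distinct links across both maps (assumed reading of C)."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 1.0


# Made-up ratings for four placeholder concepts (not the experts' or students' data).
concepts = ["A", "B", "C", "D"]
expert_ratings = {("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 2,
                  ("C", "D"): 4, ("A", "D"): 1, ("B", "D"): 1}
student_ratings = {("A", "B"): 5, ("C", "D"): 5, ("A", "D"): 4,
                   ("B", "C"): 1, ("A", "C"): 1, ("B", "D"): 1}

expert_map = pathfinder_links(expert_ratings, concepts)    # links A-B, B-C, C-D
student_map = pathfinder_links(student_ratings, concepts)  # links A-B, C-D, A-D
similarity = configural_similarity(expert_map, student_map)
print(similarity)  # prints: 0.5
```

In this toy case the two maps share two links out of four distinct links, so the similarity is 0.5; a revised student map that adds the B-C link and drops the A-D link would reach 1.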
In the following section, there are two examples of students' concept maps to demonstrate how students having different concept maps receive individualized feedback on their incomplete understanding of ANOVA calculations.

Demonstration of students' interaction with the website
After the rating task is performed, students receive three different types of feedback in the form of visual, textual, and linked instructional material.
First, visual feedback comprises a visual presentation of the expert concept map superimposed over the student's map, with any discrepancies highlighted by different types of links. This visual feedback map consists of three different types of links: relevant links, extraneous links, and missing links.

Fig. 2. An example of two students' visual feedback maps
As seen in Fig. 2, a black line appears if there is a link between two concepts in both the expert's map and the student's map (e.g., "Degrees of Freedom and Number of Groups" or "Between Groups Degrees of Freedom and Number of Groups"). This type of link will be referred to as a relevant link. A grey dotted line appears if there is a link between two concepts in the student's map, but not in the expert's map (e.g., "Degrees of Freedom and F-ratio" or "Number of groups and F-ratio"). This is referred to as an extraneous link. Finally, a red dashed line appears (e.g., "Between Group Means Square and F-ratio" or "Between Group Means Square and Degree of Freedom") if there is a link between two concepts in the expert's map, but there is no link between these concepts in the student's map. Such links are referred to as missing links.
In addition to this visual feedback, additional instructional material is linked to any missing links. First, when students move the mouse cursor over a missing link, a text message appears which explains how the associated concepts are related; these explanations have been provided by subject matter experts, but can be modified by individual teachers using the website. As seen in Fig. 3, when Student A clicks on the missing link between "Between Group Means Square" and "F-ratio", a text message pops up as "The F-ratio can be thought of as a measure of how different the means (ex. Between groups mean square) are relative to the variability within each sample". Likewise, when Student B clicks on the missing link between "Between Group Means Square" and "Degree of Freedom", a text message appears which reads "Once the sums of squares have been computed, between groups mean square is computed by dividing sums of squares by the degrees of freedom". Second, if students double click on a missing link, they are able to access linked instructional material intended to illustrate the ways in which the associated concepts are related (e.g., videos, website links, examples, problems, etc.); again, this material has been provided by subject matter experts, but additional material can be added by individual teachers (see Appendix A).
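The three link types described above amount to simple set operations on the two maps. The sketch below is illustrative only: the helper names are hypothetical, and the two small maps are fragments assembled from the example links quoted in the text, not reproductions of the full maps in Fig. 2.

```python
def link(a, b):
    """An undirected link between two concepts."""
    return frozenset((a, b))


def classify_links(student_links, expert_links):
    """Split links into the three feedback categories described in the text."""
    return {
        "relevant": student_links & expert_links,    # in both maps
        "extraneous": student_links - expert_links,  # only in the student's map
        "missing": expert_links - student_links,     # only in the expert's map
    }


# Line styles as described for the visual feedback map.
LINE_STYLE = {"relevant": "black line",
              "extraneous": "grey dotted line",
              "missing": "red dashed line"}

# Toy fragments built from the example links named in the text.
expert_links = {link("Degrees of Freedom", "Number of Groups"),
                link("Between Group Means Square", "F-ratio")}
student_links = {link("Degrees of Freedom", "Number of Groups"),
                 link("Degrees of Freedom", "F-ratio")}

feedback = classify_links(student_links, expert_links)
for kind, links in feedback.items():
    for pair in links:
        print(LINE_STYLE[kind], "between", " and ".join(sorted(pair)))
```

Because only the missing links carry attached explanations and instructional material, the "missing" set is the one that drives the remaining feedback.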
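The relations that these text messages describe can be made concrete with a small worked computation. The sketch below follows the standard one-way ANOVA formulas (between-groups degrees of freedom equal to the number of groups minus one, each mean square equal to a sum of squares divided by its degrees of freedom, and the F-ratio equal to the between-groups mean square divided by the within-groups mean square). The function name and the three toy groups are made up for illustration.

```python
def one_way_anova(groups):
    """Compute the quantities in the concept set for a one-way ANOVA."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Sum of squares within groups: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between              # between-groups mean square
    ms_within = ss_within / df_within                 # within-groups mean square
    return {"ss_between": ss_between, "df_between": df_between,
            "ms_between": ms_between, "ms_within": ms_within,
            "f_ratio": ms_between / ms_within}


# Made-up scores for three groups of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

For these toy scores the between-groups sum of squares is 26 on 2 degrees of freedom and the within-groups sum of squares is 6 on 6 degrees of freedom, so the F-ratio is approximately 13, up to floating-point rounding.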

How digital concept mapping website contributes to assessment for learning practices
One of the crucial contributions of the CMfL website to assessment for learning is to automatically identify each student's misunderstandings and misconceptions. Students taking introductory statistics courses are more likely to have diverse backgrounds and prior knowledge. Consequently, students receiving the same total score on an exam may not have similar strengths and weaknesses (Im & Yin, 2009). Therefore, identifying each student's strengths and weaknesses is crucial. Typically, the task of diagnosing misunderstandings is a difficult one, requiring both specific skills (e.g., how to conduct an interview) and plenty of time for developing an appropriate assessment task and for evaluating it (Browning & Lehman, 1988). In contrast, teachers are required to provide only minimal input on the concept maps for learning website. Although they may add their own explanations and linked content, they are only required to choose a topic and submit a list of students to whom they wish to grant access to the website.
An additional contribution of the CMfL website is that it is capable of providing immediate feedback. Shute (2008) notes that although high-achieving students may benefit from delayed feedback, immediate feedback might be more useful for low-achieving students. The author also indicates that immediate feedback is required for difficult tasks which are associated with higher order knowledge. Krause, Stark, and Mandl (2009) also note that lecturers save time via automatic feedback, which provides immediate and individualized feedback in large classes such as introductory statistics classes.
Moreover, providing feedback on the weaknesses of students in different forms is another of the crucial contributions of the CMfL website to assessment for learning, for research has shown that students are most likely to ignore verbal and written feedback (Shute, 2008; Lee, 2009). The concept maps for learning website provides both visual feedback and linked feedback in the form of associated learning activities including videos, games, or cartoons. In particular, students are informed about the similarities and differences between their maps and the expert map through the visual feedback map. Krause, Stark, and Mandl (2009) find that students with little prior knowledge may ineffectively compare their own solutions with the worked example. Likewise, these students would not effectively compare their maps with the expert map if these maps were given side by side. Furthermore, the concept maps for learning website is most likely to promote equitable education (Gikandi, Marrow, & Davis, 2011), because each student's weaknesses (misconceptions and/or misunderstandings) and strengths are diagnosed through computer based assessment for learning software. In addition, students are able to study the received feedback through a variety of associated instructional multimedia materials. As a result, the CMfL website gives every student an opportunity to work on their misunderstandings for as long as they need.
Finally, our concept maps for learning website may help students improve their general problem solving skills. In a related study, Schacter, Herl, Chung, Dennis, and O'Neil (1999) found that computer based concept mapping tasks improve students' problem solving performance. In this study in particular, we demonstrate how students' ability to perform an ANOVA test might be boosted by increasing their conceptual knowledge of this formal procedure through the CMfL website.

Conclusion
According to Kleitman and Costa (2014), formative assessment tools consist of structured items such as multiple choice questions, mostly accompanied by structured instructions and extensive feedback. Since these tools supply timely and decisive diagnostic information, they should be used in introductory statistics classes. In general, these tools help students fill gaps in their knowledge while addressing their specific misconceptions (Broers, 2009; Im & Yin, 2009). Unlike formative assessment tools using multiple choice questions, such as Koralle, Stats-mIQ, and SALS, the CMfL website uses concept maps as its assessment component. Therefore, this website has the potential to offer further benefits.
One of the benefits of the CMfL website is that it can aid instructors by automating many tasks in formative assessment. It also generates immediate and delayed feedback in a variety of forms after automatically identifying each student's misunderstandings and misconceptions of the subject area. The CMfL website also has the potential to promote equitable education as well as to help students improve their general problem solving skills. This paper discusses whether these favorable contributions to assessment for learning are valid in statistics education.
For future studies, we are planning to create more concept maps related to different subjects.We will examine whether these promising contributions to assessment for learning are valid in different subjects.

Fig. 3. An example of two students' textual feedback maps