A meta-analysis examining the moderating effects of educational level and subject area on CSCL effectiveness

The positive effects of computer-supported collaborative learning (CSCL) on students’ learning outcomes and processes have been widely reported in individual empirical studies and meta-analyses. More specifically, the meta-analysis by Chen, Wang, Kirschner, and Tsai (2018) attributed these effects to the three main elements of CSCL: collaborative learning, computer use, and extra learning environments or tools or extra supporting strategies. This study extends that meta-analysis by examining the moderating effects of educational level and subject area on the effectiveness of CSCL. Educational level was not found to be a significant moderator of the effectiveness of collaborative learning, computer use, extra learning environments or tools, or extra supporting strategies with respect to student knowledge achievement. Subject area, on the other hand, was found to be a significant moderator for the effectiveness of extra learning environments or tools and extra supporting strategies. When using extra learning environments or tools for CSCL, larger effect sizes were found for engineering and science courses; when using extra supporting strategies for CSCL, larger effect sizes were found for science and social science courses. The results also showed that more studies were conducted at the university level and in engineering, science, and social science disciplines.


Introduction
Drawing on social constructivism and shared cognition (Salomon & Perkins, 1998; Stahl, 2006), collaborative learning (CL) emphasizes that knowledge is shared among and sometimes co-constructed by two or more group members, mostly through social interactions (Dillenbourg, 1999). During this process, learners can make use of what is known as collective working memory (Kirschner, Paas, Kirschner, & Janssen, 2011), in which group members draw on each other's working memory capacity to share the cognitive load imposed by a task, process the task-related information more deeply, and construct higher-quality schemas in their long-term memories than learners working individually. Computer-supported collaborative learning (CSCL) focuses on how information and communication technologies (ICTs) can be used to support collaborative learning by facilitating the learning processes and knowledge sharing or co-construction (Kreijns, Kirschner, & Jochems, 2003; Stahl, Koschmann, & Suthers, 2006).
Empirical studies on CSCL have examined learning outcomes that mainly include individual knowledge gains, individual skill acquisition (e.g., problem-solving skills, collaboration skills), individual perceptions (e.g., motivation, emotion), group task performance, and group learning processes such as social interaction and socially shared regulation of learning (Chen et al., 2018; Järvelä et al., 2016). The effects of CSCL on these measures have been synthesized in several meta-analyses, such as those by Borokhovski, Bernard, Tamim, Schmid, and Sokolovskaya (2016) and Jeong, Hmelo-Silver, Jo, and Shin (2016). In general, these meta-analyses have reported positive effects of CSCL on students' learning outcomes and processes.
A recent comprehensive meta-analysis by Chen et al. (2018) synthesized the effects of CSCL based on its three main elements: (1) collaboration per se, (2) use of computers, and (3) use of extra learning environments or tools (e.g., videoconferencing, digital games), or supporting strategies (e.g., peer feedback) in CSCL, and reported that all three elements produced small to medium effect sizes (ES) on learning outcomes and processes. For example, collaboration in computer-based learning settings produced significant positive effects on learners' knowledge achievement (ES = 0.44), skill acquisition (ES = 0.64), and perceptions (ES = 0.38). The moderator analyses in that meta-analysis examined the relationships between several study features (e.g., sample size, research design, intervention duration) and learning outcomes. Homogeneity statistics computed for knowledge gain showed significant variance in effect sizes across studies, suggesting that further grouping of studies was needed to explore potential moderators. However, educational level and subject area were not tested as potential moderators. The possibility that educational level and subject area may moderate the effectiveness of CSCL is supported by previous research such as Jeong et al. (2016) and Vogel, Wecker, Kollar, and Fischer (2017). This study aimed to extend the research of Chen et al. (2018) by investigating the moderating effects of educational level and subject area on CSCL. More specifically, the moderating effects were examined in terms of the three main elements of CSCL, namely (1) collaboration per se, (2) use of computers, and (3) use of extra learning environments or tools, or supporting strategies in CSCL.
2. To what extent do the effects of computer use in CL settings vary by educational level or subject area?
3. To what extent do the effects of the use of extra technology-mediated learning environments or tools, or supporting strategies in CSCL vary by educational level or subject area?

Method
Since this study extends the prior research by Chen et al. (2018), its methods are the same as those described in that article, including the literature search process, the inclusion/exclusion criteria used to filter the initially retrieved literature, the coding framework, and the statistical methods.

Literature search
Empirical studies on CSCL were searched for in the Web of Science online database and in Google Scholar. The search terms were collaborative learning, cooperative learning, group learning, team learning, or CSCL; in addition, one of the terms computer, online, Web, Internet, network, technology, mobile, virtual environment, simulation, or game had to appear in the research topic. The timespan was set to 2000 to 2016, the document type to Article, and the document language to English. The search yielded a total of 3,500 articles, which were then filtered on the basis of a number of inclusion/exclusion criteria.

Inclusion/exclusion criteria
To be included, an article had to present an empirical study with a controlled quasi-experimental or experimental design; ensure the equivalence of the experimental and control groups; teach the learning content in the same way (i.e., with an equivalent teaching method) in both the experimental and control conditions; report students' academic learning outcomes (e.g., knowledge achievement, skills) or group task performance; and provide enough data for the calculation of effect size. Articles focused on special education or gifted education were excluded.

Coding framework
The substantive study features extracted from each study include educational level, subject area, number of participating learners in both experimental and control conditions, measures or instruments such as students' knowledge achievement, and the treatment or intervention. As stated, the measures of students' learning outcomes included individual knowledge gain, skill acquisition, perceptions (e.g., attitudes), group task performance, and group process (see Table 1 for detailed descriptions of these learning outcomes).

Skill acquisition
Thinking skills (e.g., higher-order thinking skills, critical thinking skills), problem-solving skills (e.g., programming), group learning skills, measured by objective tests.

Perception
Measured by survey or questionnaire. 1. Evaluation of the overall course, learning system or environment (e.g., usefulness, ease of use, satisfaction, intention to use learning system or environment), 2. Perception or evaluation of specific learning approach or technique (e.g., perceptions of the collaborative learning approach, concept-mapping technology, intention to use), 3. Overall learning experience (e.g., enjoyment, engagement), 4. Attitude towards a specific discipline (e.g., attitude towards science, motivation to learn science, interest), 5. Perceived capability (e.g., competency, academic selfefficacy or self-concept), 6. Perceived performance in specific skills (e.g., problem solving, use of technologies, confidence in clinical management, social efficacy), 7. Perceived individual learning gains (e.g., perceived learning), 8. Perceived group learning outcome, 9. Perceived group process (e.g., social presence, cooperativeness).

Group task performance
Measured by group report, essay, assignment, problem solutions, other group artifacts (e.g., story, concept map), or the accuracy of completed sub-tasks, assessed at the group level. (Note that when the control condition was computer-supported individual learning, group task performance and social interaction were not included in the analysis.)

Social interaction
Task-related (e.g., argumentation, knowledge construction, meta-cognitive activities), social activities (e.g., greeting), off-task (e.g., technical, nonsense). Measured by quantitative process analysis or content analysis of discourse. (Note that if only the total number of discussion posts was reported without detailed categorization of discussion, effect size was not calculated for such interaction results.)

Educational level was categorized as pre-school, primary, secondary, university, and adult (i.e., personal or group development in the workplace, such as software programmers in a company or primary care professionals). Subject area was categorized as (1) Art, (2) Business and Management, (3) Engineering (e.g., computer technology, mechanical engineering), (4) Language (e.g., reading and writing courses of English, Spanish), (5) Medicine, (6) Science (e.g., mathematics, physics, chemistry, geology, geography, biology, earth science, nature science), or (7) Social science (e.g., psychology, educational courses).
Following Chen et al. (2018), the selected studies were grouped into four categories based on their interventions, namely studies: (1) contrasting computer-supported collaborative learning with computer-supported individual learning (i.e., examining the effects of collaboration), (2) contrasting computer-supported collaborative learning with traditional collaborative learning (i.e., examining the effects of computer use), (3) contrasting CSCL supported by extra learning environments or tools, or strategies with CSCL (i.e., examining the effects of the use of extra learning environments or tools, or supporting strategies under the condition of CSCL), and (4) comparing different learning environments or tools, or supporting strategies. As studies in the fourth category vary considerably in the interventions or treatments employed, they were not analyzed for mean effect size or homogeneity statistics. Furthermore, the learning environments or tools include seven major sub-categories: basic online discussion tools, enhanced online discussion tools, visual representation tools, group awareness tools, graphs or multimedia for instruction, adaptive or intelligent systems or environments, and virtual environments; the main supporting strategies include: teacher's facilitation, peer assessment or peer feedback, role assignment, and instruction and guidance (see Table 2 for detailed descriptions of the sub-categories of learning environments/tools or supporting strategies) (Chen et al., 2018).

Help sustain group discourse and promote students' social interaction by providing guidance such as scripts.
Dynamic collaboration script, discussion script, social script, epistemic script, advice, instruction on effective communication

Statistical methods
The statistical analyses followed the methods of practical meta-analysis (Lipsey & Wilson, 2001). Effect size is commonly used to represent the effectiveness of an intervention, and its indices include Cohen's d and Hedges's g. First, Cohen's d was calculated for each study. Because Cohen's d is biased upward in small samples, it was then converted to Hedges's g. The effect sizes of all selected studies were then synthesized to produce the weighted mean effect size for each outcome using the random-effects model. In addition, the significance of each weighted mean effect size was checked against its 95% confidence interval.
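The steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the variance formula for d, the DerSimonian-Laird estimator of between-study variance, and the function names `hedges_g` and `random_effects_mean` are assumptions made for illustration, since the text names only the random-effects model and the 95% confidence interval. The study values at the bottom are hypothetical.

```python
import math

def hedges_g(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Return (g, var_g) for one study comparing experimental and control groups."""
    df = n_e + n_c - 2
    # Pooled standard deviation of the two groups.
    sd_pooled = math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_e - mean_c) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction factor
    var_d = (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))
    return j * d, j ** 2 * var_d

def random_effects_mean(effects, variances):
    """Pool effect sizes; return (mean, (ci_low, ci_high)).

    Assumes the DerSimonian-Laird estimate of between-study variance (tau^2).
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero (also zero for a single study).
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    mean = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# Hypothetical study data, not taken from the meta-analysis.
g, var_g = hedges_g(mean_e=10, mean_c=8, sd_e=4, sd_c=4, n_e=20, n_c=20)
pooled, ci = random_effects_mean([g, 0.30, 0.65], [var_g, 0.05, 0.08])
significant = ci[0] > 0 or ci[1] < 0   # the interval excludes zero
```

Under these assumptions, a pooled effect counts as significant when its 95% confidence interval excludes zero, which mirrors the check described in the text.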
In the current meta-analysis, educational level and subject area were examined as moderators through between-group homogeneity (QB) and within-group homogeneity (QW), as the effects across different educational levels and subject areas are important for educators wishing to implement CSCL. QB tests the homogeneity of effect sizes across groups, and its statistical significance indicates that the potential moderator has a significant impact on the variance across groups. Similarly, QW tests the homogeneity of effect sizes within each group, and it is only accurate when there are more than 10 studies in each group. In this study, moderator analysis was only performed for knowledge achievement due to the small number of studies reporting other learning outcomes.
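One common way to obtain the two statistics is the analog-to-ANOVA partition of the total homogeneity statistic, sketched below in Python. This is an illustration under assumptions, not the authors' procedure: it uses fixed-effect inverse-variance weights for simplicity, the function names `q_statistic` and `moderator_analysis` are hypothetical, and the example data are invented.

```python
def q_statistic(effects, variances):
    """Weighted sum of squared deviations from the inverse-variance weighted mean."""
    w = [1 / v for v in variances]
    mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    return sum(wi * (g - mean) ** 2 for wi, g in zip(w, effects))

def moderator_analysis(groups):
    """Partition total Q into between-group and within-group parts.

    groups maps a moderator level (e.g., "primary") to a list of
    (effect_size, variance) pairs, one pair per study.
    Returns (q_between, df_between, q_within_by_group).
    """
    all_effects = [e for studies in groups.values() for e, _ in studies]
    all_variances = [v for studies in groups.values() for _, v in studies]
    q_total = q_statistic(all_effects, all_variances)
    q_within = {
        label: q_statistic([e for e, _ in studies], [v for _, v in studies])
        for label, studies in groups.items()
    }
    q_between = q_total - sum(q_within.values())
    return q_between, len(groups) - 1, q_within

# Hypothetical effect sizes and variances for two educational levels.
example = {
    "primary": [(0.55, 0.04), (0.48, 0.05), (0.60, 0.06)],
    "university": [(0.35, 0.03), (0.50, 0.04), (0.41, 0.05)],
}
q_b, df_b, q_w = moderator_analysis(example)
```

QB is then compared with a chi-square distribution with (number of groups - 1) degrees of freedom, and each QW with (number of studies in the group - 1) degrees of freedom.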

Results
A total of 425 studies were selected based on the inclusion/exclusion criteria, the same as those analyzed by Chen et al. (2018). Among them, 84 examined the effects of collaborative learning (corresponding to Research Question 1); 71 examined the effects of computer use (corresponding to Research Question 2); 193 fell into category 3 (corresponding to Research Question 3), with 142 examining the tools or strategies listed in Table 2; and 77 compared two or more different tools or strategies.

Moderating effects of educational level and subject area on the effectiveness of collaborative learning (RQ1)
Research Question 1 investigates the moderating effects of educational level and subject area on the effectiveness of collaborative learning. Due to the relatively small number of studies reporting skills at each educational level (i.e., 2 at primary level, 2 at secondary level, 12 at university level, and 1 at adult level) and/or perceptions (i.e., 4 at primary level, 1 at secondary level, and 21 at university level), moderator analysis was only performed for knowledge achievement (as it is only accurate when there are more than 10 studies in each group). Table 3 presents the results of the moderator analysis of educational level (including the within-group homogeneity statistic QW and the between-group homogeneity statistic QB), as well as the total number of participants at each educational level, the total number of studies included, the mean effect size, and the 95% confidence interval. The homogeneity analysis shows no significant variability between the different educational levels (QB = 1.04, df = 3). The effect sizes are 0.52, 0.37, 0.43, and 0.75 for the primary, secondary, university, and adult educational levels, respectively. In addition, although moderator analysis was not conducted for skills and perceptions, studies reporting these two outcomes were mostly conducted at the university level. At this level, the mean effect size was 0.37 (95% CI[0.02, 0.72], k = 12) for skills and 0.36 (95% CI[0.18, 0.54], k = 21) for learners' perceptions.
Regarding the moderating effects of subject area, moderator analysis was only performed for knowledge achievement due to the relatively small number of studies reporting skills for different subject areas (i.e., 7 on engineering, 2 on language, 1 on medicine, 4 on science, and 3 on social science) and/or perceptions (i.e., 1 on business, 11 on engineering, 3 on language, 2 on medicine, 5 on science, and 4 on social science). Table 4 presents the results of the moderator analysis of subject area (including the within-group homogeneity statistic QW and the between-group homogeneity statistic QB), as well as the total number of participants for each subject, the number of studies, the mean effect size, and the 95% confidence interval. The homogeneity analysis shows no significant variability between the different subject areas (QB = 2.84, df = 5, p > .05). The effect sizes are 0.75 for business, 0.38 for engineering, 0.48 for language, 0.40 for medicine, 0.42 for science, and 0.67 for social science. Among the 84 studies, most were for engineering (k = 32) and science education (k = 28). In addition, studies on engineering education had a mean effect size
Note. P = number of participants. k = number of independent studies analyzed for knowledge achievement. Weighted mean effect sizes marked N are nonsignificant at the 95% confidence interval. CI = confidence interval. QW = within-group homogeneity statistics. QB = between-group homogeneity statistics. Effect sizes for several cells are not reported because no selected studies reported such data for that outcome measure. *p < .05.
As shown in Table 3, the effect sizes varied across studies within the primary and university educational levels; it is thus useful to know the distribution of effect sizes of different subject areas at each educational level. At the primary school level, the extracted studies were conducted in language (k = 6) or science/mathematics (k = 6) courses, and produced statistically significant effects on knowledge achievement (ES = 0.46 and 0.57, respectively). At the secondary school level, almost all the selected studies were conducted in science or math courses (k = 14), such as physics and chemistry, and produced an effect size of 0.39 for knowledge achievement, suggesting effectiveness in secondary-level science learning. At the university level, CSCL was more often examined in engineering courses and was quite effective (with an effect size of 0.38, k = 27), especially in computing-related courses (about 20 studies).

Moderating effects of educational level and subject area on the effectiveness of computer use (RQ2)
Research Question 2 explores the moderating effects of educational level and subject area on the effectiveness of computer use. As with Research Question 1, moderator analysis was only performed for knowledge achievement because relatively few studies reported skills (i.e., 1 at pre-school level, 3 at primary level, 1 at secondary level, and 4 at university level), perceptions (i.e., 5 at primary level, 2 at secondary level, 15 at university level, and 1 at adult level), group task performance (i.e., 3 at primary level, 10 at university level, and 1 at adult level), or social interaction (i.e., 2 at primary level, and 10 at university level) at each educational level. Table 5 presents the results of the moderator analysis of educational level (including the within-group homogeneity statistic QW and the between-group homogeneity statistic QB), as well as the total number of participants at each educational level, the total number of studies, the mean effect size, and the 95% confidence interval. The between-group homogeneity statistic QB, with a value of 4.07 (df = 4), shows no significant variance between the different educational levels. However, there is significant variance in effect sizes within the university level. The effect sizes are 0.58, 0.60, 0.42, and 0.33 for the pre-school, primary, secondary, and university levels, respectively. Most of the included studies were conducted at the primary school (k = 22) and university (k = 37) levels. In addition, although moderator analysis was not conducted for skills, perceptions, group task performance, and social interaction, studies reporting these outcomes were mostly conducted at the university level. At this level, the mean effect size was 0.56 (k = 4) for skills, 0.45 (k = 15) for perceptions, 0.90 (k = 10) for group task performance, and 0.61 (k = 3) for social interaction.
Because relatively few studies reported skills for different subject areas (e.g., 2 on social science), perceptions (e.g., 1 on business), group task performance (e.g., 1 on art, 4 on science), and social interaction (e.g., 1 on engineering, no study on social science), moderator analysis was only performed for knowledge achievement. Table 6 presents the results of the moderator analysis of subject area, as well as the total number of participants for each subject, the total number of studies included, the mean effect size, and the 95% confidence interval. The homogeneity analysis shows no significant variability between the different subject areas (QB = 6.92, df = 7, p > .05). The effect sizes are 0.49 for art, 0.41 for business, 0.13 for engineering, 0.62 for language, 0.33 for medicine, 0.53 for science, and 0.29 for social science. Among the 65 studies, most were for language (k = 14), science (k = 26), and social science education (k = 11).

Moderating effects of educational level and subject area on the effectiveness of using extra learning environments or tools, and supporting strategies (RQ3)
Learning environments or tools. In total, 62 studies examined the moderating effects of educational level and subject area on the effectiveness of using extra learning environments or tools on knowledge achievement.
As with Research Questions 1 and 2, moderator analysis was only performed for knowledge achievement because relatively few studies reported skills, perceptions, group task performance, and social interaction. Table 7 presents the results of the moderator analysis of educational level on the effectiveness of extra learning environments or tools. The homogeneity analysis shows no significant variability between the different educational levels (QB = 6.29, df = 2) or within each educational level. The effect sizes are 0.56, 0.69, and 0.51 for the primary, secondary, and university levels, respectively. Most of the included studies were conducted at the university level (k = 43).
Table 8 presents the results of the moderator analysis of subject area on the effectiveness of extra learning environments or tools. The homogeneity analysis suggests significant variability between the different subjects (QB = 13.51, df = 5). The effect sizes are 1.26 for business, 0.52 for engineering, 0.61 for language, 0.29 for medicine, 0.59 for science, and 0.33 for social science. Taking into account the number of selected studies for each subject (e.g., k > 10), larger effect sizes were found for engineering and science courses. Moreover, most studies were for engineering (k = 20), science (k = 19), and social science education (k = 15).
In total, 10 studies reported student skill acquisition and produced a mean effect size of 0.79; 7 of these studies were conducted at the university level and produced a mean effect size of 0.

Supporting strategy.
A total of 42 studies examined the moderating effects of educational level and subject area on the effectiveness of using extra supporting strategies on knowledge achievement. Table 9 presents the results of the moderator analysis of educational level on the effectiveness of extra supporting strategies. The between-group homogeneity analysis shows no significant variability between the different educational levels (QB = 2.18, df = 2). Yet, there was significant variability within the university level (QW = 48.89). The effect sizes are 0.34, 0.37, and 0.39 for the primary, secondary, and university levels, respectively. Most of the included studies were conducted at the university level (k = 35). Table 10 presents the results of the moderator analysis of subject area on the effectiveness of extra supporting strategies. The homogeneity analysis shows significant variability between the different subjects (QB = 10.57, df = 4, p < .05). The effect sizes are 0.16 for business, 0.30 for engineering, 0.69 for medicine, 0.45 for science, and 0.40 for social science. Taking into account the number of selected studies for each subject (e.g., k > 10), larger effect sizes were found for science and social science courses. Moreover, most studies were for engineering (k = 14), science (k = 12), and social science education (k = 13).
Within the university level, most studies were for engineering (k = 14) and produced a mean effect size of 0.30 (95% CI[0.12, 0.48]); 13 were on social science, yielding a mean effect size of 0.40 (95% CI[0.08, 0.71]); 5 on science, yielding a mean effect size of 0.59 (95% CI[0.11, 1.07]); 2 on medicine, yielding a mean effect size of 0.69 (95% CI[0.11, 1.26]); and 1 on business, producing a mean effect size of 0.16. Taking into account the number of selected studies for each subject (e.g., k > 10), larger effect sizes were found for social science courses.
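The significance statements attached to the subject-area QB values can be checked by comparing each statistic with the chi-square critical value at the .05 level for its degrees of freedom. The Python sketch below does this for three of the reported values. The lookup table holds standard critical values, while the helper name `is_significant` and the dictionary labels are hypothetical and used only for illustration.

```python
# Upper-tail chi-square critical values at alpha = .05, indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592, 7: 14.067}

def is_significant(q_between, df):
    """True when QB exceeds the .05 critical value for the given df."""
    return q_between > CHI2_CRIT_05[df]

# (QB, df) pairs as reported in the text for the subject-area moderator analyses.
reported = {
    "collaboration (RQ1)": (2.84, 5),
    "extra learning environments or tools (RQ3)": (13.51, 5),
    "extra supporting strategies (RQ3)": (10.57, 4),
}
for label, (qb, df) in reported.items():
    verdict = "significant" if is_significant(qb, df) else "not significant"
    print(f"{label}: QB = {qb}, df = {df}, {verdict} at .05")
```

The outcome agrees with the text: subject area is a significant moderator for extra learning environments or tools and for extra supporting strategies, but not for collaboration.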
Among the 15 studies reporting student skill acquisition, 9 were conducted at the university level and produced an effect size of 0.76 (95% CI[0.12, 1.40]); 3 at the secondary level produced an effect size of 0.24 (95% CI[-0.01, 0.51]); 2 were conducted at the primary level and 1 at the adult level. Among the 26 studies reporting student perceptions, 22 were conducted at the university level and produced an effect size of 0.19 (95% CI[0.01, 0.37]). Among the 21 studies reporting group task performance, 14 were conducted at the university level, yielding an effect size of 0.57 (95% CI[0.26, 0.88]), and 6 at the secondary level, yielding an effect size of 0.04 (95% CI[-0.18, 0.27]). Among the 28 studies reporting social interaction, 21 were conducted at the university level, yielding an effect size of 0.64 (95% CI[0.45, 0.83]); 5 at the secondary level, yielding an effect size of 0.38 (95% CI[-0.01, 0.77]); and 2 at the secondary level, yielding an effect size of 0.49 (95% CI[-0.41, 1.39]).
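Because significance in this study is judged from the 95% confidence interval, the intervals reported above can be read mechanically: an effect is significant when its interval excludes zero. The Python sketch below applies that rule to the skill-acquisition results and also recovers an approximate standard error from an interval. It assumes symmetric normal-approximation intervals (z = 1.96), which the text does not state explicitly, and the helper names are hypothetical.

```python
def ci_excludes_zero(lower, upper):
    """True when the confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0

def implied_se(lower, upper, z=1.96):
    """Approximate standard error, assuming a symmetric interval mean +/- z * SE."""
    return (upper - lower) / (2 * z)

# 95% CIs reported in the text for skill acquisition under supporting strategies.
skill_cis = {
    "university": (0.12, 1.40),
    "secondary": (-0.01, 0.51),
}
for level, (lo, hi) in skill_cis.items():
    print(level, "significant:", ci_excludes_zero(lo, hi),
          "implied SE:", round(implied_se(lo, hi), 3))
```

Read this way, the university-level effect on skill acquisition is significant, whereas the secondary-level interval includes zero.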

Moderating effects of educational level and subject area on the effectiveness of collaborative learning (Research Question 1)
Homogeneity statistics reveal nonsignificant variability in the effect sizes of the different educational levels with regard to individual knowledge achievement. This finding confirms that of Lou, Abrami, and d'Apollonia (2001), who found that achievement outcomes across primary, secondary, and postsecondary levels were relatively equal, but is not consistent with Jeong et al. (2016), who concluded that K-12 learners benefited most from CSCL. This inconsistency might relate to one limitation of Jeong et al.'s study, which combined the effect sizes for different outcomes, whereas in the current study the outcomes were very specific. Here, the mean effect size was medium (g = 0.52) for the primary level, and small to medium for the university (g = 0.43) and secondary school levels (g = 0.37). The effect size for university was similar to the results of Springer, Stanne, and Donovan (1999), who compared the traditional small-group learning of undergraduate students with individual learning and reported an effect size of 0.51 for achievement. At first glance, primary school students seem to benefit more (see the magnitude of the effect size); however, these effects must be interpreted with caution because the studies were distributed unevenly across educational levels.
Also, there was no significant variance between different subject areas. Because some subjects, such as business, had relatively few studies (<10), statistical power may have been low. Therefore, the findings should be interpreted cautiously.
Moreover, it is useful to know the distribution of effect sizes of different subject areas at each educational level. At the primary school level, the extracted studies were conducted in language or science/mathematics courses. The effects of collaborative learning on primary school students' language and mathematics learning have been illustrated in the studies of Slavin, Lake, Chambers, Cheung, and Davis (2009) (ES = 0.21 for reading) and Slavin and Lake (2008) (ES = 0.29 for mathematics). At the university level, CSCL was more often examined in computing-related courses. In particular, pair programming with groups of two students is widely used in these courses (e.g., Cavus, Uzunboylu, & Ibrahim, 2007), whereas larger groups are often used at other levels. Possibly, these courses are more appropriate for students to perform CL, or maybe these courses are more convenient (e.g., because of the availability of connected computers in such subject areas). In any event, the motive is seldom given.

Moderating effects of educational level and subject area on the effectiveness of computer use (Research Question 2)
Regarding the effectiveness of computer use in collaborative learning, there is no significant difference in effect sizes across educational levels or different subject areas.
Considering the relatively small number of available sample studies on some subjects such as medicine (<10), the statistical power might be lowered. Therefore, more research is needed, and the current results for subjects with a small number of sample studies must be interpreted with extreme caution.
Regarding the interaction between educational level and subject area, at the primary school level, the use of computers in CL was quite effective in language and science learning (mainly mathematics), with significant medium effect sizes. In particular, digital reading systems (e.g., reading annotation systems) and writing systems (e.g., Google Drive or Google Sites) have shown substantial advantages in terms of enhancing elementary students' reading comprehension and writing performance by facilitating their knowledge organization, formative feedback, and monitoring of progress (Chen & Chen, 2014; Genlott & Grönlund, 2016; Zurita & Nussbaum, 2004). Also, computer games appear to promote the learning of mathematics concepts and problem-solving performance (Hwang & Hu, 2013). At the middle school level, web-based learning systems were employed to foster science literacy (e.g., Frailich, Kesner, & Hofstein, 2009). At the university level, computer use was also quite effective in language and science learning. However, the effect sizes for knowledge gain failed to reach statistical significance for engineering, medicine, and social sciences. These results should be interpreted with caution due to the relatively small number of available sample studies on these subjects, indicating that more empirical studies are needed. Alternatively, the lack of significance may be due to the fact that in such disciplines, technology support mainly focuses on promoting skill acquisition or the completion of group tasks (Kwok, Ma, & Vogel, 2002), rather than on knowledge acquisition. With respect to university level engineering subjects in particular, there are few studies (only 6) examining the effects of computer use, which is quite different from the large number exploring the effects of collaborative learning.
A plausible explanation is that computer use is quite common in engineering education, so there seems no need to examine whether or not to use computers. On the other hand, relatively more studies were conducted in university level science and social science subjects. It is possible that instructors in these subjects are trying to integrate computer use in their courses so as to improve learning outcomes. Furthermore, computer use varies across subjects. For example, online learning forums are more often used in social science courses (e.g., Roseth, Saltarelli, & Glass, 2011) while more dedicated subject area applications such as virtual laboratories or simulations are generally used in engineering or technology courses (e.g., Corter, Esche, Chassapis, Ma, & Nickerson, 2011).

Moderating effects of educational level and subject area on the effectiveness of using extra learning environments or tools, and supporting strategies (Research Question 3)
Regarding the moderating effects of educational level on the effectiveness of extra learning environments or tools, the findings suggested no significant variance between the different educational levels or within each educational level. However, one interesting finding is that most of the included studies were conducted at the university level.
Regarding the moderating effects of subject area on the effectiveness of extra learning environments or tools, the findings indicated significant variability between the different subjects; larger effect sizes were found for engineering and science courses when taking the statistical power into account (i.e., the number of selected studies for each subject); and most studies were for engineering, science, and social science education. For example, selected studies on virtual environments such as digital games were mainly conducted in science courses and produced quite positive effects (e.g., Chiang, Yang, & Hwang, 2014).
With respect to the effects on other learning outcomes including skill acquisition, group task performance, social interaction, and perceptions, a majority of the selected studies were also conducted at the university level and produced positive effects on these outcomes.
The explanation might be that learning environments or tools such as basic online discussion tools, enhanced online discussion tools, and group awareness tools are more easily adopted by university students than by primary and secondary school students. Other tools such as visual representation tools and virtual environments can be manipulated by both secondary and university level students (Wang, Cheng, Chen, Mercer, & Kirschner, 2017; Wu & Wang, 2012; Yuan, Wang, Kushniruk, & Peng, 2016).
Regarding the moderating effects of educational level on the effectiveness of extra supporting strategies, the results showed no significant variability between the different educational levels, although most selected studies were conducted at the university level. This suggests that extra supporting strategies such as role assignment were adopted mainly for university students. Regarding the moderating effects of subject area on the effectiveness of extra supporting strategies, there was significant variability between the different subject areas; larger effect sizes were found for science and social science courses when taking into account the statistical power (i.e., the number of selected studies for each subject). At the university level, most studies were for engineering and social science, and larger effect sizes were found for social science courses. Similarly, studies reporting other learning outcomes such as skill acquisition were also mostly conducted at the university level. Extra supporting strategies such as peer assessment or peer feedback and role assignment were thus mostly applied among university students.

Conclusion
The positive effects of computer-supported collaborative learning on students' learning outcomes and processes have been widely reported in individual empirical studies and meta-analyses. More specifically, the effects found were mostly attributed to the three main elements of CSCL: collaborative learning, computer use, and extra learning environments/tools or extra supporting strategies. This study extends the prior meta-analysis by examining the moderating effects of educational level and subject area on the effectiveness of CSCL. The moderating effects of educational level were found not to be significant on the effectiveness of collaborative learning, computer use, extra learning environments or tools, or extra supporting strategies with respect to student knowledge achievement; and subject area was found to be a significant moderator for the effectiveness of extra learning environments or tools, and extra supporting strategies.
While CSCL has been applied at all educational levels (from pre-school to adult), more studies have been conducted at the university level. Regarding the distribution of the selected studies among different subject areas, the studies were mostly in engineering, science, and social science disciplines, with few studies in the art, business, and medicine subject areas.
More specifically, at the primary school level, studies investigating the effects of collaboration and computer use were mostly in language and science (mathematics in particular) subjects and the number of studies was about equally distributed in the two subject areas, showing medium effect sizes. At the secondary school level, the effects of both collaboration and computer use were primarily explored in science courses, with positive yet small effect sizes.
While the application of specific learning environments or tools and specific learning strategies has received increased attention in CSCL research and practice, we need to select appropriate learning tools or strategies based on the nature of the subject being learned. For example, virtual environments such as digital games or virtual reality are quite suitable for science learning as they can provide situated learning scenarios. In general, we need to consider applying specific learning environments or tools for CSCL of engineering subjects, adopting specific learning strategies for CSCL of social science subjects, and incorporating specific learning strategies as well as specific learning environments or tools for CSCL of science subjects.