An effective method of collecting practical knowledge by presentation of videos and related words

The concentration of practical knowledge and experiential knowledge in the form of collective intelligence (the wisdom of the crowd) is of interest in the area of skill transfer. Previous studies have confirmed that collective intelligence can be formed through the utilization of video annotation systems where knowledge that is recalled while watching videos of work tasks can be assigned in the form of a comment. The knowledge that can be collected is limited, however, to the content that can be depicted in videos, meaning that it is necessary to prepare many videos when collecting knowledge. This paper proposes a method for expanding the scope of recall from the same video through the automatic generation and simultaneous display of related words and video scenes. Further, the validity of the proposed method is empirically illustrated through the example of a field experiment related to mountaineering skills.

The SECI model (Nonaka & Takeuchi, 1995) is a well-known representation of the process whereby the tacit knowledge of organization members is codified (made explicit) and further knowledge is created through a continuous cycle of knowledge discovery and sharing. The model distinguishes two types of knowledge (tacit and explicit); new knowledge is created as tacit knowledge is repeatedly transformed into explicit knowledge and explicit knowledge is subsequently transformed into new tacit knowledge. Furthermore, the proactive use of video content has been shown to be effective for advanced skills, practical knowledge, and the skills required for social activities, none of which can be expressed fully in words alone.
As a method of realizing the transformations from tacit to explicit knowledge as described by the SECI model, the authors are developing a video scene linked bulletin board system (BBS) (Shimada, Tsutsuguchi, Kojima, Konishi, & Higashino, 2012). This system combines a video sharing system with a communication system enabled by a BBS, which enables the posting of the knowledge that comes to mind when watching the video and the exchange of opinions on the BBS. By providing a video-based space in which a community can interact over a network, the system enables users to express the tacit knowledge that they possess. The benefits of using video include both the fact that watching scenes of tasks in action induces a simulated experience, which makes it easier to express know-how, and the fact that it is an easy way to share the background of the issue in question. Prior studies have investigated ways to more effectively collect the practical and experiential knowledge that is unevenly distributed among organizational members through annotating videos with related comments (Majima, Shimada, & Maekawa, 2011).
On the other hand, the content of the communication carried out on a video linked bulletin board is strongly constrained by the video scene itself. The types of knowledge collected in prior real-use experiments consist of content that is directly depicted by the video: for example, the steps involved in a task, or the relative locations of people, or of people and things. Only aspects that are specifically related to the video content tend to be collected from one video scene, so it is necessary to prepare specific videos for each kind of knowledge that is to be collected. Providing a large number of videos entails a great deal of production time and cost, and places a large burden on viewers in terms of the number of videos that they are required to watch.
In this context, this paper proposes to synchronize and display information that expresses video scene variation, induce a simulated experience other than that depicted in the video scene, and enable the collection of a wide variety of knowledge from a single video scene. In TV programs, videos, and movies, captions are superimposed on the video. Moreover, captions are often used in such learning videos as those for learning languages and sports. When viewing the captioned video, the understanding of the content is deepened by viewing the video and the caption at the same time (Morton, 2015). Therefore, in order to expand the scope of ideas recalled when watching video, a related word is displayed on the video. As a method to automatically generate information that expresses video scene variation, it is proposed that concepts involving the target of knowledge collection be modeled, and the words that comprise the model be assigned to the video scene as related words.
The following discussion first provides an overview of the proposed method and then presents the results of a field experiment that was carried out to empirically investigate the proposed technique.

Related works
Recently, video sharing sites have become widespread, and many of these sites are linked with a communication function that allows the exchange of opinions on a given video. Streaming video services, such as YouTube, rely on a traditional threaded and text-based commenting system for the whole video. It is possible to increase activity on the site and add value to a video by using a communication function. There are many studies analyzing the comments posted to videos to further strengthen these effects. These studies have examined such issues as opinion classification (Madden, Ruthven, & McMenemy, 2013), clustering of videos (Siersdorfer, Chelaru, Nejdl, & Pedro, 2010), and sentiment analysis (Asghar, Ahmad, Marwat, & Kundi, 2015).
These video sharing sites do not provide a mechanism for video scene annotation. On the other hand, other video sharing sites allow comments to be posted at arbitrary points of the video. Since annotation can be performed on the video scene on such sites, it is possible to post opinions directly related to the content of the video. Therefore, these are suitable for collecting the opinions of viewers of a given scene. Analysis of comments posted to these scenes includes the detection of highlight scenes (Xian, Li, Zhang, & Liao, 2015), impression analysis (Yamamoto & Nakamura, 2013), and topic analysis (Wu, Zhong, Tan, Horner, & Yang, 2014).
On a common video sharing site, the text-based communication function linked with the video is implemented through electronic bulletin boards, chats, blogs, and so on. Therefore, videos and their comments are displayed separately. On the other hand, the Japanese website Niconico (formerly Nico Nico Douga) displays comments superimposed on the video to increase sympathy and excitement. Much of the comment analysis in this system consists of emotional analysis, such as impression estimation and positive/negative opinion extraction (Nakamura & Tanaka, 2009; Ikeda, Kobayashi, Sakaji, & Masuyama, 2015). Since user comments are superimposed on the video, this system is not suitable for collecting and sharing practical knowledge expressed through experiences and the like.
The purpose of the studies described above is to improve the retrieval and recommendation of videos and to promote the activities of the site. This paper deals with the transfer of practical and experiential knowledge within organizations. The video sharing systems used in those studies can be applied to collecting and sharing practical knowledge. Our research shows that knowledge collected by video-based communication is specifically related to the video content. Therefore, in order to collect knowledge, it is necessary to prepare many videos on the various topics. There are no studies on collecting a wide variety of knowledge from a single video scene by allowing the scope of ideas recalled when watching the video to be expanded.

Video scene linked bulletin board system
A video scene linked bulletin board system (BBS) is a web application that links an electronic BBS with video scenes in order to enable a video to be cut into scenes, individual scenes to be viewed, the free posting of comments at any time in the video, and the posting of replies in response to such comments.
The posting of videos on this system is carried out in the following way. First of all, a video file is uploaded. After uploading is complete, video analysis is performed on the server-side. Cuts, camerawork sections, voice and sound sections, and other events where there is a significant change in the video content are automatically detected and displayed on the timeline. Next, individual scene sections where the task process or topic changes are manually defined while referring to this event information. Titles, tags, and other metadata are assigned to the scene sections as necessary. Finally, viewing rights and other access rights are set.
After videos have been posted, they are displayed in a list in a fashion similar to that of video sharing websites on the top page of a site that is accessible to general users. Selecting a video from the list brings up the screen shown in Fig. 1. The left side of the screen in Fig. 1 is a video playback screen similar to that which can be found on a normal video sharing website. The user clicks on the comment icon when they wish to add an opinion. Clicking the icon pauses the video playback and displays a text input field where a comment can be entered. Clicking the send comment button after entering a comment displays the comment on the BBS on the right-hand side of the screen. The BBS reply function can be used when replying to or discussing the content of a comment.

Fig. 1. Video scene linked bulletin board system: video viewing screen
Comment data are managed by associating the time on the video when the comment button was clicked; playing the video then displays the comments that relate to the scene that is being played. This means that the BBS on the right of Fig. 1 automatically scrolls. In contrast, by turning the automatic scroll off and manually scrolling, it is possible to focus on the comments only. Further, clicking on a comment starts video playback at the point in the timeline where the comment was posted, thus enabling a deeper understanding of the content of the comment.
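The time-based association between comments and scenes can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the system: the `Comment` class, the `scene_index` and `comments_for_scene` helpers, the scene boundaries, and the sample comments are all hypothetical and were introduced only for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Comment:
    user: str        # hypothetical anonymized user label
    time_sec: float  # video time at which the comment icon was clicked
    text: str


def scene_index(scene_starts, time_sec):
    """Return the index of the scene that contains time_sec.

    scene_starts holds the start time of each scene in ascending order,
    with the first scene starting at 0.
    """
    return bisect_right(scene_starts, time_sec) - 1


def comments_for_scene(comments, scene_starts, index):
    """Select the comments whose timestamp falls inside the given scene."""
    return [c for c in comments if scene_index(scene_starts, c.time_sec) == index]


# Hypothetical data: three scenes starting at 0 s, 60 s, and 150 s.
scene_starts = [0, 60, 150]
comments = [
    Comment("user-3", 12.0, "Check the route before setting off."),
    Comment("user-7", 75.5, "Take care on loose rock here."),
    Comment("user-1", 60.0, "Good place for a short rest."),
]

# Comments to show on the BBS while scene 1 (60 s to 150 s) is playing.
scene_1 = comments_for_scene(comments, scene_starts, 1)

# Clicking a comment would seek playback to its stored timestamp.
seek_to = scene_1[0].time_sec
```

Storing only a timestamp per comment keeps the comment data independent of the scene definitions, so scenes could be redefined without rewriting the comments.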

Approach
In order to expand the scope of ideas recalled when watching video through the simultaneous display of related words, such related words must meet the following requirements.

• Words that are not a direct expression of the video scene.
• Words that are related to the video in such a way that enables the sharing of the background or premises when expressing knowledge during the viewing of the video.
• Words that are as different from the video as possible, in order to expand the imagination.
Accordingly, it is important to select words that are related to the video but have a certain degree of distance from the content of the video. Here, the concepts dealt with in the target field for knowledge collection were modelled and a system was created for the selection of related words that express concepts that are a certain distance from concepts expressed by the content of the video. The first step was to construct an ontology of the conceptual model. Next, concepts that express the content of the video were manually selected from the concepts from the ontology and assigned as video scene metadata. Finally, related words were automatically selected for each video scene by matching the metadata and the concepts from the ontology.

Ontology structure
An ontology is a formal expression of the concepts required to explain the target world and the definition of the relationships between such concepts (Mizoguchi, 2003). Fig. 2 presents an example of an ontology for mountaineering skills. The figure expresses the attributes of the mountain during the snow-less period ("Mountain Route," "Weather," and "Member") in terms of an "attribute of" relationship, and the various other characteristics in terms of either an "is-a" (above-below) hierarchical relationship or a "part-of" (whole-part) concurrent relationship.
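One possible way to hold such an ontology in memory is a tree whose edges carry the relationship type. The sketch below is an assumption-laden illustration rather than the structure used by the authors: the `Concept` class and `paths` helper are hypothetical, Fig. 2 is not reproduced here, and the relation labels attached to "rocky ridge," "fine weather," and their sub-concepts are guesses made only to show how the three relationship types could be recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    relation: str = "root"  # relationship to the parent concept
    children: list[Concept] = field(default_factory=list)

    def add(self, name, relation):
        """Create a child concept linked by the given relationship."""
        child = Concept(name, relation)
        self.children.append(child)
        return child


def paths(concept, prefix=""):
    """Yield the absolute path of a concept and of all its descendants."""
    path = f"{prefix}/{concept.name}"
    yield path
    for child in concept.children:
        yield from paths(child, path)


root = Concept("mountain in snow-less period")
route = root.add("mountain route", "attribute-of")
weather = root.add("weather", "attribute-of")
root.add("member", "attribute-of")

# Relation labels below are assumptions; the text does not state them.
ridge = route.add("rocky ridge", "is-a")
weather.add("fine weather", "is-a")
for word in ("scree slope", "fixed rope", "via ferrata"):
    ridge.add(word, "part-of")

all_paths = list(paths(root))
```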

Assigning metadata to the video scene manually
When posting a video to the video scene linked bulletin board, keywords that represent the content of the scene are assigned during the scene definition stage. Video scene metadata are selected by specifying concepts from the ontology that are applicable to the video scene. For example, in the case of a video scene that shows someone climbing a rocky ridge on a fine day, the attributes for "Mountain in Snow-Less Period" in Fig. 2 could be the "rocky ridge" component of the "Mountain Route" attribute and the "fine weather" component of the "Weather" attribute, with no applicable component selected for the "Member" attribute. Accordingly, two keywords would be assigned as metadata: [/mountain in snow-less period/mountain route/rocky ridge] and [/mountain in snow-less period/weather/fine weather]. So that the location of the keywords in the ontology can be understood, they are registered as absolute paths from the higher order concept downwards, in the same way as directories in a file system are expressed.
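Because the keywords are absolute paths, each one can be checked against the ontology before it is stored. The following is a minimal sketch, assuming the ontology is held as nested dictionaries; the `ONTOLOGY` fragment, `resolve`, and `assign_metadata` are hypothetical names introduced for this example and cover only the concepts mentioned in the text.

```python
# Hypothetical fragment of the ontology, limited to concepts named in the text.
ONTOLOGY = {
    "mountain in snow-less period": {
        "mountain route": {"rocky ridge": {}},
        "weather": {"fine weather": {}},
        "member": {},
    }
}


def resolve(ontology, path):
    """Walk an absolute path and return the sub-tree found at its end.

    Raises ValueError for a relative path and KeyError for an unknown concept.
    """
    if not path.startswith("/"):
        raise ValueError(f"metadata must be an absolute path: {path!r}")
    node = ontology
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def assign_metadata(ontology, keyword_paths):
    """Validate every keyword path and return them as the scene metadata."""
    for path in keyword_paths:
        resolve(ontology, path)
    return list(keyword_paths)


# The rocky ridge on a fine day example: two keywords, none for "member".
scene_metadata = assign_metadata(
    ONTOLOGY,
    [
        "/mountain in snow-less period/mountain route/rocky ridge",
        "/mountain in snow-less period/weather/fine weather",
    ],
)
```

Validating at assignment time means that a misspelled concept is rejected when the scene is defined, rather than silently producing no related words later.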

Automatic assignation of related words
The related words that are displayed on the video scene are selected by matching the video scene metadata with the ontology. The words in the lower word group of the word assigned as metadata were set as the related words.
For example, for the case of a video scene assigned with the metadata [/mountain in snow-less period/mountain route/rocky ridge], the components under /mountain in snow-less period/mountain route/rocky ridge ([scree slope], [fixed rope], [via ferrata]) in Fig. 2 could be selected. Simultaneously, the words below /mountain in snow-less period/weather/fine weather could be selected, and the selected word group would be defined as the related words for the video scene.
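The matching step can be sketched as a lookup of the concepts directly below each metadata path. This is an illustration under assumptions, not the authors' implementation: `ONTOLOGY`, `children_of`, and `related_words` are hypothetical names, the "lower word group" is interpreted here as direct children only, and no sub-concepts are listed for "fine weather" because the text does not name them.

```python
# Hypothetical ontology fragment; the rocky ridge sub-concepts come from the text.
ONTOLOGY = {
    "mountain in snow-less period": {
        "mountain route": {
            "rocky ridge": {"scree slope": {}, "fixed rope": {}, "via ferrata": {}},
        },
        "weather": {
            "fine weather": {},  # sub-concepts are not given in the text
        },
    }
}


def children_of(ontology, path):
    """Return the names of the concepts directly below an absolute path."""
    node = ontology
    for part in path.strip("/").split("/"):
        node = node[part]
    return list(node)


def related_words(ontology, metadata):
    """Collect the lower concepts of every metadata path, without duplicates."""
    words = []
    for path in metadata:
        for word in children_of(ontology, path):
            if word not in words:
                words.append(word)
    return words


scene_metadata = [
    "/mountain in snow-less period/mountain route/rocky ridge",
    "/mountain in snow-less period/weather/fine weather",
]
words = related_words(ONTOLOGY, scene_metadata)
```

Whether to descend one level or collect all descendants is a design choice that controls how far the related words drift from the scene content; the sketch takes the narrower reading.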

Experiment method
For the target field of mountaineering skills, an investigation was conducted to ascertain whether assigning related words to the video would broaden the recall of knowledge and whether practical knowledge could be collected efficiently. The experimental investigation was performed as follows. Using the video scene linked bulletin board shown in Fig. 1, a community of mountaineers was invited to share their opinions, and the knowledge of individual members was collected. The video was shown both with and without related words, and the comments that were posted were compared. As it was difficult to prepare two homogeneous groups, the evaluation was carried out on a single group using a before-after comparison. Because learning effects may influence before-after comparisons, neither the number of comments nor the number of characters posted was taken into account; instead, a qualitative analysis of the comments and a subjective evaluation were used, with questions that did not depend on aspects such as usage time or order of usage.

Video
Video footage, which was recorded sporadically during movement from entering a mountain area during a snow-less period through to the descent, was edited into an approximately 20-minute video that depicted the activities in the order that they occurred. Sixteen scenes were defined, and words from the ontology were manually assigned to the metadata for each scene.

Users
The users comprised a group of 10 mountaineers with experience of exchanging comments on blogs or bulletin boards. Their ages ranged from individuals in their 20s to those in their 60s. All had rock climbing experience, but the main type of activity that participants engaged in covered a wide range, from only low mountains during no-snow periods or only walking mountain ridges through to comprehensive pursuits including rock climbing and climbing mountains during periods of heavy snow.

Mountaineering ontology
With the cooperation of two mountain guides, an ontology for mountains in snow-less periods was constructed. Management item concepts were modelled from the perspective of collecting knowledge related to mountaineering safety for mountaineering activities involving the climbing of mountains in Japan ranging in height from 1,500m to 3,000m.

Experimental site
An experimental website for the provision of the video scene linked bulletin board was created. Accounts were made for all users and the experimental site could be accessed from anywhere with an internet connection. So that individual users could not be identified, a number was assigned to each user account and that number was used as the user name displayed when posting a comment.

Process
Operation of the video scene linked bulletin board was divided into two rounds. In Round 1, only the video was displayed, with no related words. Conversely, in Round 2, related words were displayed as overlays to the video along with the video playback. Fig. 3 presents example screens from both rounds. Both Round 1 and Round 2 operated for two weeks, with a one-week break between rounds. During Round 2, it was not possible to access the BBS from Round 1. A testing period was held before the start of Round 1 where all users were supplied with a sample video and instructed to post comments in order to test the site. This test period lasted one week.

Directions to users
Users were given the following instructions.
(1) Access the experimental site any time you feel like it during the operation period.
(2) Within the context of mountaineering safety, if, while viewing the video, you recall any past experiences or things that you always try to be aware of, or find something of note in the video scene, add it as a comment.
(3) Reply to any comments that other people have made if you have an opinion or some thoughts about the subject matter.
The above three points were the only instructions that were given to the participants. No detailed directions regarding comment content or any facilitation in order to promote commenting was carried out. In this way, participant use of the system was based upon their own intentions. Further, for Round 2, participants were instructed to disregard any comment exchanges that had occurred in Round 1 and begin their commenting activities afresh.

Posting of comments
Comments consisted of parent comments, which were new comments in response to the video, similar to starting a thread on a BBS, and reply comments, which expressed opinions on the parent comments. In Round 1, there were a total of 58 comments consisting of 32 parent comments and 26 reply comments. In Round 2, there were a total of 101 comments, consisting of 52 parent comments and 49 reply comments. Scene 2 is a scene in which people are resting. The metadata assigned to this scene were [mountain in snow-less period/rest]. The automatically selected related words were [eating and drinking, physical condition/equipment/time management, weather/current location/understanding of route]. The comments posted in Round 1 discussed experiences while taking a rest during mountaineering, while the comments in Round 2 focused on the related words of current location and checking one's route.
Scene 3 depicts the traversing of a slope upon which some snow still remains. The metadata assigned to this scene were [mountain in snow-less period/mountain route/trail/traverse] and [mountain in snow-less period/weather/cloudy]. The automatically selected related words were [precipice, remaining snow, slipping, gas, bad weather, temperature]. The Round 1 comments consisted of opinions regarding the behavior depicted in the video scene, while the comments posted in Round 2 concerned the related word of bad weather.
The above shows that in Round 2, many comments were posted in response to the related words that were displayed on the video. Fig. 4 depicts examples where comments are made on themes that are not depicted in the video, but in Round 2, there were also many comments that were the same as those made in Round 1.
・In situations where the topography cannot be read and the path ahead cannot be seen, there have been times when it has taken me a long time to reach my destination with nervous excitement.
・Even when on small mountains, in addition to a map, a small compass that you can wear on your finger (like ones you see in competitive events) can be useful so I always carry one, along with a larger compass. I always draw a line on my topographic maps so that I know where due north is. Once, when I realized the map that someone had given me was not in my pocket, I got very nervous. I now always put my map in a place where I definitely won't forget about it.
・Looking at the video, it appears that the slope is quite steep and that it is a slippery and dangerous mountain route. Even so, there is no need for extra safety measures.
・Once, when I was taking a break on a climbing route, I leant gently against a rock which then suddenly shifted, tilting by about 15 degrees. I was shocked. It was lucky that it didn't tumble down on to the path below.
・Always carry a bell when going to a mountain where bears might appear. I have never encountered a bear at close quarters but I have seen one at a distance several times and have come across bear droppings before. In places where a mamushi viper could be present, I always proceed by waving my tramping pole in front of my legs. I am also aware of wasps when walking in lower mountains from summer through to autumn. Wasps attack black objects so it is a good idea to wear white clothing.
・I don't carry a bell but I also don't want to encounter a bear. As such, I always investigate bear territory when planning. When aggressive wasps are nearby, I walk without waving my hands as I've heard that they attack moving objects.
・Even if you have a compass and a map, there are details that are not on the map. As such, I've been in situations where I didn't know my current location. Using the altimeter on my watch I was able to kind of estimate where I was. Even if you collect a lot of information when visiting a mountain for the first time, you always encounter a range of conditions and accumulate experience with butterflies in your stomach.
・Strong gusts of wind are particularly scary when walking in these areas.

Validity of posted comments
The purpose of this study is to improve the collection of practical knowledge. We determined whether posted comments are consistent with this purpose. In order to ascertain whether or not the posted comments were valid and contained valuable information, the following test was carried out. Two professional mountain guides evaluated the value and validity of the posted comments and categorized them into the six categories displayed in Table 1. As shown in Table 1, the number of invalid comments in both rounds was extremely low, with two in Round 1 and six in Round 2. Accordingly, almost all comments could be considered valid. Regarding the invalid comments, those in the "Nothing" category were simply interjections indicating that the user was paying attention and the ones in the "Not Related" category were opinions relating to the experimental system.
The above results demonstrate that it was possible to collect the experiential and practical knowledge of mountaineers both with the normal video linked bulletin board, where related words were not displayed, and with related words displayed simultaneously with video playback. Of the valid comments, there were 31 parent comments and 25 reply comments for a total of 56 comments in Round 1, and 51 parent comments and 44 reply comments for a total of 95 comments in Round 2. The following analysis investigates these valid comments.
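The reported comment counts can be cross-checked with a short tally. The sketch below uses only the figures stated in the text; the `summarize` helper and the `valid_share` ratio are additions made for illustration and do not appear in the original analysis.

```python
# Counts reported in the text for each round.
posted = {
    "Round 1": {"parent": 32, "reply": 26},
    "Round 2": {"parent": 52, "reply": 49},
}
valid = {
    "Round 1": {"parent": 31, "reply": 25},
    "Round 2": {"parent": 51, "reply": 44},
}


def summarize(posted_counts, valid_counts):
    """Derive totals, the number of invalid comments, and the valid share."""
    total_posted = sum(posted_counts.values())
    total_valid = sum(valid_counts.values())
    return {
        "posted": total_posted,
        "valid": total_valid,
        "invalid": total_posted - total_valid,
        "valid_share": round(total_valid / total_posted, 3),
    }


summary = {name: summarize(posted[name], valid[name]) for name in posted}
```

The derived invalid counts (two in Round 1 and six in Round 2) agree with the figures given for Table 1.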

Content analysis of comments
In our proposed method, related words are superimposed on the video so that practical knowledge can be more efficiently collected. We determined the effect of displaying related words.

Degree of relatedness of comments to video
The relationship between the themes expressed in the posted comments and the content of the video scene was investigated to determine whether posted comments were recalled from the video scene or from related words. Because the theme of each thread was set by its parent comment, only the parent comments were included in the analysis. Of all parent comments, 56 were valid in Round 1 and 95 were valid in Round 2. These comments were classified by their degree of relatedness to the relevant video scenes into the following four categories.
Classification Categories:
A: Exactly matches the video content
B: Close to the video content
C: Theme that is not depicted in the video scene but is within a scope that can be imagined from looking at the video
D: Differs greatly from the video
The method of classification was as follows. First, the comment themes were extracted and then a comparison of the theme and the content of the video scene was carried out independently by two assessors. Subsequently, a third assessor was added and the validity of the result of the classification was debated and thus decided upon.
Fig. 5 presents the results of the classification. The figure shows the percentages for each of the four categories. In Round 1, almost all of the comments were classified as A, B, or C (in that order), with A and B comprising almost 85% of the total number of comments. In contrast, in Round 2, A and B comprised approximately 58% of all comments. Further, while category C comments comprised 12% of all comments in Round 1, this percentage rose to 34% in Round 2. The majority of comment themes in category C comments in Round 2 resembled the related words. The percentage of category D comments rose from 3% in Round 1 to close to 8% in Round 2. These results demonstrate that displaying related words on the video in Round 2 expanded the scope of recall past that which was depicted in the video scene itself.

Comment vocabulary
We then determined whether topics were expanded by displaying related words. For this purpose, the vocabulary used in posted comments was analyzed. A morphological analysis of the valid comments (Round 1: 56, Round 2: 95) was performed using SPSS, and nouns were extracted. Mountaineering-related words were then manually selected from the extracted nouns. This resulted in a vocabulary of 78 words for Round 1 and 196 words for Round 2, with a common vocabulary of 27 words. The fact that terminology specific to mountaineering more than doubled in Round 2 provides evidence of the expansion of themes in Round 2.
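The vocabulary figures can be unpacked with simple set arithmetic. The sketch below works only from the three reported counts; the `vocabulary_overlap` helper is hypothetical, and the Jaccard index is an additional overlap measure introduced here for illustration, not one used in the original analysis.

```python
def vocabulary_overlap(size_a, size_b, common):
    """Derive overlap statistics for two vocabularies from their sizes."""
    union = size_a + size_b - common
    return {
        "only_a": size_a - common,             # words used only in the first round
        "only_b": size_b - common,             # words used only in the second round
        "union": union,                        # distinct words across both rounds
        "growth": round(size_b / size_a, 2),   # second vocabulary relative to first
        "jaccard": round(common / union, 3),   # shared share of all distinct words
    }


# Reported counts: 78 words in Round 1, 196 in Round 2, 27 in common.
result = vocabulary_overlap(78, 196, 27)
```

Under these counts, 169 of the 196 Round 2 words did not appear in Round 1, which is consistent with the expansion of themes described above.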

Discussion
Round 1 was administered first, then Round 2. The details of this procedure are as follows. There was a one-week break between Round 1 and Round 2. For Round 2, participants were instructed to disregard any comment exchanges that had occurred in Round 1 and begin their commenting activities afresh. By doing so, it was expected that Round 1 and Round 2 would be close to independent. As shown in Fig. 5, in both Round 1 and Round 2, category A comments, whose content is directly related to the video, are the most common. From this result, the main action in both rounds is considered to be the posting of practical knowledge recalled when watching the video. Furthermore, in Round 2 participants responded to related words in addition to the video, and as a result, it may be assumed that various kinds of practical knowledge could be collected. If Round 1 is administered after Round 2, it is expected that most of the practical knowledge collected in Round 1 will be recalled from the video, since the related words displayed on the video in Round 2 are expected to have been forgotten.
We conducted a small experiment with ten subjects. The expected relationship between the number of participants and the experimental results is as follows. In the proposed method, a participant can post a comment via one of the following two paths.

Path 1: view - write
He or she posts a parent comment after watching videos or related words.

Path 2:
If interested in others' comments (watching the video as necessary), he or she replies to comments or parent comments.
Since Path 1 is one-to-one communication between the participant and the video, the content of the posted comment does not depend on the number of participants. Since Path 2 is communication within the community, it is expected that as the number of participants increases, communication will be increasingly activated, and the topic will expand. Therefore, while the effect of Path 1 is the same regardless of the number of participants, the effect of Path 2 increases as the number of participants increases. From the above, it can be assumed that various types of practical knowledge can be collected from the same video by presenting related words in the video, even in a large community. Large-scale communities can be expected to have further effects.
The verification of the estimates of the influence of the procedure between Rounds 1 and 2 and the scale of community is a subject for future research.

Subjective evaluation
After completion of Round 2, the use of the video scene linked bulletin board in Round 1 and Round 2 was compared using a web-based survey. The five questionnaire items that were investigated are shown in Table 2. Fig. 6 presents the means and standard deviations of the results of the subjective evaluation carried out by the 10 participants. For question Q1, no respondents reported that the related words felt out of place, two reported no feelings of discomfort, and eight reported no strong feelings in either direction. The reason that only a small number of participants reported no discomfort can be surmised as follows. Subtitles and captions in media such as television programs are the usual form of simultaneous video playback and word display, but such subtitles or captions generally describe or explain what is happening in the video. The Q1 result therefore suggests that the related words selected by the proposed method did not directly express the video content exactly as it was shown on screen. In question Q2, over half of the respondents reported that the related words matched the video and no respondents reported that they did not match. This result implies that words related to the video content were selected. The above two results imply that the proposed method resulted in the selection of words that were related to, but to a certain degree distant from, the video content. With regard to question Q3, 80 percent of users reported an expansion and the remaining 20 percent reported no strong feelings in either direction. This indicates that recall was expanded through the display of related words. Furthermore, the results for Q4 indicated that the presence of related words was connected with an understanding of the posted comments. The above results support the conclusion that the related words selected by the proposed method functioned effectively in prompting the expression of practical and experiential knowledge.

Conclusion
In the context of methods for collecting the practical and experiential knowledge possessed by users through use of video, this paper proposed a method for expanding the scope of recall possible from a video by displaying words related to the video scene. The proposed method involves expressing the concepts for the target field for knowledge collection as an ontology, manually selecting concepts from the ontology and applying them as metadata to the video scene, and then automatically selecting related words by matching the metadata and ontology concepts for each scene.
By posting video footage of mountaineering activities on a video scene linked bulletin board and having 10 mountaineers engage in the exchange of opinions, an experiment involving the collection of the knowledge that such participants possess was conducted. Two cases of commenting while watching video were compared: one where only the video was shown, and one where related words were displayed simultaneously with the video. The results demonstrated that valid comments containing practical and experiential knowledge could be collected in both cases. Further, it was confirmed that in the case where related words were displayed along with video, the proposed method resulted in the appropriate selection of related words, expanding the scope of recall when viewing the video, and more effectively promoting the expression of the knowledge that the users possess.
In the proposed method, related words are generated from an ontology. Therefore, the content of the collected practical knowledge can be controlled by customizing the ontology. For example, if knowledge of mountaineering gear and tactics is required, ontologies on gear and tactics can be prepared. The content of conventional video-based communication is constrained by the video scene. Our results indicate that the communication content can also be controlled by the related words displayed on the video. It is difficult to edit the video scene, but changing the related words is easy. We have established a mechanism to control the communication content and easily collect the required practical knowledge.
In the experiments, Round 1 (no related words) was administered first, then Round 2 with related words. It is assumed that there is no influence from the order in which Round 1 and Round 2 are performed. Furthermore, when performed in a large community, a further effect can be expected from displaying related words on the video. This could not be confirmed in this study because the experiment was small. These hypotheses will be verified by building a practical system. Future research directions include investigation of methods of user training that utilize practical knowledge that has been collected using the proposed method.