CN112241462B

CN112241462B - Knowledge point mark generation system and method thereof

Info

Publication number: CN112241462B
Application number: CN201910646422.9A
Authority: CN
Inventors: 郑旭成
Original assignee: Wisdom Garden Hong Kong Ltd
Current assignee: Wisdom Garden Hong Kong Ltd
Priority date: 2019-07-17
Filing date: 2019-07-17
Publication date: 2024-04-23
Anticipated expiration: 2039-07-17
Also published as: CN112241462A

Abstract

A knowledge point mark generation system and method thereof, wherein at least one first key word emphasized in a classroom by text, at least one second key word emphasized in a classroom by sound, at least one first candidate word repeatedly appearing in a classroom by text, and at least one second candidate word repeatedly appearing in a classroom by sound are analyzed according to their corresponding weights to obtain label words, and corresponding to the time segment in which the label words appear, knowledge point marks are set on the time axis of the audio and video files shot in the classroom to form an audio and video file with knowledge point marks. Therefore, learners can understand the knowledge points of the class and the segments in which they exist without browsing the audio and video files of the entire class, which is convenient for learners to focus on learning or reviewing.

Description

Knowledge point mark generation system and method

Technical Field

The present invention relates to a mark generation system and method, and in particular to a knowledge point mark generation system and method.

Background Art

With advances in technology and the development of the Internet, learners can study or review after a class by using the audio and video files recorded during teaching.

At present, a learner who wants to study can search only by the title of an audio and video file. A title conveys limited information, however, so the learner may have to watch the entire video to find out whether it meets their learning needs, which is time-consuming.

In addition, a learner who wants to review usually does not know the playback time at which the key points (i.e., knowledge points) of the class appear in the audio and video file. The learner must therefore keep dragging the progress pointer on the playback timeline, or fast-forward the video, to find the segment they want to watch, which is clearly inconvenient.

In summary, the prior art requires learners to browse the audio and video file of an entire class to find out whether it meets their learning needs, and to locate the segments containing knowledge points by dragging the playback progress pointer or fast-forwarding the video, both of which inconvenience learners. Improved technical means are therefore needed to solve this problem.

Summary of the Invention

The present invention discloses a knowledge point mark generation system and a method thereof.

First, the present invention discloses a knowledge point mark generation system that includes a capture device, a speech recognition device, a processing device, and an integration device. The capture device continuously captures and analyzes computer screen images, projection images, and/or blackboard images during a class to continuously obtain text, and extracts at least one first key word from the text based on the font and/or font color in those images and on the selected text. The speech recognition device continuously receives sound signals during the class, continuously converts the sound signals into text strings by speech-to-text conversion, determines the identity of whoever emitted each sound signal by voiceprint recognition or sound source recognition, and extracts at least one second key word from the text strings based on that identity and/or a plurality of preset words. After the class ends, the processing device statistically analyzes the text continuously obtained by the capture device to obtain at least one first candidate word; statistically analyzes the text strings continuously obtained by the speech recognition device to obtain at least one second candidate word; and performs an analysis procedure on the at least one first key word, the at least one second key word, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a label word. After the class ends, the integration device obtains the time segment of each sentence containing the label word in the text strings continuously obtained by the speech recognition device and, when the time difference between adjacent time segments is less than a specific time length, merges those adjacent time segments into a time interval. It then sets a plurality of knowledge point marks, corresponding to the unmerged time segments and the time intervals, on the time axis of the audio and video file recorded during the class, to form an audio and video file having the knowledge point marks.

In addition, the present invention discloses a knowledge point mark generation method comprising the following steps: providing a knowledge point mark generation system that includes a capture device, a speech recognition device, a processing device, and an integration device; the capture device continuously capturing and analyzing computer screen images, projection images, and/or blackboard images during a class to continuously obtain text; the capture device extracting at least one first key word from the text based on the font and/or font color in the computer screen images, projection images, and/or blackboard images and on the selected text; the speech recognition device continuously receiving sound signals during the class and continuously converting the sound signals into text strings by speech-to-text conversion; the speech recognition device determining the identity of whoever emitted each sound signal by voiceprint recognition or sound source recognition; the speech recognition device extracting at least one second key word from the text strings based on that identity and/or a plurality of preset words; the processing device, after the class ends, statistically analyzing the text continuously obtained by the capture device to obtain at least one first candidate word; the processing device, after the class ends, statistically analyzing the text strings continuously obtained by the speech recognition device to obtain at least one second candidate word; the processing device performing an analysis procedure on the at least one first key word, the at least one second key word, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a label word; and the integration device, after the class ends, obtaining the time segment of each sentence containing the label word in the text strings continuously obtained by the speech recognition device, merging adjacent time segments into a time interval when the time difference between them is less than a specific time length, and then setting a plurality of knowledge point marks, corresponding to the unmerged time segments and the time intervals, on the time axis of the audio and video file recorded during the class, to form an audio and video file having the knowledge point marks.

The system and method disclosed in the present invention are as described above. They differ from the prior art in that the present invention performs an analysis procedure, according to corresponding weights, on at least one first key word emphasized in text during the class, at least one second key word emphasized by voice during the class, at least one first candidate word that appears repeatedly in text during the class, and at least one second candidate word that appears repeatedly in speech during the class, to obtain at least one label word. Knowledge point marks corresponding to the time segments and time intervals in which the label word appears are then set on the time axis of the audio and video file recorded during the class, to form an audio and video file having knowledge point marks.

Through the above technical means, the present invention lets learners find the knowledge points of a class, and the segments in which they appear, without browsing the audio and video file of the entire class, which makes focused study or review more convenient.

Brief Description of the Drawings

FIG. 1 is a system block diagram of an embodiment of the knowledge point mark generation system of the present invention.

FIG. 2A and FIG. 2B are flow charts of an embodiment of the knowledge point mark generation method executed by the knowledge point mark generation system of FIG. 1.

[List of Reference Numerals]

50 Live streaming module

60 Marking module

70 Transmission module

100 Knowledge point mark generation system

110 Capture device

112 Photographing module

114 Analysis module

120 Speech recognition device

122 Microphone module

124 Conversion module

126 Voiceprint recognition module

130 Processing device

140 Integration device

150 Client

160 Behavior detection device

162 Photographing module

164 Analysis module

Step 210: Provide a knowledge point mark generation system that includes a capture device, a speech recognition device, a processing device, and an integration device.

Step 220: The capture device continuously captures and analyzes computer screen images, projection images, and/or blackboard images during the class to continuously obtain text.

Step 230: The capture device extracts at least one first key word from the text based on the font and/or font color in the computer screen images, projection images, and/or blackboard images and on the selected text.

Step 240: The speech recognition device continuously receives sound signals during the class and continuously converts the sound signals into text strings by speech-to-text conversion.

Step 250: The speech recognition device determines the identity of whoever emitted each sound signal by voiceprint recognition or sound source recognition.

Step 260: The speech recognition device extracts at least one second key word from the text strings based on the identity of whoever emitted the sound signal and/or a plurality of preset words.

Step 270: After the class ends, the processing device statistically analyzes the text continuously obtained by the capture device to obtain at least one first candidate word.

Step 280: After the class ends, the processing device statistically analyzes the text strings continuously obtained by the speech recognition device to obtain at least one second candidate word.

Step 290: The processing device performs an analysis procedure on the at least one first key word, the at least one second key word, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a label word.

Step 300: After the class ends, the integration device obtains the time segment of each sentence containing the label word in the text strings continuously obtained by the speech recognition device; when the time difference between adjacent time segments is less than a specific time length, it merges those adjacent time segments into a time interval; it then sets a plurality of knowledge point marks, corresponding to the unmerged time segments and the time intervals, on the time axis of the audio and video file recorded during the class, to form an audio and video file having the knowledge point marks.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that the way the present invention applies technical means to solve the technical problem and achieve its technical effects can be fully understood and implemented.

Before the knowledge point mark generation system and method disclosed in the present invention are described, the terms used herein are explained. A knowledge point, as used in the present invention, is the basic unit of information transfer in a course; knowledge points therefore play an important role in learning and navigating the course. The present invention analyzes the behaviors and events that occur during a class according to their corresponding weights to obtain the knowledge points of that class, so that learners who study or review with the audio and video file recorded during teaching can find the knowledge points of the class, and the segments in which they appear, without browsing the audio and video file of the entire class. In addition, the capture device, the speech recognition device, and the behavior detection device described in the present invention can be started synchronously at the beginning of each class and stopped synchronously after each class ends.

Please refer to FIG. 1, FIG. 2A, and FIG. 2B. FIG. 1 is a system block diagram of an embodiment of the knowledge point mark generation system of the present invention, and FIG. 2A and FIG. 2B are flow charts of an embodiment of the knowledge point mark generation method executed by the system of FIG. 1. In this embodiment, the knowledge point mark generation system 100 includes a capture device 110, a speech recognition device 120, a processing device 130, and an integration device 140 (step 210). The capture device 110 is connected to the processing device 130, the speech recognition device 120 is connected to the processing device 130, and the processing device 130 is connected to the integration device 140.

The capture device 110, the speech recognition device 120, the processing device 130, and the integration device 140 can be implemented in various ways, including software, hardware, firmware, or any combination thereof. Where the techniques proposed in the embodiments are implemented in software or firmware, the software or firmware can be stored on a machine-readable storage medium, such as read-only memory (ROM), random access memory (RAM), a magnetic disk storage medium, an optical storage medium, or a flash memory device, and can be executed by one or more general-purpose or special-purpose programmable microprocessors. The capture device 110 and the processing device 130, the speech recognition device 120 and the processing device 130, and the processing device 130 and the integration device 140 can be connected to each other wirelessly or by wire to transmit signals and data.

The capture device 110 continuously captures and analyzes computer screen images, projection images, and/or blackboard images during the class to continuously obtain text (step 220). In more detail, the capture device 110 may include a photographing module 112 and an analysis module 114, with the photographing module 112 connected to the analysis module 114. The photographing module 112 can continuously shoot images of the podium area in each class; these images may include the projection screen and/or the classroom blackboard or whiteboard, so that the projection images and/or blackboard images are captured. This embodiment does not limit the present invention and can be adjusted according to actual needs. For example, in a computer class, the photographing module 112 can continuously shoot the computer screen operated by the teacher to capture the computer screen images. Note that the content continuously shot by the photographing module 112 must include the text-bearing auxiliary teaching materials that the teacher provides while teaching, such as handouts, slides, and writing on the blackboard or whiteboard. The analysis module 114 continuously receives and analyzes the computer screen images, projection images, and/or blackboard images captured by the photographing module 112 to obtain the characters in each image and generate the corresponding text. Specifically, the analysis module 114 uses optical character recognition (OCR) to extract the characters in each computer screen image, each projection image, and/or each blackboard image to form the text (i.e., image-to-text conversion).

The capture device 110 extracts at least one first key word from the text based on the font and/or font color in the computer screen images, projection images, and/or blackboard images and on the selected text (step 230). In more detail, the text in the auxiliary teaching materials that the teacher provides during the class may use different fonts and/or font colors to reinforce certain knowledge points (i.e., key points), so that learners can recognize the knowledge points the teacher wants to convey from the text with a special font and/or font color. The capture device 110 can therefore extract at least one first key word (i.e., a possible knowledge point) from the text based on the font and/or font color in the computer screen images, projection images, and/or blackboard images. Here, the font may include, but is not limited to, font size, font weight, typeface, whether the text is italic, whether it is underlined, and whether it has a text effect; each first key word is a word made up of adjacent characters that share a special font and/or font color. In addition, in this embodiment, text in the computer screen images, projection images, and/or blackboard images that the teacher selects during the class (including pointing at or selecting it by hand, with a laser pointer, or with a computer cursor) can also be treated as a knowledge point (i.e., key point) that the teacher wants to emphasize. The capture device 110 can therefore also extract at least one first key word (i.e., a possible knowledge point) from the text based on the selected text in those images, in which case each first key word is a word made up of the selected characters. Note that first key words extracted in different ways (for example, by special font, by special font color, or by selection) may have the same or different weights in the analysis procedure subsequently performed by the processing device 130; the weights can be adjusted according to actual needs.
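By way of a non-limiting illustration, the grouping of adjacent emphasized characters into first key words in step 230 might be sketched as follows. The `TextRun` structure, its style fields, the `body_size` and `body_color` defaults, and the rule for deciding what counts as emphasis are all assumptions made for this sketch; the description does not specify how the OCR output is represented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRun:
    """One recognized piece of text with hypothetical style attributes."""
    text: str
    font_size: float
    bold: bool = False
    color: str = "black"
    selected: bool = False  # pointed at or selected by the teacher


def emphasis_source(run: TextRun, body_size: float, body_color: str) -> str:
    """Return how a run is emphasized, or '' if it is ordinary body text."""
    if run.selected:
        return "selected"
    if run.color != body_color:
        return "color"
    if run.bold or run.font_size > body_size:
        return "font"
    return ""


def first_key_words(runs: List[TextRun], body_size: float = 12.0,
                    body_color: str = "black") -> List[Tuple[str, str]]:
    """Join adjacent runs emphasized in the same way into one key word.

    Each result is (key word, emphasis source), so that a later step can
    give each source its own weight. Runs are joined with a space, which
    suits English; Chinese text would be joined without a separator.
    """
    key_words: List[Tuple[str, str]] = []
    current: List[str] = []
    current_source = ""
    for run in runs:
        source = emphasis_source(run, body_size, body_color)
        if source and source == current_source:
            current.append(run.text)
            continue
        if current:
            key_words.append((" ".join(current), current_source))
        current, current_source = ([run.text], source) if source else ([], "")
    if current:
        key_words.append((" ".join(current), current_source))
    return key_words
```

Keeping the emphasis source next to each key word reflects the note above that key words extracted in different ways may carry different weights.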

The speech recognition device 120 continuously receives sound signals during the class and continuously converts the sound signals into text strings by speech-to-text conversion (step 240). In more detail, the speech recognition device 120 may include a microphone module 122 and a conversion module 124. The microphone module 122 continuously receives the sounds (i.e., sound signals) made by the teacher and the learners during the class, and the conversion module 124 converts the sound signals continuously received by the microphone module 122 into text strings by speech-to-text conversion. The microphone module 122 may include a plurality of microphone units (not shown) placed at various locations in the classroom so that the sounds made by the teacher and the learners anywhere in the classroom are fully received; the number and placement of the microphone units can be adjusted according to actual needs.

The speech recognition device 120 determines the identity of whoever emitted each sound signal by voiceprint recognition or sound source recognition (step 250). In more detail, the speech recognition device 120 may further include a voiceprint recognition module 126, which identifies whether a sound signal received by the microphone module 122 was emitted by the teacher or by a learner, and thereby determines whether the corresponding text string produced by the conversion module 124 was spoken by the teacher or by a learner. In addition, in this embodiment, the teacher is usually near the podium (i.e., toward the front of the classroom), whereas the learners are usually in the middle or at the back of the classroom relative to the teacher. The identity of whoever emitted a sound signal can therefore also be determined from the sound source position obtained through the microphone module 122. Specifically, because the microphone module 122 may include a plurality of microphone units placed at various locations in the classroom, it can determine the position of a sound signal from the differences in the times at which the microphone units receive that same signal and from the relative placement of the microphone units. It then determines from that position whether the sound signal was emitted by the teacher or by a learner, and thereby whether the corresponding text string produced by the conversion module 124 was spoken by the teacher or by a learner.
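By way of a non-limiting illustration, the time-difference approach to sound source recognition might be sketched as follows for a deliberately simplified case. The sketch assumes only two microphone units placed on a line running from the front to the back of the classroom, a sound source located between them, and a hypothetical `podium_depth` threshold separating the teacher's area from the learners' area; a real installation with more microphone units would solve for a two- or three-dimensional position instead.

```python
SPEED_OF_SOUND = 343.0  # metres per second, approximate value at room temperature


def locate_1d(front_mic_x: float, back_mic_x: float,
              t_front: float, t_back: float) -> float:
    """Estimate the source position on the front-to-back axis.

    For a source at x between the two microphones:
        t_front - t_back = ((x - front_mic_x) - (back_mic_x - x)) / c
    Solving for x gives the expression below. Only the difference of the
    two arrival times matters, so the clocks need no absolute reference.
    """
    delta = t_front - t_back
    return (front_mic_x + back_mic_x + SPEED_OF_SOUND * delta) / 2.0


def speaker_role(position: float, podium_depth: float = 3.0) -> str:
    """Classify the speaker by distance from the front wall (assumed rule)."""
    return "teacher" if position <= podium_depth else "learner"
```

For example, with microphones at 0 m and 10 m, a signal that reaches the front microphone 6/343 s before the back microphone is placed 2 m from the front and attributed to the teacher under the assumed 3 m threshold.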

The speech recognition device 120 extracts at least one second key word from the text strings based on the identity of whoever emitted the sound signal and/or a plurality of preset words (step 260). In more detail, text strings obtained from sound signals emitted by the teacher, and/or text strings that contain a preset word (for example: special, key, key point, must memorize, test point), are very likely to contain knowledge points of the class. The speech recognition device 120 can therefore extract at least one second key word (i.e., a possible knowledge point) from the text strings obtained from the teacher's sound signals and/or from the text strings that contain a preset word. The second key word can be extracted by semantic analysis, but this embodiment does not limit the present invention. In addition, in another embodiment, the text strings obtained from sound signals that the teacher utters at a higher volume during teaching can also serve as one of the parameters for extracting the second key word.

Note that second key words extracted from the text strings obtained from the teacher's sound signals and second key words extracted from the text strings that contain a preset word (for example: special, key, key point, must memorize, test point) may have the same or different weights in the analysis procedure subsequently performed by the processing device 130; the weights can be adjusted according to actual needs.
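By way of a non-limiting illustration, selecting the text strings from which second key words are extracted in step 260 might be sketched as follows. The `Utterance` structure is an assumption, the English `PRESET_WORDS` merely stand in for the preset words listed above, and the semantic analysis that would pick the key word out of each selected text string is outside this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

# English stand-ins for the preset words named in the description.
PRESET_WORDS = ("special", "key", "key point", "must memorize", "test point")


@dataclass
class Utterance:
    """One recognized sentence (hypothetical structure)."""
    speaker: str  # "teacher" or "learner", as determined in step 250
    text: str
    start: float  # seconds from the start of the class
    end: float


def selection_reasons(utterance: Utterance) -> List[str]:
    """List the reasons why an utterance may contain a second key word."""
    reasons = []
    if utterance.speaker == "teacher":
        reasons.append("teacher")
    lowered = utterance.text.lower()
    # Plain substring matching; a production system would match whole words.
    if any(word in lowered for word in PRESET_WORDS):
        reasons.append("preset_word")
    return reasons


def candidate_utterances(
        utterances: List[Utterance]) -> List[Tuple[Utterance, List[str]]]:
    """Keep only utterances selected for at least one reason."""
    selected = []
    for utterance in utterances:
        reasons = selection_reasons(utterance)
        if reasons:
            selected.append((utterance, reasons))
    return selected
```

Recording the reasons alongside each utterance allows key words found through the teacher's speech and key words found through preset words to be weighted differently later.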

After the class ends, the processing device 130 statistically analyzes the text continuously obtained by the capture device 110 to obtain at least one first candidate word (step 270). In more detail, the processing device 130 first counts the words in the text obtained by the capture device 110 and then defines the few most frequent words as first candidate words (i.e., possible knowledge points). Note that a word that occurs too frequently may be the main theme of the class and is therefore unsuitable as the label word described in the later steps. Accordingly, when the processing device 130 statistically analyzes the text continuously obtained by the capture device 110 after the class ends, any word whose frequency of occurrence exceeds a preset value is excluded from the first candidate words; the preset value can be adjusted according to actual needs.

After the class ends, the processing device 130 statistically analyzes the text strings continuously obtained by the speech recognition device 120 to obtain at least one second candidate word (step 280). In more detail, the processing device 130 first counts the words in the text strings obtained by the speech recognition device 120 and then defines the few most frequent words as second candidate words (i.e., possible knowledge points). Note that a word that occurs too frequently may be the main theme of the class and is therefore unsuitable as the label word described in the later steps. Accordingly, when the processing device 130 statistically analyzes the text strings continuously obtained by the speech recognition device 120 after the class ends, any word whose frequency of occurrence exceeds a preset value is excluded from the second candidate words; the preset value can be adjusted according to actual needs.
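By way of a non-limiting illustration, the statistical analysis of steps 270 and 280 might be sketched as follows. The sketch assumes the text has already been split into words, treats the preset value as an absolute occurrence count (`max_count`), and uses `top_n` for the number of candidate words kept; the description leaves all three choices open.

```python
from collections import Counter
from typing import Iterable, List


def candidate_words(words: Iterable[str], top_n: int, max_count: int) -> List[str]:
    """Return the most frequent words, skipping any that occur too often.

    A word whose count exceeds max_count is assumed to be the main theme
    of the class and is excluded before the top_n words are chosen.
    """
    counts = Counter(words)
    allowed = Counter({w: c for w, c in counts.items() if c <= max_count})
    return [word for word, _ in allowed.most_common(top_n)]
```

The same function can serve both steps: it is applied to the words of the captured text to obtain the first candidate words and to the words of the text strings to obtain the second candidate words.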

The processing device 130 performs an analysis procedure on the at least one first key word, the at least one second key word, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain the label word (step 290). In more detail, first key words, second key words, first candidate words, and second candidate words differ in how likely they are to be knowledge points. In the analysis procedure that determines the knowledge points of the class, the weights assigned to the first key words, second key words, first candidate words, and second candidate words therefore differ and can be adjusted according to actual needs. The analysis procedure determines the knowledge points (i.e., label words) of the class from the first key words, second key words, first candidate words, and second candidate words together with their corresponding weights; the number of knowledge points (i.e., label words) can be adjusted according to actual needs.
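By way of a non-limiting illustration, the analysis procedure of step 290 might be sketched as a weighted score. The description states only that the four kinds of words carry different, adjustable weights; the specific weight values below, and the choice to score each word by summing the weights of the sources in which it appears, are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical weights; the description leaves the actual values open.
WEIGHTS = {
    "first_key": 3.0,
    "second_key": 2.5,
    "first_candidate": 1.5,
    "second_candidate": 1.0,
}


def label_words(words_by_source: Dict[str, List[str]],
                weights: Dict[str, float] = WEIGHTS,
                count: int = 1) -> List[str]:
    """Score each word by the summed weights of its sources; keep the best."""
    scores: Dict[str, float] = defaultdict(float)
    for source, words in words_by_source.items():
        for word in set(words):  # a word counts once per source
            scores[word] += weights[source]
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:count]]
```

The `count` parameter corresponds to the adjustable number of knowledge points mentioned above.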

When there is one knowledge point (i.e., one label word), the integration device 140, after the class ends, identifies the time segment of each sentence containing the label word in the text strings continuously obtained by the speech recognition device 120, merges adjacent time segments into a time interval when the time difference between them is less than a specific time length, and then sets, on the time axis of the audio and video file shot in the class, a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals, thereby forming an audio and video file with the knowledge point marks (step 300). In more detail, the knowledge point mark generation system 100 may also include a camera device (not shown) that shoots the audio and video file to be placed on a platform or website for learners to study or review, and that shoots the streaming audio and video required for live broadcasting the class (that is, the streaming audio and video of the class can be broadcast live and stored at the same time, so that the audio and video file of the class is generated after the class ends). The camera device, the capture device 110, and the speech recognition device 120 can be started synchronously at the beginning of each class and stopped synchronously at its end. Step 290 yields one knowledge point (i.e., one label word) of the class, so the integration device 140 can search the text strings obtained by the speech recognition device 120 for the time segment in which each sentence containing that label word appears. When the time difference (i.e., the gap) between adjacent time segments is less than the specific time length, the adjacent time segments are merged into one time interval; the specific time length can be adjusted according to actual needs. Then, after the class ends, the integration device 140 sets a plurality of knowledge point marks, corresponding to the unmerged time segments and the time intervals, on the time axis of the audio and video file shot by the camera device in the class, forming an audio and video file with the knowledge point marks.
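The merging rule of step 300 can be sketched as a single pass over sorted segments. The representation of a time segment as a `(start, end)` pair in seconds, the function name `merge_segments`, and the parameter `max_gap` (standing in for the specific time length) are assumptions made for illustration.

```python
def merge_segments(segments, max_gap):
    """Merge time segments whose gap to the previous one is below max_gap.

    segments: list of (start, end) pairs in seconds, one per sentence that
    contains the label word. Returns unmerged segments and merged intervals
    in chronological order.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous segment: extend it into an interval.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

segments = [(10, 15), (18, 25), (120, 130), (300, 305), (307, 312)]
print(merge_segments(segments, max_gap=5))
# [(10, 25), (120, 130), (300, 312)]
```

Each element of the result corresponds to one knowledge point mark on the time axis: `(120, 130)` is an unmerged time segment, while `(10, 25)` and `(300, 312)` are merged time intervals.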

When there are multiple label words, the integration device 140 can follow the above process to find the unmerged time segments and time intervals corresponding to each label word, and then use different colors to distinguish the knowledge point marks of different label words, making it easy for learners to tell them apart. For example, when the label words are "Fourier transform" and "Laplace transform", the knowledge point marks corresponding to "Fourier transform" on the time axis of the audio and video file may be, but are not limited to, yellow, and the knowledge point marks corresponding to "Laplace transform" may be, but are not limited to, green. This example is not intended to limit the present invention.
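A minimal sketch of color-coded marks for several label words follows. The palette, the dictionary layout of a mark, and the function name `build_marks` are hypothetical; the description only requires that different label words get visually distinct marks.

```python
from itertools import cycle

PALETTE = ["yellow", "green", "blue", "red"]  # hypothetical color choices

def build_marks(intervals_by_label, palette=PALETTE):
    """Assign one color per label word and emit marks in timeline order."""
    marks = []
    labels = sorted(intervals_by_label.items())
    # cycle() reuses colors if there are more label words than palette entries.
    for (label, intervals), color in zip(labels, cycle(palette)):
        for start, end in intervals:
            marks.append({"label": label, "start": start, "end": end, "color": color})
    marks.sort(key=lambda m: m["start"])
    return marks

intervals_by_label = {
    "Fourier transform": [(10, 25)],
    "Laplace transform": [(120, 130), (5, 8)],
}
for mark in build_marks(intervals_by_label):
    print(mark)
```

With this input, the "Fourier transform" mark is yellow and both "Laplace transform" marks are green, matching the example colors in the text.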

In this embodiment, in addition to determining the label word of the class from the first key word, the second key word, the first candidate word, and the second candidate word and their corresponding weights, the behavior of each learner in the class (for example, looking up at the blackboard or looking down to write notes) can also be used as one of the parameters for determining the label word, as detailed below. In this embodiment, the knowledge point mark generation system 100 may further include a behavior detection device 160, and the knowledge point mark generation method may further include the following: the behavior detection device 160 continuously receives and analyzes learner classroom images during the class to obtain a behavior recognition signal for each learner; when the behavior recognition signal of any learner indicates looking up or writing notes, the processing device 130 generates a behavior string from the text strings obtained by the speech recognition device 120 within an expected time interval before and after that moment; the processing device 130 analyzes the behavior strings statistically, together with the whole-class head-up rate and/or the whole-class note-writing proportion, to obtain at least one fourth candidate word; and the processing device 130 adds the at least one fourth candidate word, with its corresponding weight, to the analysis procedure to obtain the label word.

In more detail, the behavior detection device 160 may include a camera module 162 and an analysis module 164, with the camera module 162 connected to the analysis module 164. The camera module 162 can continuously capture, during each class, images of the position of each learner in the classroom (i.e., images of each learner in class, referred to as learner classroom images). By analyzing these continuously captured images, a behavior recognition signal for each learner (i.e., each learner's dynamic behavior) can be obtained. When a learner looks up at the projection image, blackboard, and/or whiteboard, or looks down to write notes, the content being taught at that moment may be a key point (i.e., a knowledge point). Therefore, when the behavior detection device 160 obtains such a behavior recognition signal for any learner, the processing device 130 can generate a behavior string from the text strings obtained by the speech recognition device 120 within an expected time interval before and after the time at which the behavior occurred; the length of the expected time interval can be adjusted according to actual needs. The processing device 130 can first count the words in the behavior strings it generates and then define the few most frequent words as fourth candidate words (i.e., possible knowledge points).

In addition, the more learners look up at the projection image, blackboard, and/or whiteboard, or look down to write notes, at the same time, the more likely it is that the text strings obtained by the speech recognition device 120 around that time contain a knowledge point of the class. Therefore, in obtaining the at least one fourth candidate word, the processing device 130 needs to take the whole-class head-up rate and/or the whole-class note-writing proportion into account. The processing device 130 can then add the at least one fourth candidate word, with its corresponding weight, to the analysis procedure to obtain the label word; the weight corresponding to the at least one fourth candidate word can be adjusted according to actual needs.
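The description does not say how the whole-class rate enters the statistics. One way to combine the two, offered only as an assumption, is to weight each word by the fraction of learners who looked up or wrote notes around the sentence in which it was spoken. The data layout, the function name `fourth_candidates`, and the parameters `window`, `class_size`, and `top_n` are hypothetical.

```python
from collections import defaultdict

def fourth_candidates(transcript, events, class_size, window=10.0, top_n=2):
    """Rank words by how much of the class reacted while they were spoken.

    transcript: list of (time_sec, [words]) for each recognized sentence.
    events: list of (time_sec, learner_id) for each detected head-up or
            note-writing behavior.
    window: half-width in seconds of the expected time interval.
    """
    scores = defaultdict(float)
    for t_sentence, words in transcript:
        # Distinct learners whose behavior falls inside the window.
        learners = {lid for t_event, lid in events if abs(t_event - t_sentence) <= window}
        ratio = len(learners) / class_size  # whole-class rate at this moment
        for word in words:
            scores[word] += ratio
    ranked = sorted(((w, s) for w, s in scores.items() if s > 0),
                    key=lambda ws: (-ws[1], ws[0]))
    return [w for w, _ in ranked[:top_n]]

transcript = [
    (30, ["fourier", "transform"]),
    (200, ["homework", "deadline"]),
    (400, ["fourier", "series"]),
]
events = [(28, "a"), (31, "b"), (33, "c"), (205, "a"), (398, "b"), (402, "c")]
print(fourth_candidates(transcript, events, class_size=4))
# ['fourier', 'transform']
```

In this sample, three of four learners react around the 30-second sentence and two of four around the 400-second sentence, so "fourier" accumulates the highest score.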

Furthermore, in this embodiment, in addition to determining the label word of the class from the first key word, the second key word, the first candidate word, the second candidate word, and the fourth candidate word and their corresponding weights, the behavior of each learner who studies through the live broadcast (for example, setting at least one piece of marking information during the live streaming audio and video) can also be used as one of the parameters for determining the label word, as detailed below. In this embodiment, the knowledge point mark generation system 100 may further include at least one user terminal 150, and each learner can study through the live broadcast using his or her own user terminal 150.

Each user terminal 150 includes a live broadcast module 50, a marking module 60, and a transmission module 70. The knowledge point mark generation method may further include the following: the live broadcast module 50 of each user terminal 150 continuously plays the live streaming audio and video during the class; the marking module 60 of each user terminal 150 allows at least one piece of marking information to be set during the live stream; the transmission module 70 of each user terminal 150 transmits the time point at which the at least one piece of marking information was set to the processing device 130; after the class ends, the processing device 130 generates a marking string from the text strings obtained by the speech recognition device 120 within a predetermined time interval before and after each such time point; the processing device 130 statistically analyzes the marking strings to obtain at least one third candidate word; and the processing device 130 adds the at least one third candidate word, with its corresponding weight, to the analysis procedure to obtain the label word. The number of user terminals 150 can be adjusted according to actual needs; to keep FIG. 1 from becoming too complex, only two user terminals 150 are drawn.

In other words, while studying through the live broadcast on his or her own user terminal 150 (i.e., during the live streaming audio and video), each learner can at any time set marking information for the part the teacher is currently teaching (similar in concept to looking down to write notes, as described above). When a learner sets marking information, the content being taught at that moment may be a key point (i.e., a knowledge point). Therefore, when any learner sets marking information through his or her user terminal 150, the processing device 130 can generate a marking string from the text strings obtained by the speech recognition device 120 within a predetermined time interval before and after the time at which the marking information was set; the length of the predetermined time interval can be adjusted according to actual needs. The processing device 130 can first count the words in the marking strings it obtains and then define the few most frequent words as third candidate words (i.e., possible knowledge points). The processing device 130 can then add the at least one third candidate word, with its corresponding weight, to the analysis procedure to obtain the label word; the weight corresponding to the at least one third candidate word can be adjusted according to actual needs.
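The marking-string statistics can be sketched as follows. Each mark time contributes the words of every sentence recognized within the predetermined interval around it, and the most frequent words become third candidate words. The data layout, the function name `third_candidates`, and the parameters `window` and `top_n` are assumptions made for illustration.

```python
from collections import Counter

def third_candidates(transcript, mark_times, window=15.0, top_n=2):
    """Count words spoken near learner-set marks and return the most frequent.

    transcript: list of (time_sec, [words]) for each recognized sentence.
    mark_times: time points (seconds) at which any user terminal set a mark.
    window: half-width in seconds of the predetermined time interval.
    """
    counts = Counter()
    for t_mark in mark_times:
        # One marking string per mark: all sentences within the window.
        for t_sentence, words in transcript:
            if abs(t_sentence - t_mark) <= window:
                counts.update(words)
    ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in ranked[:top_n]]

transcript = [
    (60, ["laplace", "transform"]),
    (70, ["laplace", "transform", "pole"]),
    (300, ["break"]),
]
mark_times = [65, 68, 72]  # three learners marked around the same passage
print(third_candidates(transcript, mark_times))
# ['laplace', 'transform']
```

Because several learners mark the same passage, its words are counted once per mark, which naturally favors content that many learners flagged.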

Note that, except where a causal relationship is explicitly stated, the steps of the knowledge point mark generation method of this embodiment may be performed in any order.

In summary, the present invention differs from the prior art in that at least one first key word emphasized in text during the class, at least one second key word emphasized by voice during the class, at least one first candidate word that appears repeatedly in text during the class, and at least one second candidate word that appears repeatedly in speech during the class are subjected to an analysis procedure according to their corresponding weights to obtain label words. Knowledge point marks corresponding to the time segments and time intervals in which the label words appear are then set on the time axis of the audio and video file shot in the class, forming an audio and video file with knowledge point marks. This technical means solves the problem of the prior art: learners can identify the knowledge points of a class and the segments in which they appear without browsing the entire audio and video file, which makes focused study or review more convenient.

Although the present invention is disclosed above by way of the foregoing embodiments, they are not intended to limit the present invention. Anyone skilled in the related art may make minor changes and refinements without departing from the spirit and scope of the present invention. The scope of patent protection of the present invention is therefore defined by the claims attached to this specification.

Claims

1. A knowledge point tag generation system, comprising:

The capturing device is used for continuously capturing and analyzing the computer screen image, the projection image and/or the blackboard-writing image in a class so as to continuously obtain text, and capturing at least one first key word in the text based on the font and/or the font color in the computer screen image, the projection image and/or the blackboard-writing image and on the clicked text;

The voice recognition device is used for continuously receiving a voice signal in the classroom, continuously converting the voice signal into a character string in a voice-to-character mode, judging the identity of the voice signal in a voiceprint recognition or sound source recognition mode, and capturing at least one second key word in the character string based on the identity of the voice signal and/or a plurality of preset words;

The processing device is used for analyzing the text continuously acquired by the acquisition device in a statistical mode after the class is finished so as to acquire at least one first candidate vocabulary; after the class is finished, the character strings continuously obtained by the voice recognition device are analyzed in a statistical mode so as to obtain at least one second candidate vocabulary; the at least one first key word, the at least one second key word, the at least one first candidate word and the at least one second candidate word are subjected to an analysis program according to the corresponding weights so as to obtain a tag word; and

The integrating device is used for determining, after the class is finished, the time segment of each sentence containing the tag word in the word strings continuously acquired by the voice recognition device, merging adjacent time segments into a time interval when the time difference between the adjacent time segments is smaller than a specific time length, and setting, on a time axis of the video file shot in the class, a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals so as to form the video file with the knowledge point marks.

2. The knowledge point tag generation system of claim 1, wherein the knowledge point tag generation system further comprises:

At least one user terminal, each user terminal includes:

The live broadcast module is used for continuously broadcasting streaming video and audio in the class;

The marking module is used for allowing at least one piece of marking information to be set in the process of live broadcasting the streaming video and audio; and

The transmission module is used for transmitting the set time point of the at least one piece of marking information to the processing device;

After the class is finished, the processing device generates a marked word string according to the word string acquired by the voice recognition device in a preset time interval before and after the time point of setting the at least one marked information by each user side; analyzing the marked word strings in a statistical mode to obtain at least one third candidate word; and adding the at least one third candidate vocabulary into the analysis program according to the corresponding weight to obtain the tag vocabulary.

3. The knowledge point marking generation system of claim 1 or 2, wherein the knowledge point marking generation system further comprises:

The behavior detection device is used for continuously receiving and analyzing the learner classroom images in the classroom so as to acquire behavior identification signals of each learner;

When the behavior detection device obtains a behavior identification signal of any learner indicating looking up or writing notes, the processing device generates a behavior word string according to the word strings obtained by the voice recognition device within an expected time interval before and after that moment; analyzes the behavior word strings through a statistical mode, a whole-class head-up rate and/or a whole-class note-writing proportion to obtain at least one fourth candidate vocabulary; and adds the at least one fourth candidate vocabulary into the analysis program according to the corresponding weight to obtain the tag vocabulary.

4. The knowledge point tag generation system of claim 1, wherein when the processing device performs the analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate vocabulary, and the at least one second candidate vocabulary according to their corresponding weights to obtain a plurality of tag vocabularies, the integrating device distinguishes the knowledge point tags corresponding to different tag vocabularies according to different colors.

5. The knowledge point mark generation system according to claim 1, wherein when the processing device statistically analyzes the text obtained continuously by the capturing device or the text string obtained continuously by the voice recognition device after the end of the class, if it is determined that the frequency of occurrence of any word exceeds a predetermined value, that word is excluded from being the first candidate vocabulary or the second candidate vocabulary.

6. A knowledge point mark generation method, comprising the following steps:

providing a knowledge point mark generation system, which comprises a capturing device, a voice recognition device, a processing device and an integration device;

The capturing device continuously captures and analyzes the computer screen image, the projection image and/or the blackboard-writing image in the class so as to continuously obtain the text;

The capturing device captures at least one first key word in the text based on the font and/or the font color in the computer screen image, the projection image and/or the blackboard-writing image and on the clicked text;

the voice recognition device continuously receives a voice signal in the class and continuously converts the voice signal into a character string in a voice-to-character mode;

The voice recognition device judges the identity of the sound signal through voiceprint recognition or sound source recognition;

the voice recognition device captures at least one second key word in the word string based on the identity of the sent sound signal and/or a plurality of preset words;

The processing device analyzes the text continuously obtained by the capturing device in a statistical mode after the class is finished so as to obtain at least one first candidate vocabulary;

the processing device analyzes the character strings continuously obtained by the voice recognition device in a statistical mode after the class is finished so as to obtain at least one second candidate vocabulary;

The processing device carries out an analysis procedure on the at least one first key word, the at least one second key word, the at least one first candidate word and the at least one second candidate word according to the corresponding weights so as to obtain a tag word; and

The integration device determines, after the class is finished, the time segment of each sentence containing the tag word in the word strings continuously acquired by the voice recognition device, merges adjacent time segments into a time interval when the time difference between the adjacent time segments is smaller than a specific time length, and sets, on a time axis of the video file shot in the class, a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals so as to form the video file with the knowledge point marks.

7. The knowledge point tag generation method of claim 6, wherein the knowledge point tag generation system further comprises at least one user side, each of the user sides comprises a live broadcast module, a tag module, and a transmission module, the knowledge point tag generation method further comprising:

the live broadcast module of each user terminal continuously broadcasts streaming video and audio in the class;

The marking module of each user terminal allows at least one piece of marking information to be set in the process of live broadcasting the streaming video and audio;

the transmission module of each user side transmits the set time point of the at least one piece of marking information to the processing device;

After the class is finished, the processing device generates a marked word string according to the word strings acquired by the voice recognition device within a preset time interval before and after the time point at which each user terminal sets the at least one piece of marking information;

The processing device analyzes the marked word strings in a statistical mode to obtain at least one third candidate vocabulary; and

The processing device also adds the at least one third candidate vocabulary into the analysis program according to the corresponding weight to obtain the tag vocabulary.

8. The knowledge point mark generation method according to claim 6 or 7, wherein the knowledge point mark generation system further comprises a behavior detection device, the knowledge point mark generation method further comprising:

the behavior detection device continuously receives and analyzes learner classroom images in the classroom so as to acquire behavior identification signals of each learner;

When the behavior detection device obtains a behavior identification signal of any learner indicating looking up or writing notes, the processing device generates a behavior word string according to the word strings obtained by the voice recognition device within an expected time interval before and after that moment;

The processing device analyzes the behavior word strings through a statistical mode, a whole-class head-up rate and/or a whole-class note-writing proportion so as to obtain at least one fourth candidate vocabulary; and

The processing device also adds the at least one fourth candidate vocabulary into the analysis program according to the corresponding weight to obtain the tag vocabulary.

9. The knowledge point mark generation method of claim 6, wherein when the processing device performs the analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate vocabulary, and the at least one second candidate vocabulary according to their corresponding weights to obtain a plurality of tag vocabularies, the integrating device distinguishes the knowledge point tags corresponding to different tag vocabularies according to different colors.

10. The knowledge point tag generation method according to claim 6, wherein when the processing device statistically analyzes the text obtained continuously by the capturing device or the text string obtained continuously by the voice recognition device after the end of the class, if it is determined that the frequency of occurrence of any word exceeds a predetermined value, that word is excluded from being the first candidate vocabulary or the second candidate vocabulary.