CN112241462A - Knowledge point mark generation system and method - Google Patents

Knowledge point mark generation system and method

Info

Publication number
CN112241462A
Authority
CN
China
Prior art keywords
word
classroom
continuously
processing device
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910646422.9A
Other languages
Chinese (zh)
Other versions
CN112241462B (en)
Inventor
郑旭成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wisdom Garden Hong Kong Ltd
Original Assignee
Wisdom Garden Hong Kong Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wisdom Garden Hong Kong Ltd
Priority to CN201910646422.9A
Publication of CN112241462A
Application granted
Publication of CN112241462B
Active legal status (current)
Anticipated expiration of legal status

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 - Querying
    • G06F16/432 - Query formulation
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/06 - Decision making techniques; Pattern matching strategies
    • G10L17/14 - Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A knowledge point mark generation system and method are provided. At least one first keyword emphasized in text form during a class, at least one second keyword emphasized vocally during the class, at least one first candidate word that appears repeatedly in text form during the class, and at least one second candidate word that appears repeatedly in speech during the class are analyzed according to their corresponding weights to obtain tag words, and knowledge point marks are set on the timeline of the audio-video file recorded in the class at the time segments in which the tag words appear, forming an audio-video file with knowledge point marks. A learner can thus find the knowledge points of a class and the segments where they occur without browsing the entire class recording, which makes targeted study and review convenient.

Description

Knowledge point mark generation system and method
Technical Field
The invention relates to a mark generation system and method, and in particular to a knowledge point mark generation system and method.
Background
With the progress of science and technology and the development of networks, learners can study or review after a class ends through the audio-video files recorded during teaching.
Currently, a learner looking for material can only search by the title of a video file. The information a title provides is limited, however, so the learner may need to browse the entire video to find out whether it meets the learning requirement, which is time-consuming.
In addition, when reviewing, the learner usually does not know at what playing time the important points (i.e., knowledge points) of the class appear in the video file, and so must repeatedly drag the playback progress pointer along the timeline or fast-forward the video to locate the clips to watch, which is plainly inconvenient.
In summary, the prior art has the problems that whether a recording meets a learning requirement can only be determined by browsing the entire class recording, and that locating the segment containing a knowledge point requires dragging the progress pointer or fast-forwarding through the video. An improved technical means is therefore needed to solve these problems.
Disclosure of Invention
The invention discloses a knowledge point mark generation system and a method thereof.
First, the present invention discloses a knowledge point mark generation system, which includes a capturing device, a speech recognition device, a processing device, and an integration device. The capturing device continuously captures and analyzes the computer screen images, projection images, and/or board-writing images in a class to continuously obtain text, and extracts at least one first keyword from the text based on the font styles and/or font colors in those images and on the characters that are pointed at. The speech recognition device continuously receives the sound signals in the class, continuously converts them into word strings by speech-to-text, determines the identity of each sound signal by voiceprint recognition or sound-source localization, and extracts at least one second keyword from the word strings based on the identity of the speaker and/or a plurality of preset words. The processing device statistically analyzes the text continuously obtained by the capturing device after the class ends to obtain at least one first candidate word; statistically analyzes the word strings continuously obtained by the speech recognition device after the class ends to obtain at least one second candidate word; and runs an analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain tag words. The integration device generates, after the class ends, the time segment of each sentence in the word strings in which a tag word appears, merges adjacent time segments into a time interval when the time difference between them is smaller than a specific length, and then sets a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals on the timeline of the audio-video file recorded in the class, forming an audio-video file with knowledge point marks.
In addition, the invention discloses a knowledge point mark generation method, which includes the following steps: providing a knowledge point mark generation system comprising a capturing device, a speech recognition device, a processing device, and an integration device; the capturing device continuously capturing and analyzing the computer screen images, projection images, and/or board-writing images in a class to continuously obtain text; the capturing device extracting at least one first keyword from the text based on the font styles and/or font colors in those images and on the characters that are pointed at; the speech recognition device continuously receiving the sound signals in the class and continuously converting them into word strings by speech-to-text; the speech recognition device determining the identity of each sound signal by voiceprint recognition or sound-source localization; the speech recognition device extracting at least one second keyword from the word strings based on the identity of the speaker and/or a plurality of preset words; after the class ends, the processing device statistically analyzing the text continuously obtained by the capturing device to obtain at least one first candidate word; after the class ends, the processing device statistically analyzing the word strings continuously obtained by the speech recognition device to obtain at least one second candidate word; the processing device running an analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain tag words; and the integration device generating, after the class ends, the time segment of each sentence in the word strings in which a tag word appears, merging adjacent time segments into a time interval when the time difference between them is smaller than a specific length, and then setting a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals on the timeline of the audio-video file recorded in the class, forming an audio-video file with knowledge point marks.
The system and method disclosed by the invention differ from the prior art in that tag words are obtained by analyzing, according to their corresponding weights, at least one first keyword emphasized in text form during a class, at least one second keyword emphasized vocally during the class, at least one first candidate word that appears repeatedly in text form, and at least one second candidate word that appears repeatedly in speech; knowledge point marks are then set on the timeline of the audio-video file recorded in the class at the time segments and time intervals where the tag words appear, forming an audio-video file with knowledge point marks.
By this technical means, a learner can find the knowledge points of a class and the segments where they occur without browsing the entire class recording, which makes targeted study and review convenient.
Drawings
FIG. 1 is a system block diagram of a knowledge point mark generating system according to an embodiment of the present invention.
Fig. 2A and Fig. 2B are a flowchart of an embodiment in which the knowledge point mark generation system of Fig. 1 executes the knowledge point mark generation method.
[ List of reference numerals ]
50 live broadcast module
60 marking module
70 transmission module
100 knowledge point mark generation system
110 capturing device
112 camera module
114 analysis module
120 speech recognition device
122 microphone module
124 conversion module
126 voiceprint recognition module
130 processing device
140 integration device
150 client
160 behavior detection device
162 camera module
164 analysis module
Step 210 provides a knowledge point mark generation system comprising a capturing device, a speech recognition device, a processing device, and an integration device
Step 220 the capturing device continuously captures and analyzes the computer screen images, projection images, and/or board-writing images in a class to continuously obtain text
Step 230 the capturing device extracts at least one first keyword from the text based on the font styles and/or font colors in the computer screen images, projection images, and/or board-writing images and on the characters that are pointed at
Step 240 the speech recognition device continuously receives the sound signals in the class and continuously converts them into word strings by speech-to-text
Step 250 the speech recognition device determines the identity of each sound signal by voiceprint recognition or sound-source localization
Step 260 the speech recognition device extracts at least one second keyword from the word strings based on the identity of the speaker and/or a plurality of preset words
Step 270 after the class ends, the processing device statistically analyzes the text continuously obtained by the capturing device to obtain at least one first candidate word
Step 280 after the class ends, the processing device statistically analyzes the word strings continuously obtained by the speech recognition device to obtain at least one second candidate word
Step 290 the processing device analyzes the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain the tag words
Step 300 after the class ends, the integration device generates the time segment of each sentence in the word strings in which a tag word appears, merges adjacent time segments into a time interval when the time difference between them is smaller than a specific length, and then sets a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals on the timeline of the audio-video file recorded in the class, forming an audio-video file with knowledge point marks
Detailed Description
The following detailed description explains embodiments of the present invention in conjunction with the accompanying drawings and examples, so that the process by which the technical means solve the technical problems and achieve the technical effects can be fully understood and implemented.
Before describing the knowledge point mark generation system and method disclosed by the invention, the terminology is defined: a knowledge point is the basic unit in which information is conveyed in a course, so knowledge points play an important role in understanding a course and navigating its content. The invention analyzes the behaviors and events occurring in a class according to their corresponding weights to obtain the knowledge points of the class, so that when a learner studies or reviews the audio-video file recorded during teaching, the learner can find the knowledge points of the class and the segments where they occur without browsing the entire recording. In addition, the capturing device, the speech recognition device, and the behavior detection device can be started synchronously when each class begins and stopped synchronously after each class ends.
Referring to Fig. 1, Fig. 2A, and Fig. 2B: Fig. 1 is a system block diagram of an embodiment of the knowledge point mark generation system of the present invention, and Figs. 2A and 2B are a flowchart of an embodiment in which the system of Fig. 1 executes the knowledge point mark generation method. In this embodiment, the knowledge point mark generation system 100 includes the capturing device 110, the speech recognition device 120, the processing device 130, and the integration device 140 (step 210). The capturing device 110 is connected to the processing device 130, the speech recognition device 120 is connected to the processing device 130, and the processing device 130 is connected to the integration device 140.
The capturing device 110, the speech recognition device 120, the processing device 130, and the integration device 140 can be implemented in various ways, including software, hardware, firmware, or any combination thereof. The techniques presented in the embodiments may be stored as software or firmware on a machine-readable storage medium, such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, or flash memory devices, and may be executed by one or more general-purpose or special-purpose programmable microprocessors. The capturing device 110 and the processing device 130, the speech recognition device 120 and the processing device 130, and the processing device 130 and the integration device 140 may be connected wirelessly or by wire to transmit signals and data.
The capturing device 110 continuously captures and analyzes the computer screen images, projection images, and/or board-writing images in a class to continuously obtain text (step 220). More specifically, the capturing device 110 may include a camera module 112 and an analysis module 114, where the camera module 112 is connected to the analysis module 114. The camera module 112 may continuously photograph the area around the podium in each class, including the projection screen and/or the classroom blackboard or whiteboard, to capture the projection images and/or board-writing images; this embodiment does not limit the invention, and the setup can be adjusted to actual requirements. For example, in a computer course the camera module 112 can continuously photograph the computer screen operated by the instructor to capture the computer screen images. Notably, the content continuously photographed by the camera module 112 covers the text-bearing teaching aids the instructor uses, such as lecture notes, slides, and writing on a blackboard or whiteboard. The analysis module 114 continuously receives and analyzes the computer screen images, projection images, and/or board-writing images captured by the camera module 112 and extracts the characters in each image to generate the corresponding text. The analysis module 114 extracts the characters of each image by optical character recognition (OCR) to form the text (i.e., image-to-text).
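For illustration only, the image-to-text step can be sketched in a few lines of Python; the sketch assumes the open-source Tesseract engine via the pytesseract package, which the patent does not prescribe:

    # Minimal image-to-text sketch (step 220); Tesseract is an assumption,
    # the patent names no specific OCR implementation.
    import pytesseract
    from PIL import Image

    def frame_to_text(frame_path: str, lang: str = "chi_sim+eng") -> str:
        """OCR one captured frame (screen, projection, or board) into text."""
        return pytesseract.image_to_string(Image.open(frame_path), lang=lang)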
The capturing device 110 extracts at least one first keyword from the text based on the font styles and/or font colors in the computer screen images, projection images, and/or board-writing images and on the characters that are pointed at (step 230). In more detail, the text in the teaching aids an instructor provides in class can use different font styles and/or font colors to reinforce certain knowledge points (i.e., emphasis), so that learners recognize the points the instructor wants to convey through text with a special font style and/or font color. The capturing device 110 can therefore extract at least one first keyword (i.e., a possible knowledge point) from the text based on the font styles and/or font colors in the images. Font style may include, but is not limited to, font size, font weight, typeface, italics, underlining, and text effects, and each first keyword is a word composed of adjacent characters with a specific font style and/or font color. In addition, the characters in the computer screen images, projection images, and/or board-writing images that the instructor points at during teaching (whether indicated or selected by hand, laser pointer, or computer cursor) can be taken as knowledge points the instructor wants to reinforce, so the capturing device 110 can also extract at least one first keyword from the text based on the pointed-at characters, each such keyword being a word composed of the pointed-at characters. Note that the weight of each first keyword captured by a different route (special font style, special font color, or pointed-at characters) may be the same or different and can be adjusted to actual requirements.
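The font-based part of this step can be approximated as below; this is a minimal sketch that uses word-box height from Tesseract as a stand-in for font size (font color and pointed-at characters are left out), not the patent's actual implementation:

    import statistics

    import pytesseract
    from PIL import Image
    from pytesseract import Output

    def emphasized_words(frame_path: str, ratio: float = 1.5) -> list[str]:
        """Flag words whose rendered height is well above the median height,
        a rough proxy for the larger or bolder fonts used for emphasis."""
        data = pytesseract.image_to_data(Image.open(frame_path),
                                         output_type=Output.DICT)
        boxes = [(w, h) for w, h in zip(data["text"], data["height"])
                 if w.strip()]
        if not boxes:
            return []
        median_h = statistics.median(h for _, h in boxes)
        return [w for w, h in boxes if h > ratio * median_h]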
The speech recognition device 120 continuously receives the sound signals in the class and continuously converts them into word strings by speech-to-text (step 240). In more detail, the speech recognition device 120 may include a microphone module 122 and a conversion module 124. The microphone module 122 continuously receives the sounds (i.e., sound signals) made by the instructor and the learners in the class, and the conversion module 124 converts the sound signals continuously received by the microphone module 122 into word strings by speech-to-text. The microphone module 122 may include a plurality of microphone units (not shown) placed at various locations in the classroom so as to fully pick up the sounds made by the instructor and the learners; the number and placement of the microphone units can be adjusted to actual requirements.
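A minimal sketch of the speech-to-text conversion, assuming the SpeechRecognition package with its Google Web Speech backend (the patent does not name a recognition engine):

    import speech_recognition as sr

    def audio_to_string(wav_path: str, language: str = "zh-CN") -> str:
        """Convert one recorded audio chunk into a word string (step 240)."""
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        return recognizer.recognize_google(audio, language=language)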
The speech recognition device 120 determines the identity of each sound signal by voiceprint recognition or sound-source localization (step 250). In more detail, the speech recognition device 120 may further include a voiceprint recognition module 126 for recognizing whether a sound signal received by the microphone module 122 was uttered by a learner or by the instructor, and hence whether the word string produced by the conversion module 124 is a learner's or the instructor's speech. In addition, since the instructor usually stands near the podium (i.e., at the front of the classroom) while the learners sit toward the middle or rear, the position of the sound source, as determined by the microphone module 122, can also identify the speaker. More specifically, because the microphone module 122 may include a plurality of microphone units placed around the classroom, it can locate a sound signal from the arrival-time differences among the microphone units and their relative positions, and then judge from that location whether the sound was made by a learner or by the instructor, thereby determining whose speech the word string produced by the conversion module 124 is.
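The sound-source half of this step can be illustrated as follows; the sketch assumes only two microphones (one near the podium, one at the back) and classifies the speaker from the estimated arrival-time difference:

    import numpy as np

    def arrival_delay(front: np.ndarray, back: np.ndarray,
                      sample_rate: int) -> float:
        """Estimate, in seconds, how much later the sound reached the back
        microphone than the front one (positive: front heard it first)."""
        corr = np.correlate(back, front, mode="full")
        lag = int(corr.argmax()) - (len(front) - 1)
        return lag / sample_rate

    def speaker_identity(front: np.ndarray, back: np.ndarray,
                         sample_rate: int = 16000) -> str:
        # Sound arriving at the podium-side microphone first is attributed
        # to the instructor; otherwise to a learner sitting further back.
        return ("instructor"
                if arrival_delay(front, back, sample_rate) > 0
                else "learner")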
The speech recognition device 120 extracts at least one second keyword from the word strings based on the identity of the speaker and/or a plurality of preset words (step 260). More specifically, word strings spoken by the instructor, and/or word strings containing preset cue words (e.g., "note in particular", "key point", "focus", "must memorize", "exam point"), are comparatively likely to contain the knowledge points of the class, so the speech recognition device 120 can extract at least one second keyword (i.e., a possible knowledge point) from such word strings. The second keywords can be extracted by semantic analysis, although this embodiment does not limit the invention. In another embodiment, word strings corresponding to louder sound signals uttered by the instructor during teaching can also serve as one of the parameters for extracting the second keywords.
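A minimal sketch of the cue-word route, with a hypothetical English cue list standing in for the patent's preset vocabulary and the semantic-analysis reduction omitted:

    # Hypothetical cue list for illustration; the patent's examples include
    # terms like "key point", "must memorize", and "exam point".
    CUE_WORDS = ("key point", "focus", "must memorize", "exam point")

    def cue_word_sentences(sentences: list[str]) -> list[str]:
        """Keep transcribed sentences containing a preset cue word; the
        patent would further reduce these to second keywords by semantic
        analysis, which this sketch omits."""
        return [s for s in sentences
                if any(cue in s.lower() for cue in CUE_WORDS)]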
It should be noted that the weight of each second keyword, whether extracted from a word string spoken by the instructor or from a word string containing a preset cue word, may be the same or different and can be adjusted to actual requirements.
After the class ends, the processing device 130 statistically analyzes the text continuously obtained by the capturing device 110 to obtain at least one first candidate word (step 270). In more detail, the processing device 130 first counts the words in the text obtained by the capturing device 110 and then defines the most frequent words as the first candidate words (i.e., possible knowledge points). Note that when a word appears too frequently, it is likely to be the main theme of the class rather than a specific knowledge point and is unsuitable as a tag word in the subsequent steps. Therefore, when the processing device 130 statistically analyzes the text after the class ends, any word whose frequency exceeds a preset value is excluded from becoming a first candidate word, where the preset value can be adjusted to actual requirements.
After the class ends, the processing device 130 statistically analyzes the word strings continuously obtained by the speech recognition device 120 to obtain at least one second candidate word (step 280). More specifically, the processing device 130 first counts the words in the word strings obtained by the speech recognition device 120 and then defines the most frequent words as the second candidate words (i.e., possible knowledge points). As above, any word whose frequency exceeds a preset value is likely to be the main theme of the class and is excluded from becoming a second candidate word, where the preset value can be adjusted to actual requirements.
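Steps 270 and 280 share the same counting logic, sketched below under the assumption that the text has already been segmented into words (itself non-trivial for Chinese):

    from collections import Counter

    def candidate_words(words: list[str], top_n: int = 10,
                        max_count: int = 50) -> list[str]:
        """Rank words by frequency and keep the most frequent as candidate
        words, excluding any word whose count exceeds a preset cap, since
        such a word is likely the class's main theme rather than a
        specific knowledge point. Both thresholds are adjustable."""
        counts = Counter(words)
        return [w for w, c in counts.most_common() if c <= max_count][:top_n]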
The processing device 130 performs an analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain the tag words (step 290). In more detail, since the first keywords, second keywords, first candidate words, and second candidate words have different probabilities of being knowledge points, the weights assigned to them in the analysis procedure that determines the knowledge points of the class differ and can be adjusted to actual requirements. The analysis procedure determines the knowledge points of the class (i.e., the tag words) from these words and their corresponding weights, and the number of tag words can likewise be adjusted to actual requirements.
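One plausible form of the weighted analysis procedure is sketched below; the source names and weight values are illustrative assumptions, since the patent leaves the weights adjustable:

    from collections import defaultdict

    def tag_words(sources: dict[str, list[str]],
                  weights: dict[str, float], top_n: int = 2) -> list[str]:
        """Score each word by the summed weights of the evidence sources in
        which it appears and keep the top scorers as tag words."""
        scores: dict[str, float] = defaultdict(float)
        for name, words in sources.items():
            for word in set(words):
                scores[word] += weights[name]
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    # Illustrative call with made-up weights:
    # tag_words({"first_kw": kw1, "second_kw": kw2,
    #            "first_cand": c1, "second_cand": c2},
    #           {"first_kw": 3.0, "second_kw": 2.5,
    #            "first_cand": 1.0, "second_cand": 1.0})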
When the number of tag words is one, the integration device 140 generates, after the class ends, the time segment of each sentence in the word strings obtained by the speech recognition device 120 in which the tag word appears, merges adjacent time segments into a time interval when the time difference between them is smaller than a specific length, and then sets a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals on the timeline of the audio-video file recorded in the class, forming an audio-video file with knowledge point marks (step 300). In more detail, the knowledge point mark generation system 100 may further include a camera device (not shown) for recording the audio-video file to be placed on a platform or website for learners to study or review, and for capturing the streaming audio-video required to live-broadcast the class (i.e., the stream is broadcast live and stored at the same time, so that the class's audio-video file is available once the class ends). The camera device, the capturing device 110, and the speech recognition device 120 can be started synchronously at the beginning of each class and stopped synchronously after the class ends. The integration device 140 can thus search the word strings obtained by the speech recognition device 120 for the time segment in which each sentence containing the knowledge point (i.e., the tag word) appears, and merge adjacent time segments into a time interval whenever the time difference (i.e., the gap) between them is smaller than the specific length, where that length can be adjusted to actual requirements. The integration device 140 then sets the knowledge point marks on the timeline of the audio-video file recorded by the camera device according to the unmerged time segments and the time intervals, forming the audio-video file with knowledge point marks.
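The segment-merging logic of step 300 reduces to a short routine; the 30-second default gap below is an illustrative assumption, as the patent leaves the specific length adjustable:

    def merge_segments(segments: list[tuple[float, float]],
                       gap: float = 30.0) -> list[tuple[float, float]]:
        """Merge sentence time segments (start, end, in seconds) into one
        time interval whenever the gap to the previous segment is below
        `gap`; unmerged segments pass through unchanged."""
        merged: list[tuple[float, float]] = []
        for start, end in sorted(segments):
            if merged and start - merged[-1][1] < gap:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged

    # merge_segments([(62, 70), (75, 88), (400, 415)])
    # -> [(62, 88), (400, 415)]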
When there are multiple tag words, the integration device 140 can find the unmerged time segments and the time intervals corresponding to each tag word by the above process and then distinguish the knowledge point marks of different tag words by color, making it easy for a learner to tell them apart. For example, when the tag words are Fourier transform and Laplace transform, the knowledge point marks corresponding to the Fourier transform may be, but are not limited to, yellow, and the knowledge point marks corresponding to the Laplace transform may be, but are not limited to, green.
In this embodiment, besides determining the tag words of the class from the first keywords, second keywords, first candidate words, and second candidate words and their corresponding weights, the behavior of each learner in the class (for example, looking up at the board or writing notes with the head down) can be added as one of the parameters that determine the tag words, as described below. In this embodiment, the knowledge point mark generation system 100 may further include a behavior detection device 160, and the knowledge point mark generation method may further include: the behavior detection device 160 continuously receives and analyzes classroom images to obtain a behavior recognition signal for each learner; when the behavior detection device 160 detects that any learner's behavior recognition signal indicates looking up or writing notes, the processing device 130 generates a behavior string from the word strings obtained by the speech recognition device 120 within an expected interval before and after that moment; the processing device 130 statistically analyzes the behavior strings, together with the whole-class head-up rate and/or whole-class note-taking ratio, to obtain at least one fourth candidate word; and the processing device 130 further adds the at least one fourth candidate word to the analysis procedure according to its corresponding weight to obtain the tag words.
In more detail, the behavior detection device 160 may include a camera module 162 and an analysis module 164, where the camera module 162 is connected to the analysis module 164. The camera module 162 can continuously photograph the learners' seating area in each class (i.e., the classroom images containing every learner), and the analysis module 164 obtains each learner's behavior recognition signal (i.e., the learner's current behavior) by analyzing the images continuously captured by the camera module 162. Since a learner looking up at the projection screen, blackboard, and/or whiteboard, or writing notes, suggests that the content being taught at that moment is important (i.e., a knowledge point), when the behavior detection device 160 detects such a signal for any learner, the processing device 130 generates a behavior string from the word strings obtained by the speech recognition device 120 within an expected interval before and after the moment the behavior occurred, where the length of the expected interval can be adjusted to actual requirements. The processing device 130 may first count the words in the behavior strings and then define the most frequent words as the fourth candidate words (i.e., possible knowledge points).
In addition, the more learners who look up at the projection screen, blackboard, and/or whiteboard or write notes at the same time, the more likely the word strings obtained by the speech recognition device 120 around that moment contain a knowledge point of the class. The processing device 130 therefore factors the whole-class head-up rate and/or whole-class note-taking ratio into the process of obtaining the at least one fourth candidate word. The processing device 130 may then further add the at least one fourth candidate word to the analysis procedure according to its corresponding weight to obtain the tag words, where that weight can be adjusted to actual requirements.
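For illustration, the behavior-triggered windowing and class-wide weighting might look as follows; the 15-second interval and the averaging formula are assumptions, not the patent's prescription:

    def behavior_window(transcript: list[tuple[float, str]],
                        event_time: float, before: float = 15.0,
                        after: float = 15.0) -> str:
        """Collect the transcript within an expected interval around a
        detected head-up or note-taking event."""
        return " ".join(text for t, text in transcript
                        if event_time - before <= t <= event_time + after)

    def behavior_weight(heads_up: int, note_takers: int,
                        class_size: int) -> float:
        """One possible way to fold the whole-class head-up rate and
        note-taking ratio into the weight of the behavior string."""
        return (heads_up / class_size + note_takers / class_size) / 2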
Furthermore, in this embodiment, besides determining the tag words of the class from the first keywords, second keywords, first candidate words, second candidate words, fourth candidate words, and their corresponding weights, the behavior of each learner who attends by live broadcast, such as setting at least one mark message during the live audio-video stream, can be added as one of the parameters that determine the tag words, as detailed below. In this embodiment, the knowledge point mark generation system 100 may further include at least one client 150, through which each learner can attend the class by live broadcast.
Each client 150 includes a live broadcast module 50, a marking module 60, and a transmission module 70, and the knowledge point mark generation method may further include: the live broadcast module 50 of each client 150 continuously plays the live audio-video stream of the class; the marking module 60 of each client 150 allows at least one mark message to be set during the live stream; the transmission module 70 of each client 150 transmits the time point at which the at least one mark message was set to the processing device 130; after the class ends, the processing device 130 generates a mark string from the word strings obtained by the speech recognition device 120 within a preset interval before and after each time point at which a mark message was set; the processing device 130 statistically analyzes the mark strings to obtain at least one third candidate word; and the processing device 130 further adds the at least one third candidate word to the analysis procedure according to its corresponding weight to obtain the tag words. To keep Fig. 1 simple, only two clients 150 are drawn; the actual number of clients 150 can be adjusted to actual requirements.
In other words, each learner attending by live broadcast can, at any time during the live stream, set a mark message for the part being taught at that moment through the learner's own client 150 (similar in concept to writing notes, above). Since a set mark message suggests that the content being taught at that moment is important (i.e., a knowledge point), when any learner sets a mark message through a client 150, the processing device 130 generates a mark string from the word strings obtained by the speech recognition device 120 within a preset interval before and after the moment the mark message was set, where the length of the preset interval can be adjusted to actual requirements. The processing device 130 may first count the words in the mark strings and then define the most frequent words as the third candidate words (i.e., possible knowledge points). The processing device 130 may then further add the at least one third candidate word to the analysis procedure according to its corresponding weight to obtain the tag words, where that weight can be adjusted to actual requirements.
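A minimal sketch of deriving the third candidate words from learner-set mark times; the 15-second preset interval and whitespace word-splitting are illustrative assumptions:

    from collections import Counter

    def third_candidates(transcript: list[tuple[float, str]],
                         mark_times: list[float], window: float = 15.0,
                         top_n: int = 5) -> list[str]:
        """Pool the transcript within a preset interval around every mark
        message set by learners' clients and keep the most frequent words
        as third candidate words."""
        pooled: list[str] = []
        for mark in mark_times:
            for t, text in transcript:
                if mark - window <= t <= mark + window:
                    pooled.extend(text.split())
        return [w for w, _ in Counter(pooled).most_common(top_n)]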
It should be particularly noted that the steps of the knowledge point mark generation method of this embodiment may be performed in any order, except where a causal relationship between them is stated.
In summary, the present invention differs from the prior art in that at least one first keyword emphasized in text form during a class, at least one second keyword emphasized vocally during the class, at least one first candidate word that appears repeatedly in text form, and at least one second candidate word that appears repeatedly in speech are analyzed according to their corresponding weights to obtain the tag words, and knowledge point marks are set on the timeline of the audio-video file recorded in the class at the time segments and time intervals where the tag words appear, forming an audio-video file with knowledge point marks. This technical means solves the problems of the prior art: a learner can find the knowledge points of a class and the segments where they occur without browsing the entire class recording, which makes targeted study and review convenient.
Although the present invention has been described with reference to the foregoing embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (10)

1. A knowledge point mark generation system, comprising:
a capturing device for continuously capturing and analyzing computer screen images, projection images, and/or board-writing images in a class to continuously obtain text, and for extracting at least one first keyword from the text based on font styles and/or font colors in the computer screen images, projection images, and/or board-writing images and on characters that are pointed at;
a speech recognition device for continuously receiving sound signals in the class, continuously converting the sound signals into word strings by speech-to-text, determining the identity of each sound signal by voiceprint recognition or sound-source localization, and extracting at least one second keyword from the word strings based on the identity of the sound signal and/or a plurality of preset words;
a processing device for statistically analyzing, after the class ends, the text continuously obtained by the capturing device to obtain at least one first candidate word; statistically analyzing, after the class ends, the word strings continuously obtained by the speech recognition device to obtain at least one second candidate word; and analyzing the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a tag word; and
an integration device for generating, after the class ends, the time segment of each sentence in the word strings continuously obtained by the speech recognition device in which the tag word appears, merging adjacent time segments into a time interval when the time difference between them is smaller than a specific length, and then setting a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals on the timeline of the audio-video file recorded in the class to form an audio-video file with the knowledge point marks.
2. The knowledge point mark generation system according to claim 1, further comprising:
at least one client, each client comprising:
a live broadcast module for continuously playing the live audio-video stream of the class;
a marking module for allowing at least one mark message to be set during the live stream; and
a transmission module for transmitting the time point at which the at least one mark message was set to the processing device;
wherein, after the class ends, the processing device generates a mark string from the word strings obtained by the speech recognition device within a preset interval before and after the time point at which each client set the at least one mark message; statistically analyzes the mark strings to obtain at least one third candidate word; and adds the at least one third candidate word to the analysis procedure according to its corresponding weight to obtain the tag word.
3. The knowledge point mark generation system according to claim 1 or 2, further comprising:
a behavior detection device for continuously receiving and analyzing classroom images of the learners in the class to obtain a behavior recognition signal for each learner;
wherein, when the behavior detection device detects that the behavior recognition signal of any learner indicates looking up or writing notes, the processing device generates a behavior string from the word strings obtained by the speech recognition device within an expected interval before and after that moment; statistically analyzes the behavior strings, together with a whole-class head-up rate and/or a whole-class note-taking ratio, to obtain at least one fourth candidate word; and adds the at least one fourth candidate word to the analysis procedure according to its corresponding weight to obtain the tag word.
4. The knowledge point mark generation system according to claim 1, wherein when the processing device analyzes the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a plurality of tag words, the integration device distinguishes the knowledge point marks corresponding to different tag words by different colors.
5. The knowledge point mark generation system according to claim 1, wherein when the processing device statistically analyzes, after the class ends, the text continuously obtained by the capturing device or the word strings continuously obtained by the speech recognition device, any word whose frequency of occurrence exceeds a preset value is excluded from becoming the first candidate word or the second candidate word.
6. A knowledge point mark generation method, comprising the following steps:
providing a knowledge point mark generation system comprising a capturing device, a speech recognition device, a processing device, and an integration device;
the capturing device continuously capturing and analyzing computer screen images, projection images, and/or board-writing images in a class to continuously obtain text;
the capturing device extracting at least one first keyword from the text based on font styles and/or font colors in the computer screen images, projection images, and/or board-writing images and on characters that are pointed at;
the speech recognition device continuously receiving sound signals in the class and continuously converting the sound signals into word strings by speech-to-text;
the speech recognition device determining the identity of each sound signal by voiceprint recognition or sound-source localization;
the speech recognition device extracting at least one second keyword from the word strings based on the identity of the sound signal and/or a plurality of preset words;
after the class ends, the processing device statistically analyzing the text continuously obtained by the capturing device to obtain at least one first candidate word;
after the class ends, the processing device statistically analyzing the word strings continuously obtained by the speech recognition device to obtain at least one second candidate word;
the processing device performing an analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a tag word; and
the integration device generating, after the class ends, the time segment of each sentence in the word strings continuously obtained by the speech recognition device in which the tag word appears, merging adjacent time segments into a time interval when the time difference between them is smaller than a specific length, and then setting a plurality of knowledge point marks corresponding to the unmerged time segments and the time intervals on the timeline of the audio-video file recorded in the class to form an audio-video file with the knowledge point marks.
7. The knowledge point mark generation method according to claim 6, wherein the knowledge point mark generation system further comprises at least one client, each client comprising a live broadcast module, a marking module, and a transmission module, and the method further comprises:
the live broadcast module of each client continuously playing the live audio-video stream of the class;
the marking module of each client allowing at least one mark message to be set during the live stream;
the transmission module of each client transmitting the time point at which the at least one mark message was set to the processing device;
after the class ends, the processing device generating a mark string from the word strings obtained by the speech recognition device within a preset interval before and after the time point at which each client set the at least one mark message;
the processing device statistically analyzing the mark strings to obtain at least one third candidate word; and
the processing device further adding the at least one third candidate word to the analysis procedure according to its corresponding weight to obtain the tag word.
8. The knowledge point mark generation method according to claim 6 or 7, wherein the knowledge point mark generation system further comprises a behavior detection device, and the method further comprises:
the behavior detection device continuously receiving and analyzing classroom images of the learners in the class to obtain a behavior recognition signal for each learner;
when the behavior detection device detects that the behavior recognition signal of any learner indicates looking up or writing notes, the processing device generating a behavior string from the word strings obtained by the speech recognition device within an expected interval before and after that moment;
the processing device statistically analyzing the behavior strings, together with a whole-class head-up rate and/or a whole-class note-taking ratio, to obtain at least one fourth candidate word; and
the processing device further adding the at least one fourth candidate word to the analysis procedure according to its corresponding weight to obtain the tag word.
9. The knowledge point mark generation method according to claim 6, wherein when the processing device performs the analysis procedure on the at least one first keyword, the at least one second keyword, the at least one first candidate word, and the at least one second candidate word according to their corresponding weights to obtain a plurality of tag words, the integration device distinguishes the knowledge point marks corresponding to different tag words by different colors.
10. The knowledge point mark generation method according to claim 6, wherein when the processing device statistically analyzes, after the class ends, the text continuously obtained by the capturing device or the word strings continuously obtained by the speech recognition device, any word whose frequency of occurrence exceeds a preset value is excluded from becoming the first candidate word or the second candidate word.
CN201910646422.9A 2019-07-17 2019-07-17 Knowledge point mark generation system and method thereof Active CN112241462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910646422.9A CN112241462B (en) 2019-07-17 2019-07-17 Knowledge point mark generation system and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910646422.9A CN112241462B (en) 2019-07-17 2019-07-17 Knowledge point mark generation system and method thereof

Publications (2)

Publication Number Publication Date
CN112241462A (en) 2021-01-19
CN112241462B CN112241462B (en) 2024-04-23

Family

ID=74167181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910646422.9A Active CN112241462B (en) 2019-07-17 2019-07-17 Knowledge point mark generation system and method thereof

Country Status (1)

Country Link
CN (1) CN112241462B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170039873A1 (en) * 2015-08-05 2017-02-09 Fujitsu Limited Providing adaptive electronic reading support
CN107895244A (en) * 2017-12-26 2018-04-10 重庆大争科技有限公司 Classroom teaching quality assessment method
CN108806685A (en) * 2018-07-02 2018-11-13 英业达科技有限公司 Speech control system and its method
CN109698920A (en) * 2017-10-20 2019-04-30 深圳市鹰硕技术有限公司 It is a kind of that tutoring system is followed based on internet teaching platform

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170039873A1 (en) * 2015-08-05 2017-02-09 Fujitsu Limited Providing adaptive electronic reading support
CN109698920A (en) * 2017-10-20 2019-04-30 深圳市鹰硕技术有限公司 It is a kind of that tutoring system is followed based on internet teaching platform
CN107895244A (en) * 2017-12-26 2018-04-10 重庆大争科技有限公司 Classroom teaching quality assessment method
CN108806685A (en) * 2018-07-02 2018-11-13 英业达科技有限公司 Speech control system and its method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张颖; 夏婕: "A brief discussion of the role of multimedia and network technology in college English vocabulary teaching" (in Chinese), 科技资讯 (Science & Technology Information), no. 14, 13 May 2007 *
邱冰; 皇甫伟; 朱庆之: "A corpus-based method for evaluating the expected outcomes of ancient Chinese textbooks and its application" (in Chinese), 中文信息学报 (Journal of Chinese Information Processing), no. 06, 15 June 2018 *

Also Published As

Publication number Publication date
CN112241462B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
US10978077B1 (en) Knowledge point mark generation system and method thereof
CN109275046B (en) Teaching data labeling method based on double video acquisition
Chung et al. Lip reading in the wild
US20200286396A1 (en) Following teaching system having voice evaluation function
US9858340B1 (en) Systems and methods for queryable graph representations of videos
CN113691836B (en) Video template generation method, video generation method and device and electronic equipment
CN111968649A (en) Subtitle correction method, subtitle display method, device, equipment and medium
CN114465737B (en) Data processing method and device, computer equipment and storage medium
CN111462553B (en) Language learning method and system based on video dubbing and sound correction training
JP2011082958A (en) Video retrieval system and computer program therefor
CN111522970A (en) Exercise recommendation method, exercise recommendation device, exercise recommendation equipment and storage medium
CN113537801B (en) Blackboard writing processing method, blackboard writing processing device, terminal and storage medium
US20240064383A1 (en) Method and Apparatus for Generating Video Corpus, and Related Device
KR102148021B1 (en) Information search method and apparatus in incidental images incorporating deep learning scene text detection and recognition
CN113254708A (en) Video searching method and device, computer equipment and storage medium
CN114357206A (en) Education video color subtitle generation method and system based on semantic analysis
CN112382295A (en) Voice recognition method, device, equipment and readable storage medium
CN113923521B (en) Video scripting method
CN114281948A (en) Summary determination method and related equipment thereof
TWI684964B (en) Knowledge point mark generation system and method thereof
CN116708055B (en) Intelligent multimedia audiovisual image processing method, system and storage medium
CN111522992A (en) Method, device and equipment for putting questions into storage and storage medium
CN112241462B (en) Knowledge point mark generation system and method thereof
CN115481254A (en) Method, system, readable storage medium and equipment for analyzing video effect content of movie and television play script
KR101783872B1 (en) Video Search System and Method thereof

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant