CN112418088B - Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing - Google Patents
Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
- Publication number
- CN112418088B CN112418088B CN202011319851.4A CN202011319851A CN112418088B CN 112418088 B CN112418088 B CN 112418088B CN 202011319851 A CN202011319851 A CN 202011319851A CN 112418088 B CN112418088 B CN 112418088B
- Authority
- CN
- China
- Prior art keywords
- annotation information
- annotation
- video
- information
- confidence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The invention discloses a video learning resource extraction and knowledge annotation method and system based on crowd-sourcing. The method comprises the following steps: s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources; s2, classifying the labeling information according to the position labeling information, constructing a labeling information set, calculating the comprehensive confidence of the labeling information set, if the comprehensive confidence of the labeling information set reaches a preset threshold, extracting video segments from the video learning resources according to the labeling information set, and performing fusion processing on the labeling information of the labeling information set to obtain the fusion labeling information of the video segments. According to the method and the device, the confidence of the annotation information set is judged, so that the influence of random annotation of videos by some users on the annotation result is reduced, and the quality and the reliability of the crowd-sourcing annotation are improved.
Description
Technical Field
The invention belongs to the technical field of educational information technology, and particularly relates to a crowd-sourcing-based video learning resource extraction and knowledge annotation method and system.
Background
With the development of internet technology, video resources on the internet have grown rapidly, and more and more of them contain a large amount of knowledge value. Video clips with knowledge value can be applied in education and teaching to display teaching content intuitively and attract students' attention. For such video clips, how to mine the implicit knowledge points they contain and associate the clips with those knowledge points, so that learners can quickly and efficiently obtain personalized learning resources, is a focus of current research.
Existing video learning resource extraction methods include manual annotation by experts and automatic annotation by machines. Manual annotation of video segments by experts across different fields consumes enormous manpower, financial resources and time; machine learning methods can achieve automatic annotation, but for video segments containing implicit knowledge points, fully automatic processing is difficult to realize, and extracting such segments by relying on machine learning alone remains impractical.
Disclosure of Invention
Aiming at at least one defect or improvement requirement in the prior art, the invention provides a crowd-sourcing-based video learning resource extraction and knowledge annotation method and system, which can improve the quality and reliability of crowd-sourced annotation.
To achieve the above object, according to a first aspect of the present invention, there is provided a crowd-sourcing-based video learning resource extraction and knowledge annotation method, comprising the steps of:
s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources;
s2, classifying the labeling information according to the position labeling information, constructing a labeling information set, calculating the comprehensive confidence of the labeling information set, if the comprehensive confidence of the labeling information set reaches a preset threshold, extracting video segments from the video learning resources according to the labeling information set, and performing fusion processing on the labeling information of the labeling information set to obtain the fusion labeling information of the video segments.
Preferably, the S2 includes the steps of:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
and S25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference.
Preferably, in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content labeling similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information.
Preferably, the step S25 includes the following steps: a segment segmentation position tolerance update period is set, and the segment segmentation position tolerance is adjusted according to the mean value of E_{f,k} over the previous update period; if the difference between the annotation and the final fusion result does not change with the position difference, the segment segmentation position tolerance is increased, otherwise the segment segmentation position tolerance is decreased, and the calculation formula for updating the segment segmentation position tolerance is:
E_{f,k} is the relationship between the annotation difference and the position difference of the k-th video segment, N is the total number of annotation information of the k-th video segment, M is the number of fused annotation information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content annotation difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
Preferably, the S21 includes the steps of:
s211, initializing segment segmentation position tolerance and user subject field confidence;
s212, traversing the annotation information, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and classifying into one set all annotation information whose position annotation information differs by no more than the segment segmentation position tolerance, to obtain an annotation information set;
s213, obtaining the subject field of the labeling information set according to all the labeling information in the labeling information set;
s214, obtaining the user subject field confidence of each marking information corresponding to the user in the subject field to which the marking information set belongs in the marking information set, and calculating the comprehensive confidence of the marking information set, wherein the calculation formula of the comprehensive confidence is as follows:
wherein SetCredit is the comprehensive confidence of the annotation information set, SubjectCredit_{K,i} represents the user subject field confidence, in the subject field to which the annotation information set belongs, of the user corresponding to the i-th annotation information in the set, and N is the total number of annotation information in the annotation information set.
Preferably, in S22, the video segment extraction is implemented by a weighted voting method based on user confidence.
Preferably, the content annotation information includes knowledge points, and the S23 includes the steps of:
s231, classifying the annotation information in one annotation information set according to the knowledge points of the annotation information in the annotation information set, and acquiring all annotation information of each knowledge point in the video clip;
s232, fusing and standardizing all the labeling information of each knowledge point in the video clip to obtain the fused labeling information of each knowledge point in the video clip.
Preferably, the step S232 includes the steps of:
acquiring all annotation information of each knowledge point in a video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence, and vectorizing each annotation information of each knowledge point in the video segment to obtain vectorized text data;
inputting the vectorized text data into a first long-short term memory artificial neural network to obtain text distributed expression data;
inputting the text distributed expression data into a second long-short term memory artificial neural network, outputting predicted abstract distributed expression data, and adjusting the influence degree of the input value on the output predicted value based on the user subject confidence coefficient by using an attention mechanism;
and converting the abstract distributed expression data into a text form to obtain the fusion marking information of each knowledge point in the video clip.
Preferably, the crowd-sourcing-based video learning resource extraction and knowledge annotation method further includes step S3: and taking the video learning resource before marking as a parent video, acquiring the position offset of the extracted video clip relative to the parent video, generating a video head file according to the parent video and the position offset, and managing the video clip by adopting the parent video and the video head file.
According to a second aspect of the present invention, there is provided a crowd-sourcing based video learning resource extraction and knowledge annotation system, comprising:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
and the marking module is used for classifying the marking information according to the position marking information, constructing a marking information set, calculating the comprehensive confidence coefficient of the marking information set, if the comprehensive confidence coefficient of the marking information set reaches a preset threshold value, extracting a video segment from the video learning resource according to the marking information set, and fusing the marking information of the marking information set to obtain the fused marking information of the video segment.
In general, compared with the prior art, the invention has the following beneficial effects:
(1) According to the crowd-sourcing-based video learning resource extraction and knowledge annotation method and system, user annotation information is traversed and classified according to the position annotation information, annotation information sets are constructed, and the comprehensive confidence of each set is calculated; if the confidence reaches the threshold, standardization processing is carried out. The annotation information at all positions is then integrated to extract video segments, and the annotation information is fused based on user confidence. The method judges the confidence of the annotation information set based on user confidence, reduces the influence on the annotation result of users who annotate videos randomly, and improves the quality and reliability of crowd-sourced annotation.
(2) According to the video learning resource extraction and knowledge annotation method and system based on the crowd-sourcing intelligence, after annotation information is fused, the similarity degree of user annotation information and an annotation result is calculated, and the confidence of a user in the subject field is dynamically calculated; and calculating the relation between the marking information and the position of the video segment, and dynamically determining the position tolerance of the marking information of the video segment. The method can dynamically determine the user confidence and the position tolerance of the labeling information, and improve the accuracy and the reliability of the labeling data.
(3) According to the crowd-sourcing-based video learning resource extraction and knowledge annotation method and system, the sub-videos divided by users are managed with a virtual segmentation strategy mechanism based on separate storage of data blocks, file headers and knowledge information, and data blocks and file headers can be dynamically combined as needed to extract virtual videos. The virtual segmentation approach improves storage space utilization, improves application processing capacity and reduces video playback delay.
The invention extracts video learning resources by leveraging the collective intelligence of the public while comprehensively considering user confidence. It is particularly suitable for automatically extracting video segments containing implicit knowledge points, thereby providing data for large educational video resource libraries and helping educators and learners obtain more high-quality, multi-dimensional and multi-granularity educational resources.
Drawings
FIG. 1 is a schematic diagram illustrating a video learning resource extraction and knowledge annotation method based on crowd-sourcing in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of video segment extraction and annotation information fusion provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the principle of annotation information fusion based on user confidence according to an embodiment of the present invention;
fig. 4 is a schematic resource management diagram of a video clip according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic diagram illustrating a video learning resource extraction and knowledge annotation method based on crowd-sourcing according to an embodiment of the present invention, where the method includes steps S1 to S3.
In S1, annotation information of the knowledge points in the video learning resource by the multiple users is obtained, and the annotation information includes position annotation information and content annotation information of the knowledge points in the video learning resource.
When watching a video, a plurality of users mark, according to their own understanding, a certain knowledge point or a segment that is helpful for learning that knowledge point. Preferably, the annotation information includes the start and end points of the segment, a title, knowledge points, description information and the like. For example, if a user finds that the principle of pinhole imaging is explained at position 5:50-8:20 of the video, the user can drag the marking points to the corresponding positions and then fill in the title: "principle of pinhole imaging"; the knowledge points: "junior middle school - grade two - physics - pinhole imaging, straight-line propagation of light"; and the description information: "an image of the object forms on the wall when a board with a small hole is placed in front of it".
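To make the structure of a single piece of annotation information concrete, the following is a minimal Python sketch; the field names (user_id, start, end, title, knowledge_points, description) are illustrative assumptions and not identifiers defined by the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Annotation:
    """One user's annotation of a knowledge point in a video learning resource."""
    user_id: str                   # who submitted the annotation
    start: float                   # position annotation: segment start, in seconds
    end: float                     # position annotation: segment end, in seconds
    title: str                     # content annotation: short title
    knowledge_points: List[str]    # content annotation: knowledge point path(s)
    description: str = ""          # content annotation: free-text description

# The pinhole-imaging example from the text, expressed in this structure.
example = Annotation(
    user_id="u001",
    start=5 * 60 + 50,             # 5:50
    end=8 * 60 + 20,               # 8:20
    title="Principle of pinhole imaging",
    knowledge_points=["junior middle school - grade two - physics - pinhole imaging",
                      "straight-line propagation of light"],
    description="An image of the object forms on the wall behind a board with a small hole.",
)
```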
In S2, as shown in fig. 2, the annotation information is classified according to the position annotation information, an annotation information set is constructed, a comprehensive confidence of the annotation information set is calculated, and if the comprehensive confidence of the annotation information set reaches a preset threshold, a video segment is extracted from the video learning resource according to the annotation information set, and the annotation information of the annotation information set is fused, so as to obtain fused annotation information of the video segment.
S21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
and S22, extracting video segments of the annotation information set with the comprehensive confidence reaching the preset threshold according to the confidence of the user subject field and the position annotation information, and obtaining the video segments corresponding to the annotation information set.
The segment head and tail positions are extracted by a weighted voting method based on user confidence: the head positions and tail positions in the annotation information set are divided into a head position group and a tail position group respectively, weighted votes for the marked points are calculated using the user confidences as weights, and the marked points with the highest weighted votes are taken as the nodes of the set, i.e., the head and tail positions of the video segment annotated by the set.
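A minimal sketch of this confidence-weighted voting follows; treating every annotated position as a candidate and counting as its votes the confidences of all positions within the segment segmentation position tolerance is an assumption about the exact grouping rule, not the patent's stated procedure.

```python
from collections import defaultdict

def weighted_vote(positions, confidences, tolerance):
    """Pick the position with the highest confidence-weighted vote.

    positions   -- annotated positions in seconds, e.g. all head positions in one set
    confidences -- per-annotation user subject field confidence, same order as positions
    tolerance   -- segment segmentation position tolerance: positions within this
                   distance of a candidate count as votes for it
    """
    votes = defaultdict(float)
    for cand in positions:
        for pos, cred in zip(positions, confidences):
            if abs(pos - cand) <= tolerance:
                votes[cand] += cred
    return max(votes, key=votes.get)

# Head and tail positions are voted on separately.
heads = [348, 350, 351, 349, 420]
tails = [500, 498, 502, 501, 560]
creds = [0.9, 0.8, 0.7, 0.85, 0.2]
segment = (weighted_vote(heads, creds, tolerance=5),
           weighted_vote(tails, creds, tolerance=5))   # e.g. (350, 500)
```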
S23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity between each piece of content labeling information and the fused labeling information, and updating the user subject field confidence SubjectCredit_K corresponding to each piece of labeling information. If the similarity between the user annotation and the final fusion result is high, the user confidence is increased; otherwise, the user confidence is decreased. The specific algorithm is as follows:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content labeling similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information.
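The update formula itself appears only as an image in the original publication and is not reproduced in the text above; the LaTeX reconstruction below is therefore an assumption inferred purely from the variable definitions (a threshold comparison on the similarity, with a fixed adjustment step η), not the patent's exact expression.

```latex
SubjectCredit'_K =
\begin{cases}
SubjectCredit_K + \eta, & \mathrm{Sim}\bigl(Mark_i,\ \overline{Mark}\bigr) \geq Sim_0,\\[2pt]
SubjectCredit_K - \eta, & \mathrm{Sim}\bigl(Mark_i,\ \overline{Mark}\bigr) < Sim_0.
\end{cases}
```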
And S25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference.
When video segments are grouped into a set in S21, the segment segmentation position range is dynamically adjusted according to the difference between the user annotation information and the fusion result, so that the contents within one video segment set are as similar as possible. The difference between each annotation in the set and the final fusion result is calculated, the relationship E_f between this difference and the head and tail positions of the segment is examined, and the tolerance value is updated. The position tolerance Δ_P is adjusted once per adjustment period, according to the mean value of E_f over the previous adjustment period. If the degree of difference between the annotation result and the final fusion result does not change with the position difference, the position tolerance Δ_P is increased; otherwise, the position tolerance Δ_P is decreased. The specific algorithm is as follows:
E_{f,k} is the relationship between the annotation difference and the position difference of the k-th video segment, N is the total number of annotation information of the k-th video segment, M is the number of fused annotation information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content annotation difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value obtained from actual statistics, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
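As above, the original formulas are images; the reconstruction below is an assumption consistent with the variable definitions: a per-segment correlation statistic, averaged over the M segments of the previous update period and compared against the reference value E_{f0}. The multiplicative adjustment factor η_Δ is a hypothetical parameter introduced here only for illustration.

```latex
E_{f,k} = \mathrm{Cov}\Bigl(\bigl\{\mathrm{Difference}(Mark_{i,k}, \overline{Mark_k})\bigr\}_{i=1}^{N},\;
                            \bigl\{\mathrm{Distance}(Mark_{i,k}, \overline{Mark_k})\bigr\}_{i=1}^{N}\Bigr),
\qquad
\bar{E}_f = \frac{1}{M}\sum_{k=1}^{M} E_{f,k},

\Delta'_P =
\begin{cases}
\Delta_P\,(1 + \eta_\Delta), & \bar{E}_f \leq E_{f0} \quad\text{(difference uncorrelated with position: relax the tolerance)},\\[2pt]
\Delta_P\,(1 - \eta_\Delta), & \bar{E}_f > E_{f0} \quad\text{(difference tracks position: tighten the tolerance)}.
\end{cases}
```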
S21 includes the steps of:
(1) Initializing the segment segmentation position tolerance Δ_P and the user subject field confidence SubjectCredit_K, where K is the subject field number.
(2) The user annotation information is traversed periodically and classified according to the position annotation information and the segment segmentation position tolerance. For segments whose head and tail position differences are within the tolerance range, the two video segments are very similar and can be regarded as the same segment; their annotation information can likewise be regarded as annotations of the same segment and is therefore classified into one annotation information set.
(3) The annotation information set is classified by subject field to determine the subject to which it belongs: the annotation information is preprocessed, keywords and their weights are calculated with the TF-IDF algorithm, and the subject field is then obtained with an SVM classifier. Other machine learning and deep learning models can also be integrated for subject field classification.
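A minimal sketch of this TF-IDF + SVM subject-field classification step, using scikit-learn; the training corpus and label names below are placeholders, and the specific preprocessing and model settings are assumptions rather than the patent's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# Placeholder training corpus: annotation texts (title + knowledge points + description)
# with their subject-field labels.
train_texts = ["pinhole imaging straight-line propagation of light inverted image",
               "quadratic equation discriminant roots factorization"]
train_labels = ["physics", "mathematics"]

subject_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),        # keywords and weights via TF-IDF
    ("svm", SVC(kernel="linear")),       # subject field via SVM
])
subject_clf.fit(train_texts, train_labels)

# Subject field of a new annotation information set (concatenated annotation texts).
predicted_subject = subject_clf.predict(["principle of pinhole imaging, light travels in straight lines"])[0]
```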
(4) According to the subject field to which the annotation information belongs, the confidence of each user in that subject field is obtained, and the comprehensive confidence of the set's annotation information is then calculated. The comprehensive confidence calculation model of the annotation information set is as follows:
wherein SetCredit is the comprehensive confidence of the annotation information set, SubjectCredit_{K,i} represents the user subject field confidence, in the subject field to which the annotation information set belongs, of the user corresponding to the i-th annotation information in the set, and N is the total number of annotation information in the annotation information set.
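The comprehensive-confidence formula is likewise an image in the original; the mean of the per-annotation user subject field confidences is a natural reconstruction consistent with the definitions above, stated here as an assumption rather than the patent's exact model.

```latex
SetCredit = \frac{1}{N}\sum_{i=1}^{N} SubjectCredit_{K,i}
```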
In S23, as shown in fig. 3, for a set whose comprehensive annotation confidence reaches the threshold, the crowd-contributed annotation information is fused and normalized based on the user subject confidence.
(1) Classifying the labeling information in one labeling information set according to the knowledge points of the labeling information in the labeling information set to obtain all the labeling information of each knowledge point in the video clip;
(2) and fusing and standardizing all the labeling information of each knowledge point in the video segment to obtain the fused labeling information of each knowledge point in the video segment. The method specifically comprises the following steps:
all annotation information of each knowledge point in the video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence are obtained, vectorization processing is carried out on each annotation information of each knowledge point in the video segment, and vectorization text data are obtained. Specifically, a triple is formed for the annotation information, { user ID, user subject confidence, [ title, knowledge point, description ] }, and the annotation information under the same knowledge point is formed into a corpus and vectorized;
transmitting the vectorized text data into an LSTM model to obtain distributed expression data of the text;
adjusting the influence degree of each input value on a predicted value by using an Attention mechanism based on the confidence coefficient of the user subject;
and transmitting the distributed expression data of the text into the LSTM model, and predicting and outputting the distributed expression of the abstract.
And converting the distributed expression of the abstract into a text form, and obtaining the fusion marking information of each knowledge point in the video segment, wherein the fusion marking information comprises data such as a title, the knowledge point, description information and the like.
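A compact PyTorch sketch of the two-LSTM fusion described above follows. Scaling the attention scores by each token's user subject confidence before normalization is one plausible reading of "adjusting the influence degree of the input value on the output predicted value"; all dimensions, names, and the teacher-forcing setup are assumptions, not the patent's specified architecture.

```python
import torch
import torch.nn as nn

class ConfidenceAttentionFusion(nn.Module):
    """Encoder LSTM -> confidence-scaled attention -> decoder LSTM over annotation text."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # first LSTM: text distributed expression
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # second LSTM: abstract distributed expression
        self.attn_out = nn.Linear(2 * hid_dim, hid_dim)
        self.generator = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_tokens, token_conf, tgt_tokens):
        # src_tokens: (B, S) token ids of the concatenated annotations
        # token_conf: (B, S) user subject confidence of the annotation each token came from
        # tgt_tokens: (B, T) token ids of the target fused annotation (teacher forcing)
        enc_out, state = self.encoder(self.embed(src_tokens))         # (B, S, H)
        dec_out, _ = self.decoder(self.embed(tgt_tokens), state)      # (B, T, H)

        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))          # (B, T, S) dot-product attention
        scores = scores * token_conf.unsqueeze(1)                     # scale by user subject confidence
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                         # (B, T, H)

        fused = torch.tanh(self.attn_out(torch.cat([dec_out, context], dim=-1)))
        return self.generator(fused)                                  # (B, T, vocab) summary logits

# Shape check with random data.
model = ConfidenceAttentionFusion(vocab_size=5000)
src = torch.randint(0, 5000, (2, 40))
conf = torch.rand(2, 40)
tgt = torch.randint(0, 5000, (2, 12))
logits = model(src, conf, tgt)          # torch.Size([2, 12, 5000])
```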
In S3, as shown in fig. 4, the video clip is organized and managed by using a virtual partitioning policy mechanism based on separate storage of data blocks, file headers and knowledge information.
It should be noted that step S3 is not an essential step, and step S3 is a preferred implementation of resource management.
(1) After the head and tail of a video segment are determined, the child video segment is found to be part of the data of its parent video. Therefore, no real video file is generated for the child segment and stored in the database; instead, following the relevant video file coding specification, a video header file is generated from the ID of the parent video and the offset of the segment relative to the parent video, and stored in the video segment header file database. After the video annotation information is fused, the ID of the fused video segment header file and the annotation information are stored in the video resource database. A virtual file conforming to the operating system specification is constructed from the parent video data blocks, the segment header file library and the resource list, supporting transparent access to the video segments at the file operation layer, as well as transparent further annotation, segmentation or merging extraction of the video segments. This virtual segmentation approach greatly reduces the storage space consumed by the many generated sub-videos.
(2) When a user needs to browse, download or further divide a virtual video segment, only the corresponding database entries and file headers need to be dynamically combined as required. The actual data block position is found by layer-by-layer recursion according to the relative offsets from the parent video.
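A minimal sketch of the virtual-segmentation idea: a segment is stored only as a header referencing its parent and an offset, and the absolute position in the root (real) video is resolved by recursing through parents. The names and the in-memory "database" below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Dict, Tuple

@dataclass
class SegmentHeader:
    """Header file of a virtual video segment: no video data is duplicated."""
    segment_id: str
    parent_id: Optional[str]   # None for a real (root) parent video
    offset: float              # start offset relative to the parent, in seconds
    length: float              # segment length, in seconds

headers: Dict[str, SegmentHeader] = {}   # stands in for the segment header file database

def absolute_range(segment_id: str) -> Tuple[float, float]:
    """Recurse through parent headers to find the range in the root parent video."""
    header = headers[segment_id]
    if header.parent_id is None:
        return header.offset, header.offset + header.length
    parent_start, _ = absolute_range(header.parent_id)
    start = parent_start + header.offset
    return start, start + header.length

# Parent video, a child segment, and a sub-segment of that child.
headers["v0"] = SegmentHeader("v0", None, 0.0, 3600.0)
headers["v0-a"] = SegmentHeader("v0-a", "v0", 350.0, 150.0)
headers["v0-a-1"] = SegmentHeader("v0-a-1", "v0-a", 20.0, 60.0)
print(absolute_range("v0-a-1"))   # (370.0, 430.0) in the real parent video
```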
(3) When a certain virtual video clip is accessed frequently or is divided into a plurality of sub-clips, in order to improve the application processing capacity and reduce the video playing delay, a header file of the virtual video is spliced with video data to form an independent video, and relevant item information of a video clip database is updated.
The embodiment of the invention also provides a video learning resource extraction and knowledge annotation system based on crowd-sourcing, which comprises the following steps:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
and the marking module is used for classifying the marking information according to the position marking information, constructing a marking information set, calculating the comprehensive confidence coefficient of the marking information set, if the comprehensive confidence coefficient of the marking information set reaches a preset threshold value, extracting a video segment from the video learning resource according to the marking information set, and fusing the marking information of the marking information set to obtain the fused marking information of the video segment.
Preferably, the crowd-sourcing-based video learning resource extraction and knowledge annotation system further comprises a storage module, wherein the storage module is configured to use the video learning resource before being marked as a parent video, obtain a position offset of an extracted video segment relative to the parent video, generate a video header file according to the parent video and the position offset, and store and manage the video segment in a manner of the parent video and the video header file.
The implementation principle and technical effect of the system are the same as those of the method, and are not described herein again.
It must be noted that in any of the above embodiments, the methods are not necessarily executed in order of sequence number, and as long as it cannot be assumed from the execution logic that they are necessarily executed in a certain order, it means that they can be executed in any other possible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. A video learning resource extraction and knowledge annotation method based on crowd-sourcing is characterized by comprising the following steps:
s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources;
s2, classifying the annotation information according to the position annotation information, constructing an annotation information set, calculating the comprehensive confidence of the annotation information set, if the comprehensive confidence of the annotation information set reaches a preset threshold, extracting video segments from the video learning resources according to the annotation information set, and performing fusion processing on the annotation information of the annotation information set to obtain the fusion annotation information of the video segments;
the S2 includes the steps of:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
s25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference;
in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content label similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information;
the step S25 includes the following steps: a segment segmentation position tolerance update period is set, and the segment segmentation position tolerance is adjusted according to the mean value of E_{f,k} over the previous update period; if the difference between the label and the final fusion result does not change with the position difference, the segment segmentation position tolerance is increased, otherwise the segment segmentation position tolerance is decreased, and the calculation formula for updating the segment segmentation position tolerance is:
E_{f,k} is the relationship between the labeling difference and the position difference of the k-th video segment, N is the total number of labeling information of the k-th video segment, M is the number of fused labeling information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content labeling difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
2. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, wherein said S21 comprises the steps of:
s211, initializing segment segmentation position tolerance and user subject field confidence;
s212, traversing the annotation information, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and classifying into one set all annotation information whose position annotation information differs by no more than the segment segmentation position tolerance, to obtain an annotation information set;
s213, obtaining the subject field of the labeling information set according to all the labeling information in the labeling information set;
s214, obtaining the user subject field confidence of each marking information corresponding to the user in the subject field to which the marking information set belongs in the marking information set, and calculating the comprehensive confidence of the marking information set, wherein the calculation formula of the comprehensive confidence is as follows:
wherein SetCredit is the comprehensive confidence of the labeling information set, SubjectCredit_{K,i} represents the user subject field confidence, in the subject field to which the labeling information set belongs, of the user corresponding to the i-th labeling information in the set, and N is the total number of labeling information in the labeling information set.
3. The crowd-sourcing-based video learning resource extraction and knowledge annotation method of claim 1, wherein in S22, the video segment extraction is implemented by a weighted voting method based on user confidence.
4. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, wherein the content annotation information comprises knowledge points, said S23 comprising the steps of:
s231, classifying the annotation information in one annotation information set according to the knowledge points of the annotation information in the annotation information set, and acquiring all annotation information of each knowledge point in the video clip;
s232, fusing and standardizing all the labeling information of each knowledge point in the video segment to obtain the fused labeling information of each knowledge point in the video segment.
5. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 4, wherein said step S232 comprises the steps of:
acquiring all annotation information of each knowledge point in a video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence, and vectorizing each annotation information of each knowledge point in the video segment to obtain vectorized text data;
inputting the vectorized text data into a first long-short term memory artificial neural network to obtain text distributed expression data;
inputting the text distributed expression data into a second long-short term memory artificial neural network, outputting predicted abstract distributed expression data, and adjusting the influence degree of the input value on the output predicted value based on the user subject confidence coefficient by using an attention mechanism;
and converting the abstract distributed expression data into a text form to obtain the fusion marking information of each knowledge point in the video clip.
6. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, further comprising step S3: the video learning resource before marking is used as a parent video, the position offset of the extracted video clip relative to the parent video is obtained, a video head file is generated according to the parent video and the position offset, and the video clip is stored and managed in a parent video and video head file mode.
7. A crowd-sourcing-based video learning resource extraction and knowledge annotation system is characterized by comprising:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
the annotation module is used for classifying the annotation information according to the position annotation information, constructing an annotation information set, calculating the comprehensive confidence of the annotation information set, if the comprehensive confidence of the annotation information set reaches a preset threshold, extracting a video segment from the video learning resource according to the annotation information set, and performing fusion processing on the annotation information of the annotation information set to obtain the fusion annotation information of the video segment;
the specific implementation of the labeling module comprises the following steps:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
s25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference;
in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content label similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information;
the step S25 includes the following steps: a segment segmentation position tolerance update period is set, and the segment segmentation position tolerance is adjusted according to the mean value of E_{f,k} over the previous update period; if the difference between the label and the final fusion result does not change with the position difference, the segment segmentation position tolerance is increased, otherwise the segment segmentation position tolerance is decreased, and the calculation formula for updating the segment segmentation position tolerance is:
E_{f,k} is the relationship between the labeling difference and the position difference of the k-th video segment, N is the total number of labeling information of the k-th video segment, M is the number of fused labeling information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content labeling difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011319851.4A CN112418088B (en) | 2020-11-23 | 2020-11-23 | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011319851.4A CN112418088B (en) | 2020-11-23 | 2020-11-23 | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112418088A CN112418088A (en) | 2021-02-26 |
CN112418088B true CN112418088B (en) | 2022-04-29 |
Family
ID=74777896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011319851.4A Active CN112418088B (en) | 2020-11-23 | 2020-11-23 | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112418088B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115334354B (en) * | 2022-08-15 | 2023-12-29 | 北京百度网讯科技有限公司 | Video labeling method and device |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104090955A (en) * | 2014-07-07 | 2014-10-08 | 科大讯飞股份有限公司 | Automatic audio/video label labeling method and system |
WO2018081751A1 (en) * | 2016-10-28 | 2018-05-03 | Vilynx, Inc. | Video tagging system and method |
CN107609084A (en) * | 2017-09-06 | 2018-01-19 | 华中师范大学 | One kind converges convergent resource correlation method based on gunz |
CN110245259A (en) * | 2019-05-21 | 2019-09-17 | 北京百度网讯科技有限公司 | The video of knowledge based map labels method and device, computer-readable medium |
CN110442710A (en) * | 2019-07-03 | 2019-11-12 | 广州探迹科技有限公司 | A kind of short text semantic understanding of knowledge based map and accurate matching process and device |
CN110968708A (en) * | 2019-12-20 | 2020-04-07 | 华中师范大学 | Method and system for labeling education information resource attributes |
CN111222500A (en) * | 2020-04-24 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Label extraction method and device |
Non-Patent Citations (3)
Title |
---|
Unsupervised video summarization framework using keyframe extraction and video skimming; Shruti Jadon; 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA); 2020-11-10; 1-6 *
Construction and simulation research on a user annotation model for educational information resources; Xiong Yaping et al.; Modern Distance Education; 2017-01-15 (No. 01); 37-44 *
A survey of text extraction methods for videos and images; Jiang Mengdi et al.; Computer Science; 2017-11-15; 18-28 *
Also Published As
Publication number | Publication date |
---|---|
CN112418088A (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111737552A (en) | Method, device and equipment for extracting training information model and acquiring knowledge graph | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN112836487B (en) | Automatic comment method and device, computer equipment and storage medium | |
CN105117429A (en) | Scenario image annotation method based on active learning and multi-label multi-instance learning | |
CN112131430B (en) | Video clustering method, device, storage medium and electronic equipment | |
CN111046275A (en) | User label determining method and device based on artificial intelligence and storage medium | |
CN112597296A (en) | Abstract generation method based on plan mechanism and knowledge graph guidance | |
CN111831924A (en) | Content recommendation method, device, equipment and readable storage medium | |
CN113515669A (en) | Data processing method based on artificial intelligence and related equipment | |
CN112989212A (en) | Media content recommendation method, device and equipment and computer storage medium | |
CN115062709B (en) | Model optimization method, device, equipment, storage medium and program product | |
CN115114974A (en) | Model distillation method, device, computer equipment and storage medium | |
CN116578717A (en) | Multi-source heterogeneous knowledge graph construction method for electric power marketing scene | |
CN114330703A (en) | Method, device and equipment for updating search model and computer-readable storage medium | |
CN114519397B (en) | Training method, device and equipment for entity link model based on contrast learning | |
CN112418088B (en) | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing | |
CN111930981A (en) | Data processing method for sketch retrieval | |
CN117952200A (en) | Knowledge graph and personalized learning path construction method and system | |
CN117711001B (en) | Image processing method, device, equipment and medium | |
CN116935170B (en) | Processing method and device of video processing model, computer equipment and storage medium | |
CN113407776A (en) | Label recommendation method and device, training method and medium of label recommendation model | |
CN115712855A (en) | Self-learning-based label rule generation method and device | |
Liu | Restricted Boltzmann machine collaborative filtering recommendation algorithm based on project tag improvement | |
CN118568568B (en) | Training method of content classification model and related equipment | |
CN116340552B (en) | Label ordering method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
- PB01 | Publication | |
- SE01 | Entry into force of request for substantive examination | |
- GR01 | Patent grant | |