CN112418088B - Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing - Google Patents
Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
- Publication number
- CN112418088B CN112418088B CN202011319851.4A CN202011319851A CN112418088B CN 112418088 B CN112418088 B CN 112418088B CN 202011319851 A CN202011319851 A CN 202011319851A CN 112418088 B CN112418088 B CN 112418088B
- Authority
- CN
- China
- Prior art keywords
- annotation information
- annotation
- video
- information
- confidence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The invention discloses a video learning resource extraction and knowledge annotation method and system based on crowd-sourcing. The method comprises the following steps: s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources; s2, classifying the labeling information according to the position labeling information, constructing a labeling information set, calculating the comprehensive confidence of the labeling information set, if the comprehensive confidence of the labeling information set reaches a preset threshold, extracting video segments from the video learning resources according to the labeling information set, and performing fusion processing on the labeling information of the labeling information set to obtain the fusion labeling information of the video segments. According to the method and the device, the confidence of the annotation information set is judged, so that the influence of random annotation of videos by some users on the annotation result is reduced, and the quality and the reliability of the crowd-sourcing annotation are improved.
Description
Technical Field
The invention belongs to the technical field of educational information technology, and particularly relates to a crowd-sourcing-based video learning resource extraction and knowledge annotation method and system.
Background
With the development of internet technology, video resources on the internet have grown rapidly, and more and more of them contain a large amount of knowledge value. Video clips with knowledge value can be applied in education and teaching to display teaching content intuitively and attract students' attention. For such video clips, how to mine the implicit knowledge points they contain and associate the clips with those knowledge points, so that learners can quickly and efficiently obtain personalized learning resources, is a focus of current research.
Existing video learning resource extraction methods include manual annotation by experts and automatic annotation by machines. Manual annotation of video segments by experts across different fields consumes enormous manpower, financial resources and time; machine learning methods can achieve automatic annotation, but for video segments containing implicit knowledge points, fully automatic processing is difficult to realize, and extracting such segments by relying on machine learning alone remains impractical.
Disclosure of Invention
Aiming at at least one defect or improvement requirement in the prior art, the invention provides a crowd-sourcing-based video learning resource extraction and knowledge annotation method and system, which can improve the quality and reliability of crowd-sourced annotation.
To achieve the above object, according to a first aspect of the present invention, there is provided a crowd-sourcing-based video learning resource extraction and knowledge annotation method, comprising the steps of:
s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources;
s2, classifying the labeling information according to the position labeling information, constructing a labeling information set, calculating the comprehensive confidence of the labeling information set, if the comprehensive confidence of the labeling information set reaches a preset threshold, extracting video segments from the video learning resources according to the labeling information set, and performing fusion processing on the labeling information of the labeling information set to obtain the fusion labeling information of the video segments.
Preferably, the S2 includes the steps of:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
and S25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference.
Preferably, in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content labeling similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information.
Preferably, the step S25 includes the following steps: a segment segmentation position tolerance update period is set, and the segment segmentation position tolerance is adjusted according to the mean value of E_{f,k} over the previous update period; if the difference between the annotation and the final fusion result does not change with the position difference, the segment segmentation position tolerance is increased, otherwise the segment segmentation position tolerance is decreased, and the calculation formula for updating the segment segmentation position tolerance is:
E_{f,k} is the relationship between the annotation difference and the position difference of the k-th video segment, N is the total number of annotation information of the k-th video segment, M is the number of fused annotation information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content annotation difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
Preferably, the S21 includes the steps of:
s211, initializing segment segmentation position tolerance and user subject field confidence;
s212, traversing the annotation information, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and classifying into one set all annotation information whose position annotation information differs by no more than the segment segmentation position tolerance, to obtain an annotation information set;
s213, obtaining the subject field of the labeling information set according to all the labeling information in the labeling information set;
s214, obtaining the user subject field confidence of each marking information corresponding to the user in the subject field to which the marking information set belongs in the marking information set, and calculating the comprehensive confidence of the marking information set, wherein the calculation formula of the comprehensive confidence is as follows:
wherein SetCredit is the comprehensive confidence of the annotation information set, SubjectCredit_{K,i} represents the user subject field confidence, in the subject field to which the annotation information set belongs, of the user corresponding to the i-th annotation information in the set, and N is the total number of annotation information in the annotation information set.
Preferably, in S22, the video segment extraction is implemented by a weighted voting method based on user confidence.
Preferably, the content annotation information includes knowledge points, and the S23 includes the steps of:
s231, classifying the annotation information in one annotation information set according to the knowledge points of the annotation information in the annotation information set, and acquiring all annotation information of each knowledge point in the video clip;
s232, fusing and standardizing all the labeling information of each knowledge point in the video clip to obtain the fused labeling information of each knowledge point in the video clip.
Preferably, the step S232 includes the steps of:
acquiring all annotation information of each knowledge point in a video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence, and vectorizing each annotation information of each knowledge point in the video segment to obtain vectorized text data;
inputting the vectorized text data into a first long-short term memory artificial neural network to obtain text distributed expression data;
inputting the text distributed expression data into a second long-short term memory artificial neural network, outputting predicted abstract distributed expression data, and adjusting the influence degree of the input value on the output predicted value based on the user subject confidence coefficient by using an attention mechanism;
and converting the abstract distributed expression data into a text form to obtain the fusion marking information of each knowledge point in the video clip.
Preferably, the crowd-sourcing-based video learning resource extraction and knowledge annotation method further includes step S3: and taking the video learning resource before marking as a parent video, acquiring the position offset of the extracted video clip relative to the parent video, generating a video head file according to the parent video and the position offset, and managing the video clip by adopting the parent video and the video head file.
According to a second aspect of the present invention, there is provided a crowd-sourcing based video learning resource extraction and knowledge annotation system, comprising:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
and the marking module is used for classifying the marking information according to the position marking information, constructing a marking information set, calculating the comprehensive confidence coefficient of the marking information set, if the comprehensive confidence coefficient of the marking information set reaches a preset threshold value, extracting a video segment from the video learning resource according to the marking information set, and fusing the marking information of the marking information set to obtain the fused marking information of the video segment.
In general, compared with the prior art, the invention has the following beneficial effects:
(1) According to the crowd-sourcing-based video learning resource extraction and knowledge annotation method and system, user annotation information is traversed and classified according to the position annotation information, annotation information sets are constructed, and the comprehensive confidence of each set is calculated; if the confidence reaches the threshold, standardization processing is carried out. The annotation information at all positions is then integrated to extract video segments, and the annotation information is fused based on user confidence. The method judges the confidence of the annotation information set based on user confidence, reduces the influence on the annotation result of users who annotate videos randomly, and improves the quality and reliability of crowd-sourced annotation.
(2) According to the video learning resource extraction and knowledge annotation method and system based on the crowd-sourcing intelligence, after annotation information is fused, the similarity degree of user annotation information and an annotation result is calculated, and the confidence of a user in the subject field is dynamically calculated; and calculating the relation between the marking information and the position of the video segment, and dynamically determining the position tolerance of the marking information of the video segment. The method can dynamically determine the user confidence and the position tolerance of the labeling information, and improve the accuracy and the reliability of the labeling data.
(3) According to the crowd-sourcing-based video learning resource extraction and knowledge annotation method and system, the sub-videos divided by users are managed with a virtual segmentation strategy mechanism based on separate storage of data blocks, file headers and knowledge information, and data blocks and file headers can be dynamically combined as needed to extract virtual videos. The virtual segmentation approach improves storage space utilization, improves application processing capacity and reduces video playback delay.
The invention extracts video learning resources by leveraging the collective intelligence of the public while comprehensively considering user confidence. It is particularly suitable for automatically extracting video segments containing implicit knowledge points, thereby providing data for large educational video resource libraries and helping educators and learners obtain more high-quality, multi-dimensional and multi-granularity educational resources.
Drawings
FIG. 1 is a schematic diagram illustrating a video learning resource extraction and knowledge annotation method based on crowd-sourcing in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of video segment extraction and annotation information fusion provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the principle of annotation information fusion based on user confidence according to an embodiment of the present invention;
fig. 4 is a schematic resource management diagram of a video clip according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic diagram illustrating a video learning resource extraction and knowledge annotation method based on crowd-sourcing according to an embodiment of the present invention, where the method includes steps S1 to S3.
In S1, annotation information of the knowledge points in the video learning resource by the multiple users is obtained, and the annotation information includes position annotation information and content annotation information of the knowledge points in the video learning resource.
When watching a video, a plurality of users mark, according to their own understanding, a certain knowledge point or a segment that is helpful for learning that knowledge point. Preferably, the annotation information includes the start and end points of the segment, a title, knowledge points, description information and the like. For example, if a user finds that the principle of pinhole imaging is explained at position 5:50-8:20 of the video, the user can drag the marking points to the corresponding positions and then fill in the title: "principle of pinhole imaging"; the knowledge points: "junior middle school - grade two - physics - pinhole imaging, straight-line propagation of light"; and the description information: "an image of the object forms on the wall when a board with a small hole is placed in front of it".
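To make the structure of a single piece of annotation information concrete, the following is a minimal Python sketch; the field names (user_id, start, end, title, knowledge_points, description) are illustrative assumptions and not identifiers defined by the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Annotation:
    """One user's annotation of a knowledge point in a video learning resource."""
    user_id: str                   # who submitted the annotation
    start: float                   # position annotation: segment start, in seconds
    end: float                     # position annotation: segment end, in seconds
    title: str                     # content annotation: short title
    knowledge_points: List[str]    # content annotation: knowledge point path(s)
    description: str = ""          # content annotation: free-text description

# The pinhole-imaging example from the text, expressed in this structure.
example = Annotation(
    user_id="u001",
    start=5 * 60 + 50,             # 5:50
    end=8 * 60 + 20,               # 8:20
    title="Principle of pinhole imaging",
    knowledge_points=["junior middle school - grade two - physics - pinhole imaging",
                      "straight-line propagation of light"],
    description="An image of the object forms on the wall behind a board with a small hole.",
)
```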
In S2, as shown in fig. 2, the annotation information is classified according to the position annotation information, an annotation information set is constructed, a comprehensive confidence of the annotation information set is calculated, and if the comprehensive confidence of the annotation information set reaches a preset threshold, a video segment is extracted from the video learning resource according to the annotation information set, and the annotation information of the annotation information set is fused, so as to obtain fused annotation information of the video segment.
S21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
and S22, extracting video segments of the annotation information set with the comprehensive confidence reaching the preset threshold according to the confidence of the user subject field and the position annotation information, and obtaining the video segments corresponding to the annotation information set.
The segment head and tail positions are extracted by a weighted voting method based on user confidence: the head positions and tail positions in the annotation information set are divided into a head position group and a tail position group respectively, weighted votes for the marked points are calculated using the user confidences as weights, and the marked points with the highest weighted votes are taken as the nodes of the set, i.e., the head and tail positions of the video segment annotated by the set.
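A minimal sketch of this confidence-weighted voting follows; treating every annotated position as a candidate and counting as its votes the confidences of all positions within the segment segmentation position tolerance is an assumption about the exact grouping rule, not the patent's stated procedure.

```python
from collections import defaultdict

def weighted_vote(positions, confidences, tolerance):
    """Pick the position with the highest confidence-weighted vote.

    positions   -- annotated positions in seconds, e.g. all head positions in one set
    confidences -- per-annotation user subject field confidence, same order as positions
    tolerance   -- segment segmentation position tolerance: positions within this
                   distance of a candidate count as votes for it
    """
    votes = defaultdict(float)
    for cand in positions:
        for pos, cred in zip(positions, confidences):
            if abs(pos - cand) <= tolerance:
                votes[cand] += cred
    return max(votes, key=votes.get)

# Head and tail positions are voted on separately.
heads = [348, 350, 351, 349, 420]
tails = [500, 498, 502, 501, 560]
creds = [0.9, 0.8, 0.7, 0.85, 0.2]
segment = (weighted_vote(heads, creds, tolerance=5),
           weighted_vote(tails, creds, tolerance=5))   # e.g. (350, 500)
```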
S23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity between each piece of content labeling information and the fused labeling information, and updating the user subject field confidence SubjectCredit_K corresponding to each piece of labeling information. If the similarity between the user annotation and the final fusion result is high, the user confidence is increased; otherwise, the user confidence is decreased. The specific algorithm is as follows:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content labeling similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information.
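The update formula itself appears only as an image in the original publication and is not reproduced in the text above; the LaTeX reconstruction below is therefore an assumption inferred purely from the variable definitions (a threshold comparison on the similarity, with a fixed adjustment step η), not the patent's exact expression.

```latex
SubjectCredit'_K =
\begin{cases}
SubjectCredit_K + \eta, & \mathrm{Sim}\bigl(Mark_i,\ \overline{Mark}\bigr) \geq Sim_0,\\[2pt]
SubjectCredit_K - \eta, & \mathrm{Sim}\bigl(Mark_i,\ \overline{Mark}\bigr) < Sim_0.
\end{cases}
```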
And S25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference.
When video segments are grouped into a set in S21, the segment segmentation position range is dynamically adjusted according to the difference between the user annotation information and the fusion result, so that the contents within one video segment set are as similar as possible. The difference between each annotation in the set and the final fusion result is calculated, the relationship E_f between this difference and the head and tail positions of the segment is examined, and the tolerance value is updated. The position tolerance Δ_P is adjusted once per adjustment period, according to the mean value of E_f over the previous adjustment period. If the degree of difference between the annotation result and the final fusion result does not change with the position difference, the position tolerance Δ_P is increased; otherwise, the position tolerance Δ_P is decreased. The specific algorithm is as follows:
E_{f,k} is the relationship between the annotation difference and the position difference of the k-th video segment, N is the total number of annotation information of the k-th video segment, M is the number of fused annotation information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content annotation difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value obtained from actual statistics, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
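As above, the original formulas are images; the reconstruction below is an assumption consistent with the variable definitions: a per-segment correlation statistic, averaged over the M segments of the previous update period and compared against the reference value E_{f0}. The multiplicative adjustment factor η_Δ is a hypothetical parameter introduced here only for illustration.

```latex
E_{f,k} = \mathrm{Cov}\Bigl(\bigl\{\mathrm{Difference}(Mark_{i,k}, \overline{Mark_k})\bigr\}_{i=1}^{N},\;
                            \bigl\{\mathrm{Distance}(Mark_{i,k}, \overline{Mark_k})\bigr\}_{i=1}^{N}\Bigr),
\qquad
\bar{E}_f = \frac{1}{M}\sum_{k=1}^{M} E_{f,k},

\Delta'_P =
\begin{cases}
\Delta_P\,(1 + \eta_\Delta), & \bar{E}_f \leq E_{f0} \quad\text{(difference uncorrelated with position: relax the tolerance)},\\[2pt]
\Delta_P\,(1 - \eta_\Delta), & \bar{E}_f > E_{f0} \quad\text{(difference tracks position: tighten the tolerance)}.
\end{cases}
```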
S21 includes the steps of:
(1) Initializing the segment segmentation position tolerance Δ_P and the user subject field confidence SubjectCredit_K, where K is the subject field number.
(2) The user annotation information is traversed periodically and classified according to the position annotation information and the segment segmentation position tolerance. For segments whose head and tail position differences are within the tolerance range, the two video segments are very similar and can be regarded as the same segment; their annotation information can likewise be regarded as annotations of the same segment and is therefore classified into one annotation information set.
(3) The annotation information set is classified by subject field to determine the subject to which it belongs: the annotation information is preprocessed, keywords and their weights are calculated with the TF-IDF algorithm, and the subject field is then obtained with an SVM classifier. Other machine learning and deep learning models can also be integrated for subject field classification.
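A minimal sketch of this TF-IDF + SVM subject-field classification step, using scikit-learn; the training corpus and label names below are placeholders, and the specific preprocessing and model settings are assumptions rather than the patent's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# Placeholder training corpus: annotation texts (title + knowledge points + description)
# with their subject-field labels.
train_texts = ["pinhole imaging straight-line propagation of light inverted image",
               "quadratic equation discriminant roots factorization"]
train_labels = ["physics", "mathematics"]

subject_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),        # keywords and weights via TF-IDF
    ("svm", SVC(kernel="linear")),       # subject field via SVM
])
subject_clf.fit(train_texts, train_labels)

# Subject field of a new annotation information set (concatenated annotation texts).
predicted_subject = subject_clf.predict(["principle of pinhole imaging, light travels in straight lines"])[0]
```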
(4) According to the subject field to which the annotation information belongs, the confidence of each user in that subject field is obtained, and the comprehensive confidence of the set's annotation information is then calculated. The comprehensive confidence calculation model of the annotation information set is as follows:
wherein SetCredit is the comprehensive confidence of the annotation information set, SubjectCredit_{K,i} represents the user subject field confidence, in the subject field to which the annotation information set belongs, of the user corresponding to the i-th annotation information in the set, and N is the total number of annotation information in the annotation information set.
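The comprehensive-confidence formula is likewise an image in the original; the mean of the per-annotation user subject field confidences is a natural reconstruction consistent with the definitions above, stated here as an assumption rather than the patent's exact model.

```latex
SetCredit = \frac{1}{N}\sum_{i=1}^{N} SubjectCredit_{K,i}
```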
In S23, as shown in fig. 3, for a set whose comprehensive annotation confidence reaches the threshold, the crowd-contributed annotation information is fused and normalized based on the user subject confidence.
(1) Classifying the labeling information in one labeling information set according to the knowledge points of the labeling information in the labeling information set to obtain all the labeling information of each knowledge point in the video clip;
(2) and fusing and standardizing all the labeling information of each knowledge point in the video segment to obtain the fused labeling information of each knowledge point in the video segment. The method specifically comprises the following steps:
all annotation information of each knowledge point in the video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence are obtained, vectorization processing is carried out on each annotation information of each knowledge point in the video segment, and vectorization text data are obtained. Specifically, a triple is formed for the annotation information, { user ID, user subject confidence, [ title, knowledge point, description ] }, and the annotation information under the same knowledge point is formed into a corpus and vectorized;
transmitting the vectorized text data into an LSTM model to obtain distributed expression data of the text;
adjusting the influence degree of each input value on a predicted value by using an Attention mechanism based on the confidence coefficient of the user subject;
and transmitting the distributed expression data of the text into the LSTM model, and predicting and outputting the distributed expression of the abstract.
And converting the distributed expression of the abstract into a text form, and obtaining the fusion marking information of each knowledge point in the video segment, wherein the fusion marking information comprises data such as a title, the knowledge point, description information and the like.
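A compact PyTorch sketch of the two-LSTM fusion described above follows. Scaling the attention scores by each token's user subject confidence before normalization is one plausible reading of "adjusting the influence degree of the input value on the output predicted value"; all dimensions, names, and the teacher-forcing setup are assumptions, not the patent's specified architecture.

```python
import torch
import torch.nn as nn

class ConfidenceAttentionFusion(nn.Module):
    """Encoder LSTM -> confidence-scaled attention -> decoder LSTM over annotation text."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # first LSTM: text distributed expression
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # second LSTM: abstract distributed expression
        self.attn_out = nn.Linear(2 * hid_dim, hid_dim)
        self.generator = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_tokens, token_conf, tgt_tokens):
        # src_tokens: (B, S) token ids of the concatenated annotations
        # token_conf: (B, S) user subject confidence of the annotation each token came from
        # tgt_tokens: (B, T) token ids of the target fused annotation (teacher forcing)
        enc_out, state = self.encoder(self.embed(src_tokens))         # (B, S, H)
        dec_out, _ = self.decoder(self.embed(tgt_tokens), state)      # (B, T, H)

        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))          # (B, T, S) dot-product attention
        scores = scores * token_conf.unsqueeze(1)                     # scale by user subject confidence
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                         # (B, T, H)

        fused = torch.tanh(self.attn_out(torch.cat([dec_out, context], dim=-1)))
        return self.generator(fused)                                  # (B, T, vocab) summary logits

# Shape check with random data.
model = ConfidenceAttentionFusion(vocab_size=5000)
src = torch.randint(0, 5000, (2, 40))
conf = torch.rand(2, 40)
tgt = torch.randint(0, 5000, (2, 12))
logits = model(src, conf, tgt)          # torch.Size([2, 12, 5000])
```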
In S3, as shown in fig. 4, the video clip is organized and managed by using a virtual partitioning policy mechanism based on separate storage of data blocks, file headers and knowledge information.
It should be noted that step S3 is not an essential step, and step S3 is a preferred implementation of resource management.
(1) After the head and tail of a video segment are determined, the child video segment is found to be part of the data of its parent video. Therefore, no real video file is generated for the child segment and stored in the database; instead, following the relevant video file coding specification, a video header file is generated from the ID of the parent video and the offset of the segment relative to the parent video, and stored in the video segment header file database. After the video annotation information is fused, the ID of the fused video segment header file and the annotation information are stored in the video resource database. A virtual file conforming to the operating system specification is constructed from the parent video data blocks, the segment header file library and the resource list, supporting transparent access to the video segments at the file operation layer, as well as transparent further annotation, segmentation or merging extraction of the video segments. This virtual segmentation approach greatly reduces the storage space consumed by the many generated sub-videos.
(2) When a user needs to browse, download or further divide a virtual video segment, only the corresponding database entries and file headers need to be dynamically combined as required. The actual data block position is found by layer-by-layer recursion according to the relative offsets from the parent video.
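A minimal sketch of the virtual-segmentation idea: a segment is stored only as a header referencing its parent and an offset, and the absolute position in the root (real) video is resolved by recursing through parents. The names and the in-memory "database" below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Dict, Tuple

@dataclass
class SegmentHeader:
    """Header file of a virtual video segment: no video data is duplicated."""
    segment_id: str
    parent_id: Optional[str]   # None for a real (root) parent video
    offset: float              # start offset relative to the parent, in seconds
    length: float              # segment length, in seconds

headers: Dict[str, SegmentHeader] = {}   # stands in for the segment header file database

def absolute_range(segment_id: str) -> Tuple[float, float]:
    """Recurse through parent headers to find the range in the root parent video."""
    header = headers[segment_id]
    if header.parent_id is None:
        return header.offset, header.offset + header.length
    parent_start, _ = absolute_range(header.parent_id)
    start = parent_start + header.offset
    return start, start + header.length

# Parent video, a child segment, and a sub-segment of that child.
headers["v0"] = SegmentHeader("v0", None, 0.0, 3600.0)
headers["v0-a"] = SegmentHeader("v0-a", "v0", 350.0, 150.0)
headers["v0-a-1"] = SegmentHeader("v0-a-1", "v0-a", 20.0, 60.0)
print(absolute_range("v0-a-1"))   # (370.0, 430.0) in the real parent video
```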
(3) When a certain virtual video clip is accessed frequently or is divided into a plurality of sub-clips, in order to improve the application processing capacity and reduce the video playing delay, a header file of the virtual video is spliced with video data to form an independent video, and relevant item information of a video clip database is updated.
The embodiment of the invention also provides a video learning resource extraction and knowledge annotation system based on crowd-sourcing, which comprises the following steps:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
and the marking module is used for classifying the marking information according to the position marking information, constructing a marking information set, calculating the comprehensive confidence coefficient of the marking information set, if the comprehensive confidence coefficient of the marking information set reaches a preset threshold value, extracting a video segment from the video learning resource according to the marking information set, and fusing the marking information of the marking information set to obtain the fused marking information of the video segment.
Preferably, the crowd-sourcing-based video learning resource extraction and knowledge annotation system further comprises a storage module, wherein the storage module is configured to use the video learning resource before being marked as a parent video, obtain a position offset of an extracted video segment relative to the parent video, generate a video header file according to the parent video and the position offset, and store and manage the video segment in a manner of the parent video and the video header file.
The implementation principle and technical effect of the system are the same as those of the method, and are not described herein again.
It must be noted that in any of the above embodiments, the methods are not necessarily executed in order of sequence number, and as long as it cannot be assumed from the execution logic that they are necessarily executed in a certain order, it means that they can be executed in any other possible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. A video learning resource extraction and knowledge annotation method based on crowd-sourcing is characterized by comprising the following steps:
s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources;
s2, classifying the annotation information according to the position annotation information, constructing an annotation information set, calculating the comprehensive confidence of the annotation information set, if the comprehensive confidence of the annotation information set reaches a preset threshold, extracting video segments from the video learning resources according to the annotation information set, and performing fusion processing on the annotation information of the annotation information set to obtain the fusion annotation information of the video segments;
the S2 includes the steps of:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
s25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference;
in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content label similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information;
the step S25 includes the following steps: a segment segmentation position tolerance update period is set, and the segment segmentation position tolerance is adjusted according to the mean value of E_{f,k} over the previous update period; if the difference between the label and the final fusion result does not change with the position difference, the segment segmentation position tolerance is increased, otherwise the segment segmentation position tolerance is decreased, and the calculation formula for updating the segment segmentation position tolerance is:
E_{f,k} is the relationship between the labeling difference and the position difference of the k-th video segment, N is the total number of labeling information of the k-th video segment, M is the number of fused labeling information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content labeling difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
2. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, wherein said S21 comprises the steps of:
s211, initializing segment segmentation position tolerance and user subject field confidence;
s212, traversing the annotation information, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and classifying into one set all annotation information whose position annotation information differs by no more than the segment segmentation position tolerance, to obtain an annotation information set;
s213, obtaining the subject field of the labeling information set according to all the labeling information in the labeling information set;
s214, obtaining the user subject field confidence of each marking information corresponding to the user in the subject field to which the marking information set belongs in the marking information set, and calculating the comprehensive confidence of the marking information set, wherein the calculation formula of the comprehensive confidence is as follows:
wherein SetCredit is the comprehensive confidence of the labeling information set, SubjectCredit_{K,i} represents the user subject field confidence, in the subject field to which the labeling information set belongs, of the user corresponding to the i-th labeling information in the set, and N is the total number of labeling information in the labeling information set.
3. The crowd-sourcing-based video learning resource extraction and knowledge annotation method of claim 1, wherein in S22, the video segment extraction is implemented by a weighted voting method based on user confidence.
4. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, wherein the content annotation information comprises knowledge points, said S23 comprising the steps of:
s231, classifying the annotation information in one annotation information set according to the knowledge points of the annotation information in the annotation information set, and acquiring all annotation information of each knowledge point in the video clip;
s232, fusing and standardizing all the labeling information of each knowledge point in the video segment to obtain the fused labeling information of each knowledge point in the video segment.
5. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 4, wherein said step S232 comprises the steps of:
acquiring all annotation information of each knowledge point in a video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence, and vectorizing each annotation information of each knowledge point in the video segment to obtain vectorized text data;
inputting the vectorized text data into a first long-short term memory artificial neural network to obtain text distributed expression data;
inputting the text distributed expression data into a second long-short term memory artificial neural network, outputting predicted abstract distributed expression data, and adjusting the influence degree of the input value on the output predicted value based on the user subject confidence coefficient by using an attention mechanism;
and converting the abstract distributed expression data into a text form to obtain the fusion marking information of each knowledge point in the video clip.
6. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, further comprising step S3: the video learning resource before marking is used as a parent video, the position offset of the extracted video clip relative to the parent video is obtained, a video head file is generated according to the parent video and the position offset, and the video clip is stored and managed in a parent video and video head file mode.
7. A crowd-sourcing-based video learning resource extraction and knowledge annotation system is characterized by comprising:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
the annotation module is used for classifying the annotation information according to the position annotation information, constructing an annotation information set, calculating the comprehensive confidence of the annotation information set, if the comprehensive confidence of the annotation information set reaches a preset threshold, extracting a video segment from the video learning resource according to the annotation information set, and performing fusion processing on the annotation information of the annotation information set to obtain the fusion annotation information of the video segment;
the specific implementation of the labeling module comprises the following steps:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
s25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference;
in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
SubjectCredit′_K represents the updated user subject field confidence of the user in the K-th subject, SubjectCredit_K represents the user subject field confidence of the user in the K-th subject before updating, Sim() represents the content label similarity, Sim_0 represents a preset adjustment threshold, η represents a preset adjustment step length, the overlined Mark indicates the fused annotation information, and Mark_i is the i-th annotation information;
the step S25 includes the following steps: a segment segmentation position tolerance update period is set, and the segment segmentation position tolerance is adjusted according to the mean value of E_{f,k} over the previous update period; if the difference between the label and the final fusion result does not change with the position difference, the segment segmentation position tolerance is increased, otherwise the segment segmentation position tolerance is decreased, and the calculation formula for updating the segment segmentation position tolerance is:
E_{f,k} is the relationship between the labeling difference and the position difference of the k-th video segment, N is the total number of labeling information of the k-th video segment, M is the number of fused labeling information in the previous update period, Ē_f is the mean of the M values E_{f,k}, Cov() represents correlation, Difference() represents the content labeling difference, Distance() represents the position difference, Mark_{i,k} represents the i-th annotation information of the k-th video segment, the overlined Mark_k represents the k-th video segment finally obtained by fusion and convergence, E_{f0} is a preset segment segmentation position tolerance adjustment reference value, Δ′_P is the updated segment segmentation position tolerance value, and Δ_P is the segment segmentation position tolerance value before updating.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011319851.4A CN112418088B (en) | 2020-11-23 | 2020-11-23 | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011319851.4A CN112418088B (en) | 2020-11-23 | 2020-11-23 | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112418088A CN112418088A (en) | 2021-02-26 |
CN112418088B true CN112418088B (en) | 2022-04-29 |
Family
ID=74777896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011319851.4A Active CN112418088B (en) | 2020-11-23 | 2020-11-23 | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112418088B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115334354B (en) * | 2022-08-15 | 2023-12-29 | 北京百度网讯科技有限公司 | Video labeling method and device |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104090955A (en) * | 2014-07-07 | 2014-10-08 | 科大讯飞股份有限公司 | Automatic audio/video label labeling method and system |
WO2018081751A1 (en) * | 2016-10-28 | 2018-05-03 | Vilynx, Inc. | Video tagging system and method |
CN107609084A (en) * | 2017-09-06 | 2018-01-19 | 华中师范大学 | One kind converges convergent resource correlation method based on gunz |
CN110245259A (en) * | 2019-05-21 | 2019-09-17 | 北京百度网讯科技有限公司 | The video of knowledge based map labels method and device, computer-readable medium |
CN110442710A (en) * | 2019-07-03 | 2019-11-12 | 广州探迹科技有限公司 | A kind of short text semantic understanding of knowledge based map and accurate matching process and device |
CN110968708A (en) * | 2019-12-20 | 2020-04-07 | 华中师范大学 | Method and system for labeling education information resource attributes |
CN111222500A (en) * | 2020-04-24 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Label extraction method and device |
Non-Patent Citations (3)
Title |
---|
Unsupervised video summarization framework using keyframe extraction and video skimming; Shruti Jadon; 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA); 2020-11-10; 1-6 *
Construction and simulation research on a user annotation model for educational information resources; Xiong Yaping et al.; Modern Distance Education; 2017-01-15 (No. 01); 37-44 *
A survey of text extraction methods for videos and images; Jiang Mengdi et al.; Computer Science; 2017-11-15; 18-28 *
Also Published As
Publication number | Publication date |
---|---|
CN112418088A (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111737552A (en) | Method, device and equipment for extracting training information model and acquiring knowledge graph | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN112836487B (en) | Automatic comment method and device, computer equipment and storage medium | |
CN105117429A (en) | Scenario image annotation method based on active learning and multi-label multi-instance learning | |
CN112131430B (en) | Video clustering method, device, storage medium and electronic equipment | |
CN111046275A (en) | User label determining method and device based on artificial intelligence and storage medium | |
CN112597296A (en) | Abstract generation method based on plan mechanism and knowledge graph guidance | |
CN111831924A (en) | Content recommendation method, device, equipment and readable storage medium | |
CN113515669A (en) | Data processing method based on artificial intelligence and related equipment | |
CN112989212A (en) | Media content recommendation method, device and equipment and computer storage medium | |
CN115062709B (en) | Model optimization method, device, equipment, storage medium and program product | |
CN115114974A (en) | Model distillation method, device, computer equipment and storage medium | |
CN116578717A (en) | Multi-source heterogeneous knowledge graph construction method for electric power marketing scene | |
CN114330703A (en) | Method, device and equipment for updating search model and computer-readable storage medium | |
CN114519397B (en) | Training method, device and equipment for entity link model based on contrast learning | |
CN112418088B (en) | Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing | |
CN111930981A (en) | Data processing method for sketch retrieval | |
CN117952200A (en) | Knowledge graph and personalized learning path construction method and system | |
CN117711001B (en) | Image processing method, device, equipment and medium | |
CN116935170B (en) | Processing method and device of video processing model, computer equipment and storage medium | |
CN113407776A (en) | Label recommendation method and device, training method and medium of label recommendation model | |
CN115712855A (en) | Self-learning-based label rule generation method and device | |
Liu | Restricted Boltzmann machine collaborative filtering recommendation algorithm based on project tag improvement | |
CN118568568B (en) | Training method of content classification model and related equipment | |
CN116340552B (en) | Label ordering method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
- PB01 | Publication | |
- SE01 | Entry into force of request for substantive examination | |
- GR01 | Patent grant | |