CN112418088B - Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing - Google Patents

Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing Download PDF

Info

Publication number
CN112418088B
CN112418088B CN202011319851.4A CN202011319851A CN112418088B CN 112418088 B CN112418088 B CN 112418088B CN 202011319851 A CN202011319851 A CN 202011319851A CN 112418088 B CN112418088 B CN 112418088B
Authority
CN
China
Prior art keywords
annotation information
annotation
video
information
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011319851.4A
Other languages
Chinese (zh)
Other versions
CN112418088A (en
Inventor
杜旭
李�浩
班倩茹
杨娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202011319851.4A priority Critical patent/CN112418088B/en
Publication of CN112418088A publication Critical patent/CN112418088A/en
Application granted granted Critical
Publication of CN112418088B publication Critical patent/CN112418088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Administration (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Educational Technology (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video learning resource extraction and knowledge annotation method and system based on crowd-sourcing. The method comprises the following steps: s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources; s2, classifying the labeling information according to the position labeling information, constructing a labeling information set, calculating the comprehensive confidence of the labeling information set, if the comprehensive confidence of the labeling information set reaches a preset threshold, extracting video segments from the video learning resources according to the labeling information set, and performing fusion processing on the labeling information of the labeling information set to obtain the fusion labeling information of the video segments. According to the method and the device, the confidence of the annotation information set is judged, so that the influence of random annotation of videos by some users on the annotation result is reduced, and the quality and the reliability of the crowd-sourcing annotation are improved.

Description

Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
Technical Field
The invention belongs to the technical field of education information, and particularly relates to a video learning resource extraction and knowledge annotation method and system based on crowd-sourcing.
Background
With the development of internet technology, video resources on the internet also show a knowledge growth trend, wherein more and more video resources contain a large amount of knowledge values, and the video clips with the knowledge values can be applied to the education and teaching process, so that the teaching contents can be visually displayed, and the attention of students can be attracted. For the video clips with knowledge value, how to mine implicit knowledge points contained in the video clips and associate the video clips with the knowledge points, so that a learner can quickly and efficiently acquire personalized learning resources is a hot point of current research.
The existing video learning resource extraction method comprises the steps of manual marking of experts and automatic marking of machines, and the manual marking of video segments by the experts in a few different fields needs to consume huge manpower, financial resources and time cost; the method of machine learning can realize automatic labeling, but for the extraction of the video segments with the implicit knowledge points, the machine learning is difficult to realize automatic processing, and the extraction of the video segments with the implicit knowledge points by only relying on the machine learning is difficult.
Disclosure of Invention
Aiming at least one defect or improvement requirement in the prior art, the invention provides a video learning resource extraction and knowledge labeling method and system based on the crowd's intelligence, and the video learning resource extraction and knowledge labeling method based on the crowd's intelligence can improve the quality and reliability of crowd's intelligence labeling.
To achieve the above object, according to a first aspect of the present invention, there is provided a crowd-sourcing-based video learning resource extraction and knowledge annotation method, comprising the steps of:
s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources;
s2, classifying the labeling information according to the position labeling information, constructing a labeling information set, calculating the comprehensive confidence of the labeling information set, if the comprehensive confidence of the labeling information set reaches a preset threshold, extracting video segments from the video learning resources according to the labeling information set, and performing fusion processing on the labeling information of the labeling information set to obtain the fusion labeling information of the video segments.
Preferably, the S2 includes the steps of:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
and S25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference.
Preferably, in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
Figure BDA0002792520330000021
subjectCredit′Krepresents the updated user subject field confidence, objectCredit, of the user in the Kth subjectKRepresenting the user subject field confidence of the user in the Kth subject before updating, Sim () representing the content label similarity, Sim0Represents a preset adjustment threshold, eta represents a preset adjustment step length,
Figure BDA0002792520330000022
indicating fused annotation information, MarkiThe ith label information.
Preferably, the step S25 includes the steps of: setting a segment division position tolerance update period according to the previous update period
Figure BDA0002792520330000031
And adjusting the tolerance of the segmentation position of the segment, if the difference between the mark and the final fusion result does not change along with the position difference, increasing the tolerance of the segmentation position of the segment, otherwise, reducing the tolerance of the segmentation position of the segment, and updating the tolerance of the segmentation position of the segment by the calculation formula:
Figure BDA0002792520330000032
Figure BDA0002792520330000033
Figure BDA0002792520330000034
Ef,kis the k-thThe relationship between the labeling difference and the position difference of the video segments, N is the total number of labeling information of the kth video segment, M is the number of the fused labeling information of the last updating period,
Figure BDA0002792520330000035
is M, Ef,kCov () represents correlation, Difference () represents content annotation Difference, Distance () represents position Difference, Marki,kThe ith annotation information representing the kth video segment,
Figure BDA0002792520330000036
represents the k-th video segment finally obtained by fusion and convergence, Ef0Adjusting a reference value, delta ', for a preset segment segmentation position tolerance'PSegmenting the position tolerance value, Delta, for the updated segmentPThe positional tolerance values are segmented for the segment before updating.
Preferably, the S21 includes the steps of:
s211, initializing segment segmentation position tolerance and user subject field confidence;
s212, traversing the annotation information, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and classifying all annotation information of the position difference between the position annotation information in the segment segmentation position tolerance into a set to obtain an annotation information set;
s213, obtaining the subject field of the labeling information set according to all the labeling information in the labeling information set;
s214, obtaining the user subject field confidence of each marking information corresponding to the user in the subject field to which the marking information set belongs in the marking information set, and calculating the comprehensive confidence of the marking information set, wherein the calculation formula of the comprehensive confidence is as follows:
Figure BDA0002792520330000041
wherein SetCredit is the comprehensive confidence of the labeling information set, and SubjectCredit is the labeling information setK,iRepresenting annotated information setsAnd the ith marking information in the summary corresponds to the user subject field confidence of the user in the subject field to which the marking information set belongs, and N is the total amount of the marking information in the marking information set.
Preferably, in S22, the video segment extraction is implemented by a weighted voting method based on user confidence.
Preferably, the content annotation information includes knowledge points, and the S23 includes the steps of:
s231, classifying the annotation information in one annotation information set according to the knowledge points of the annotation information in the annotation information set, and acquiring all annotation information of each knowledge point in the video clip;
s231, fusing and standardizing all the labeling information of each knowledge point in the video clip to obtain the fused labeling information of each knowledge point in the video clip.
Preferably, the step S231 includes the steps of:
acquiring all annotation information of each knowledge point in a video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence, and vectorizing each annotation information of each knowledge point in the video segment to obtain vectorized text data;
inputting the vectorized text data into a first long-short term memory artificial neural network to obtain text distributed expression data;
inputting the text distributed expression data into a second long-short term memory artificial neural network, outputting predicted abstract distributed expression data, and adjusting the influence degree of the input value on the output predicted value based on the user subject confidence coefficient by using an attention mechanism;
and converting the abstract distributed expression data into a text form to obtain the fusion marking information of each knowledge point in the video clip.
Preferably, the crowd-sourcing-based video learning resource extraction and knowledge annotation method further includes step S3: and taking the video learning resource before marking as a parent video, acquiring the position offset of the extracted video clip relative to the parent video, generating a video head file according to the parent video and the position offset, and managing the video clip by adopting the parent video and the video head file.
According to a second aspect of the present invention, there is provided a crowd-sourcing based video learning resource extraction and knowledge annotation system, comprising:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
and the marking module is used for classifying the marking information according to the position marking information, constructing a marking information set, calculating the comprehensive confidence coefficient of the marking information set, if the comprehensive confidence coefficient of the marking information set reaches a preset threshold value, extracting a video segment from the video learning resource according to the marking information set, and fusing the marking information of the marking information set to obtain the fused marking information of the video segment.
In general, compared with the prior art, the invention has the following beneficial effects:
(1) according to the video learning resource extraction and knowledge annotation method and system based on the crowd-sourcing, user annotation information is traversed, classification is carried out according to position annotation information, an annotation information set is constructed, the comprehensive confidence coefficient of the annotation information set is further calculated, and if the confidence coefficient reaches a threshold value, standardization processing is carried out. And (4) integrating the labeling information of all the positions to extract the video segments and fusing the labeling information based on the confidence of the user. The method judges the confidence of the annotation information set based on the user confidence, reduces the influence of random annotation videos of some users on the annotation result, and improves the quality and the confidence of the crowd-sourcing annotation.
(2) According to the video learning resource extraction and knowledge annotation method and system based on the crowd-sourcing intelligence, after annotation information is fused, the similarity degree of user annotation information and an annotation result is calculated, and the confidence of a user in the subject field is dynamically calculated; and calculating the relation between the marking information and the position of the video segment, and dynamically determining the position tolerance of the marking information of the video segment. The method can dynamically determine the user confidence and the position tolerance of the labeling information, and improve the accuracy and the reliability of the labeling data.
(3) According to the video learning resource extraction and knowledge labeling method and system based on the crowd-sourcing, the sub-videos divided by the user are managed by adopting a virtual division strategy mechanism based on the data block, the file header and the knowledge information separation storage, and the method can dynamically combine the data block and the file header as required to extract the virtual videos. The method adopts a virtual partition mode, improves the utilization rate of the storage space, improves the application processing capacity and reduces the video playing delay.
The invention extracts the video learning resources by exerting the intelligent power of the public and comprehensively considering the confidence of the user, is particularly suitable for automatically extracting the video segments with the implicit knowledge points in the video, further provides data for various large educational video resource libraries and helps educators and learners to obtain more high-quality, multi-dimensional and multi-granularity educational resources.
Drawings
FIG. 1 is a schematic diagram illustrating a video learning resource extraction and knowledge annotation method based on crowd-sourcing in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of video segment extraction and annotation information fusion provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the principle of annotation information fusion based on user confidence according to an embodiment of the present invention;
fig. 4 is a schematic resource management diagram of a video clip according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic diagram illustrating a video learning resource extraction and knowledge annotation method based on crowd-sourcing according to an embodiment of the present invention, where the method includes steps S1 to S3.
In S1, annotation information of the knowledge points in the video learning resource by the multiple users is obtained, and the annotation information includes position annotation information and content annotation information of the knowledge points in the video learning resource.
When a plurality of users watch videos, a certain knowledge point or a segment which is helpful for learning the knowledge point in the videos is marked according to self understanding, preferably, the marking information comprises the starting point and the ending point of the segment, a title, the knowledge point, description information and the like, if the users watch the videos, the principle of pinhole imaging is stated in the position 5:50-8:20, namely, the marking point can be dragged to the corresponding position, and then the title is filled: "principle of pinhole imaging", knowledge points: "junior middle school-second grade-physics-pinhole imaging, straight-line propagation of light", description information: "the wall will form the reflection of the object when the wall is covered by a board with small holes.
In S2, as shown in fig. 2, the annotation information is classified according to the position annotation information, an annotation information set is constructed, a comprehensive confidence of the annotation information set is calculated, and if the comprehensive confidence of the annotation information set reaches a preset threshold, a video segment is extracted from the video learning resource according to the annotation information set, and the annotation information of the annotation information set is fused, so as to obtain fused annotation information of the video segment.
S21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
and S22, extracting video segments of the annotation information set with the comprehensive confidence reaching the preset threshold according to the confidence of the user subject field and the position annotation information, and obtaining the video segments corresponding to the annotation information set.
And (3) extracting the segments of the marked segment head and tail positions by adopting a weighted voting method based on user confidence, dividing the head positions and the tail positions in the marked information set into a head position group and a tail position group respectively, calculating the weighted votes for the marked points by using the user confidence weights, and taking the marked points with the highest weighted votes as the nodes in the set, namely the head and tail positions of the video segments marked by the set.
S23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the user subject field confidence level objectCredit corresponding to each piece of labeling informationK. And if the similarity between the user label and the final fusion result is high, increasing the user confidence coefficient, otherwise, reducing the user confidence coefficient. The specific algorithm is as follows:
Figure BDA0002792520330000071
subjectCredit′Krepresents the updated user subject field confidence, objectCredit, of the user in the Kth subjectKRepresenting the user subject field confidence of the user in the Kth subject before updating, Sim () representing the content label similarity, Sim0Represents a preset adjustment threshold, eta represents a preset adjustment step length,
Figure BDA0002792520330000072
indicating fused annotation information, MarkiThe ith label information.
And S25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference.
When S21 video clips are regulated into a set, the clip division position range is combined with fusion according to the user marking informationAnd dynamically adjusting the difference of the effects to ensure that the same video clip set has similar contents as much as possible. Calculating the difference between each label in the set and the final fusion result, and investigating the relation E between the difference and the head and tail positions of the segmentsfAnd updating the tolerance value. If the position tolerance ΔPAdjusting the period to be in accordance withPLast adjustment period
Figure BDA0002792520330000081
Mean value, adjusted byP. If the difference degree between the labeling result and the final fusion result does not change along with the position difference, the position tolerance delta is increasedPOtherwise, the position tolerance Δ is reducedP. The specific algorithm is as follows:
Figure BDA0002792520330000082
Figure BDA0002792520330000083
Figure BDA0002792520330000084
Figure BDA0002792520330000085
Figure BDA0002792520330000086
Figure BDA0002792520330000087
Ef,kthe relationship between the annotation difference and the position difference of the kth video segment, N is the total number of annotation information of the kth video segment, M is the number of the fused annotation information of the last update period,
Figure BDA0002792520330000088
is M, Ef,kCov () represents correlation, Difference () represents content annotation Difference, Distance () represents position Difference, Marki,kThe ith annotation information representing the kth video segment,
Figure BDA0002792520330000089
represents the k-th video segment finally obtained by fusion and convergence, Ef0Adjusting a reference value for preset segment segmentation position tolerance, and obtaining delta 'according to actual statistics'PSegmenting the position tolerance value, Delta, for the updated segmentPThe positional tolerance values are segmented for the segment before updating.
S21 includes the steps of:
(1) initializing segment segmentation position tolerance ΔPAnd user subject field confidence SubjectCreditK, K being the subject field number.
(2) And traversing the user annotation information periodically, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and for the segments with head-tail position differences within the tolerance range, indicating that the two video segments are very similar, namely the two video segments can be regarded as the same segment, and the annotated annotation information can also be regarded as the annotation of the same segment, namely the annotation information can be classified into an annotation information set.
(3) And classifying the labeled information set according to the labeled information in subject field, and determining the subject to which the labeled information belongs, wherein the labeled information is preprocessed, keywords and weight of the labeled information are calculated by adopting a TF-IDF algorithm, and then the subject field to which the labeled information belongs is obtained by adopting an SVM algorithm. Other algorithm models based on machine learning and deep learning can be integrated for subject field classification.
(4) And according to the subject field to which the labeling information belongs, obtaining the confidence of the user in the subject field, and further calculating the comprehensive confidence of the labeling information of the set. The comprehensive confidence calculation model of the labeling information set is as follows:
Figure BDA0002792520330000091
wherein SetCredit is the comprehensive confidence of the labeling information set, and SubjectCredit is the labeling information setK,iAnd representing the user subject field confidence of the ith marking information in the marking information set corresponding to the subject field to which the marking information set belongs, wherein N is the total number of the marking information in the marking information set.
In S23, as shown in fig. 3, for a set in which the overall confidence of the annotation information reaches a threshold, the human annotation information is fused and normalized based on the user subject confidence.
(1) Classifying the labeling information in one labeling information set according to the knowledge points of the labeling information in the labeling information set to obtain all the labeling information of each knowledge point in the video clip;
(2) and fusing and standardizing all the labeling information of each knowledge point in the video segment to obtain the fused labeling information of each knowledge point in the video segment. The method specifically comprises the following steps:
all annotation information of each knowledge point in the video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence are obtained, vectorization processing is carried out on each annotation information of each knowledge point in the video segment, and vectorization text data are obtained. Specifically, a triple is formed for the annotation information, { user ID, user subject confidence, [ title, knowledge point, description ] }, and the annotation information under the same knowledge point is formed into a corpus and vectorized;
transmitting the vectorized text data into an LSTM model to obtain distributed expression data of the text;
adjusting the influence degree of each input value on a predicted value by using an Attention mechanism based on the confidence coefficient of the user subject;
and transmitting the distributed expression data of the text into the LSTM model, and predicting and outputting the distributed expression of the abstract.
And converting the distributed expression of the abstract into a text form, and obtaining the fusion marking information of each knowledge point in the video segment, wherein the fusion marking information comprises data such as a title, the knowledge point, description information and the like.
In S3, as shown in fig. 4, the video clip is organized and managed by using a virtual partitioning policy mechanism based on separate storage of data blocks, file headers and knowledge information.
It should be noted that step S3 is not an essential step, and step S3 is a preferred implementation of resource management.
(1) After determining the head and the tail of the video clip, the method finds that the child video clip is partial data of the father video clip, so that the method does not generate a real video clip for the child video clip to be stored in a database, but generates a video head file according to the ID of the father video and the offset of the clip relative to the father video according to the coding specification of a related video file, and stores the video head file into a video clip head file database; and after the video annotation information is fused, storing the ID and the annotation information of the fused video clip head file into a video resource database. And constructing a virtual file which accords with the operating system specification based on the parent video data block, the segment header file library and the resource list, supporting the transparent access of the video segments on the file operating layer, and supporting the transparent further labeling, segmentation or merging extraction of the video segments. The method adopts a virtual segmentation mode to greatly reduce the consumption of the generated multiple sub-videos on the storage space.
(2) When a user needs to browse, download or divide a virtual video segment, only the corresponding database and the corresponding file header need to be dynamically combined according to needs. The actual data block position is found layer by layer recursion according to the relative offset from the parent video.
(3) When a certain virtual video clip is accessed frequently or is divided into a plurality of sub-clips, in order to improve the application processing capacity and reduce the video playing delay, a header file of the virtual video is spliced with video data to form an independent video, and relevant item information of a video clip database is updated.
The embodiment of the invention also provides a video learning resource extraction and knowledge annotation system based on crowd-sourcing, which comprises the following steps:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
and the marking module is used for classifying the marking information according to the position marking information, constructing a marking information set, calculating the comprehensive confidence coefficient of the marking information set, if the comprehensive confidence coefficient of the marking information set reaches a preset threshold value, extracting a video segment from the video learning resource according to the marking information set, and fusing the marking information of the marking information set to obtain the fused marking information of the video segment.
Preferably, the crowd-sourcing-based video learning resource extraction and knowledge annotation system further comprises a storage module, wherein the storage module is configured to use the video learning resource before being marked as a parent video, obtain a position offset of an extracted video segment relative to the parent video, generate a video header file according to the parent video and the position offset, and store and manage the video segment in a manner of the parent video and the video header file.
The implementation principle and technical effect of the system are the same as those of the method, and are not described herein again.
It must be noted that in any of the above embodiments, the methods are not necessarily executed in order of sequence number, and as long as it cannot be assumed from the execution logic that they are necessarily executed in a certain order, it means that they can be executed in any other possible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A video learning resource extraction and knowledge annotation method based on crowd-sourcing is characterized by comprising the following steps:
s1, acquiring labeling information of a plurality of users on the knowledge points in the video learning resources, wherein the labeling information comprises position labeling information and content labeling information of the knowledge points in the video learning resources;
s2, classifying the annotation information according to the position annotation information, constructing an annotation information set, calculating the comprehensive confidence of the annotation information set, if the comprehensive confidence of the annotation information set reaches a preset threshold, extracting video segments from the video learning resources according to the annotation information set, and performing fusion processing on the annotation information of the annotation information set to obtain the fusion annotation information of the video segments;
the S2 includes the steps of:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
s25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference;
in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
Figure FDA0003545872550000021
subjectCredit′Krepresents the updated user subject field confidence, objectCredit, of the user in the Kth subjectKRepresenting the user subject field confidence of the user in the Kth subject before updating, Sim () representing the content label similarity, Sim0Represents a preset adjustment threshold, eta represents a preset adjustment step length,
Figure FDA0003545872550000022
indicating fused annotation information, MarkiMarking information for the ith;
the step S25 includes the steps of: setting a segment division position tolerance update period according to the previous update period
Figure FDA0003545872550000028
And adjusting the tolerance of the segmentation position of the segment, if the difference between the mark and the final fusion result does not change along with the position difference, increasing the tolerance of the segmentation position of the segment, otherwise, reducing the tolerance of the segmentation position of the segment, and updating the tolerance of the segmentation position of the segment by the calculation formula:
Figure FDA0003545872550000023
Figure FDA0003545872550000024
Figure FDA0003545872550000025
Ef,kthe relation between the labeling difference and the position difference of the kth video segment is shown, N is the total number of the labeling information of the kth video segment, and M is the number of the fused labeling information of the last updating period,
Figure FDA0003545872550000026
Is M, Ef,kCov () represents correlation, Difference () represents content annotation Difference, Distance () represents position Difference, Marki,kThe ith annotation information representing the kth video segment,
Figure FDA0003545872550000027
represents the k-th video segment finally obtained by fusion and convergence, Ef0Adjusting a reference value, delta ', for a preset segment segmentation position tolerance'PSegmenting the position tolerance value, Delta, for the updated segmentPThe positional tolerance values are segmented for the segment before updating.
2. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, wherein said S21 comprises the steps of:
s211, initializing segment segmentation position tolerance and user subject field confidence;
s212, traversing the annotation information, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, and classifying all annotation information of the position difference between the position annotation information in the segment segmentation position tolerance into a set to obtain an annotation information set;
s213, obtaining the subject field of the labeling information set according to all the labeling information in the labeling information set;
s214, obtaining the user subject field confidence of each marking information corresponding to the user in the subject field to which the marking information set belongs in the marking information set, and calculating the comprehensive confidence of the marking information set, wherein the calculation formula of the comprehensive confidence is as follows:
Figure FDA0003545872550000031
wherein SetCredit is the comprehensive confidence of the labeling information set, and SubjectCredit is the labeling information setK,iAnd representing the user subject field confidence of the ith marking information in the marking information set corresponding to the subject field to which the marking information set belongs, wherein N is the total number of the marking information in the marking information set.
3. The crowd-sourcing-based video learning resource extraction and knowledge annotation method of claim 1, wherein in S22, the video segment extraction is implemented by a weighted voting method based on user confidence.
4. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, wherein the content annotation information comprises knowledge points, said S23 comprising the steps of:
s231, classifying the annotation information in one annotation information set according to the knowledge points of the annotation information in the annotation information set, and acquiring all annotation information of each knowledge point in the video clip;
s232, fusing and standardizing all the labeling information of each knowledge point in the video segment to obtain the fused labeling information of each knowledge point in the video segment.
5. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 4, wherein said step S232 comprises the steps of:
acquiring all annotation information of each knowledge point in a video segment, user identification corresponding to each annotation information of each knowledge point and user subject field confidence, and vectorizing each annotation information of each knowledge point in the video segment to obtain vectorized text data;
inputting the vectorized text data into a first long-short term memory artificial neural network to obtain text distributed expression data;
inputting the text distributed expression data into a second long-short term memory artificial neural network, outputting predicted abstract distributed expression data, and adjusting the influence degree of the input value on the output predicted value based on the user subject confidence coefficient by using an attention mechanism;
and converting the abstract distributed expression data into a text form to obtain the fusion marking information of each knowledge point in the video clip.
6. The crowd-sourcing-based video learning resource extraction and knowledge annotation process of claim 1, further comprising step S3: the video learning resource before marking is used as a parent video, the position offset of the extracted video clip relative to the parent video is obtained, a video head file is generated according to the parent video and the position offset, and the video clip is stored and managed in a parent video and video head file mode.
7. A crowd-sourcing-based video learning resource extraction and knowledge annotation system is characterized by comprising:
the annotation information acquisition module is used for acquiring annotation information of the knowledge points in the video learning resources from a plurality of users, wherein the annotation information comprises position annotation information and content annotation information of the knowledge points in the video learning resources;
the annotation module is used for classifying the annotation information according to the position annotation information, constructing an annotation information set, calculating the comprehensive confidence of the annotation information set, if the comprehensive confidence of the annotation information set reaches a preset threshold, extracting a video segment from the video learning resource according to the annotation information set, and performing fusion processing on the annotation information of the annotation information set to obtain the fusion annotation information of the video segment;
the specific implementation of the labeling module comprises the following steps:
s21, initializing segment segmentation position tolerance and user subject field confidence, classifying the annotation information according to the position annotation information and the segment segmentation position tolerance, constructing an annotation information set, and calculating the comprehensive confidence of the annotation information set according to the user subject field confidence and the annotation information;
s22, extracting video segments of the annotation information set with the comprehensive confidence reaching a preset threshold according to the confidence of the user subject field and the position annotation information to obtain video segments corresponding to the annotation information set;
s23, for the annotation information set with the comprehensive confidence reaching the preset threshold, fusing and standardizing a plurality of annotation information of the annotation information set based on the user subject confidence to obtain fused annotation information of the video segment corresponding to the annotation information set, wherein the fused annotation information comprises fused content annotation information and fused position annotation information;
s24, calculating the content labeling similarity of each piece of content labeling information and the fused labeling information, and updating the confidence coefficient of the user subject field corresponding to each piece of labeling information;
s25, calculating the content labeling difference between each piece of content labeling information and the fused content labeling information, calculating the position difference between each piece of position labeling information and the fused position labeling information, and updating the segment segmentation position tolerance according to the relationship between the labeling difference and the position difference;
in step S24, if the similarity between the annotation information and the fused annotation information is large, the confidence of the user subject field is increased, otherwise, the confidence of the user subject field is decreased, and the calculation formula for updating the confidence of the user subject field is:
Figure FDA0003545872550000051
subjectCredit′Krepresents the updated user subject field confidence, objectCredit, of the user in the Kth subjectKRepresenting the user subject field confidence of the user in the Kth subject before updating, Sim () representing the content label similarity, Sim0Represents a preset adjustment threshold, eta represents a preset adjustment step length,
Figure FDA0003545872550000052
indicating fused annotation information, MarkiMarking information for the ith;
the step S25 includes the steps of: setting a segment division position tolerance update period according to the previous update period
Figure FDA0003545872550000061
And adjusting the tolerance of the segmentation position of the segment, if the difference between the mark and the final fusion result does not change along with the position difference, increasing the tolerance of the segmentation position of the segment, otherwise, reducing the tolerance of the segmentation position of the segment, and updating the tolerance of the segmentation position of the segment by the calculation formula:
Figure FDA0003545872550000062
Figure FDA0003545872550000063
Figure FDA0003545872550000064
Ef,kis the relation between the labeling difference and the position difference of the kth video segment, N is the total number of the labeling information of the kth video segment, M is the number of the fusion labeling information of the last updating period,
Figure FDA0003545872550000065
is M, Ef,kCov () represents correlation, Difference () represents content annotation Difference, Distance () represents position Difference, Marki,kThe ith annotation information representing the kth video segment,
Figure FDA0003545872550000066
represents the k-th video segment finally obtained by fusion and convergence, Ef0Adjusting a reference value, delta ', for a preset segment segmentation position tolerance'PSegmenting the position tolerance value, Delta, for the updated segmentPThe positional tolerance values are segmented for the segment before updating.
CN202011319851.4A 2020-11-23 2020-11-23 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing Active CN112418088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011319851.4A CN112418088B (en) 2020-11-23 2020-11-23 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011319851.4A CN112418088B (en) 2020-11-23 2020-11-23 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing

Publications (2)

Publication Number Publication Date
CN112418088A CN112418088A (en) 2021-02-26
CN112418088B true CN112418088B (en) 2022-04-29

Family

ID=74777896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011319851.4A Active CN112418088B (en) 2020-11-23 2020-11-23 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing

Country Status (1)

Country Link
CN (1) CN112418088B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115334354B (en) * 2022-08-15 2023-12-29 北京百度网讯科技有限公司 Video labeling method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090955A (en) * 2014-07-07 2014-10-08 科大讯飞股份有限公司 Automatic audio/video label labeling method and system
CN107609084A (en) * 2017-09-06 2018-01-19 华中师范大学 One kind converges convergent resource correlation method based on gunz
WO2018081751A1 (en) * 2016-10-28 2018-05-03 Vilynx, Inc. Video tagging system and method
CN110245259A (en) * 2019-05-21 2019-09-17 北京百度网讯科技有限公司 The video of knowledge based map labels method and device, computer-readable medium
CN110442710A (en) * 2019-07-03 2019-11-12 广州探迹科技有限公司 A kind of short text semantic understanding of knowledge based map and accurate matching process and device
CN110968708A (en) * 2019-12-20 2020-04-07 华中师范大学 Method and system for labeling education information resource attributes
CN111222500A (en) * 2020-04-24 2020-06-02 腾讯科技(深圳)有限公司 Label extraction method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090955A (en) * 2014-07-07 2014-10-08 科大讯飞股份有限公司 Automatic audio/video label labeling method and system
WO2018081751A1 (en) * 2016-10-28 2018-05-03 Vilynx, Inc. Video tagging system and method
CN107609084A (en) * 2017-09-06 2018-01-19 华中师范大学 One kind converges convergent resource correlation method based on gunz
CN110245259A (en) * 2019-05-21 2019-09-17 北京百度网讯科技有限公司 The video of knowledge based map labels method and device, computer-readable medium
CN110442710A (en) * 2019-07-03 2019-11-12 广州探迹科技有限公司 A kind of short text semantic understanding of knowledge based map and accurate matching process and device
CN110968708A (en) * 2019-12-20 2020-04-07 华中师范大学 Method and system for labeling education information resource attributes
CN111222500A (en) * 2020-04-24 2020-06-02 腾讯科技(深圳)有限公司 Label extraction method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Unsupervised video summarization framework using keyframe extraction and video skimming;Shruti Jadon;《2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA)》;20201110;1-6 *
教育信息资源用户标注模型构建及仿真研究;熊雅萍等;《现代远距离教育》;20170115(第01期);37-44 *
视频和图像文本提取方法综述;蒋梦迪等;《计算机科学》;20171115;18-28 *

Also Published As

Publication number Publication date
CN112418088A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN111737552A (en) Method, device and equipment for extracting training information model and acquiring knowledge graph
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN105117429A (en) Scenario image annotation method based on active learning and multi-label multi-instance learning
CN112131430B (en) Video clustering method, device, storage medium and electronic equipment
CN111046275A (en) User label determining method and device based on artificial intelligence and storage medium
CN112597296A (en) Abstract generation method based on plan mechanism and knowledge graph guidance
CN111831924A (en) Content recommendation method, device, equipment and readable storage medium
CN113515669A (en) Data processing method based on artificial intelligence and related equipment
CN112989212A (en) Media content recommendation method, device and equipment and computer storage medium
CN115062709B (en) Model optimization method, device, equipment, storage medium and program product
CN115114974A (en) Model distillation method, device, computer equipment and storage medium
CN116578717A (en) Multi-source heterogeneous knowledge graph construction method for electric power marketing scene
CN114330703A (en) Method, device and equipment for updating search model and computer-readable storage medium
CN114519397B (en) Training method, device and equipment for entity link model based on contrast learning
CN112418088B (en) Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
CN111930981A (en) Data processing method for sketch retrieval
CN117952200A (en) Knowledge graph and personalized learning path construction method and system
CN117711001B (en) Image processing method, device, equipment and medium
CN116935170B (en) Processing method and device of video processing model, computer equipment and storage medium
CN113407776A (en) Label recommendation method and device, training method and medium of label recommendation model
CN115712855A (en) Self-learning-based label rule generation method and device
Liu Restricted Boltzmann machine collaborative filtering recommendation algorithm based on project tag improvement
CN118568568B (en) Training method of content classification model and related equipment
CN116340552B (en) Label ordering method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant