CN114996510A - Teaching video segmentation and information point extraction method, device, electronic equipment and medium - Google Patents


Publication number
CN114996510A
Authority
CN
China
Prior art keywords
information
video
text
segmentation
teaching video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110223138.8A
Other languages
Chinese (zh)
Inventor
杨立春
夏德虎
张志发
赵梦凯
巩稼民
蒋杰伟
Current Assignee
Shenzhen Penguin Network Technology Co ltd
Xian University of Posts and Telecommunications
Original Assignee
Shenzhen Penguin Network Technology Co ltd
Xian University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Shenzhen Penguin Network Technology Co ltd and Xian University of Posts and Telecommunications
Priority: CN202110223138.8A
Publication: CN114996510A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 — Retrieval using metadata automatically derived from the content
    • G06F16/7844 — Retrieval using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7847 — Retrieval using low-level visual features of the video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a teaching video segmentation and information point extraction method, apparatus, electronic device, and medium. The method comprises the following steps: acquiring a teaching video and reading the image information in the teaching video; extracting the text information in the teaching video; segmenting the teaching video according to the text information and the image information to generate segmented videos; and extracting information points from each segmented video according to its corresponding text information and image information, and determining the information points. With this method, the whole segmentation and information point extraction process is completed automatically without manual participation, which improves the working efficiency of video segmentation and information point extraction and reduces cost.

Description

Teaching video segmentation and information point extraction method, device, electronic equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for segmenting a teaching video and extracting an information point, an electronic device, and a medium.
Background
With the rapid development of intelligent devices and the Internet, more and more people share their learning experiences and daily life in video form through Internet platforms. Major education platforms likewise provide their own online teaching video courses. Compared with traditional offline courses, online teaching video courses have unique advantages, such as freedom from classroom location and class-time constraints and support for playback. However, online teaching video courses also have problems: although a student can find the required video according to its title and content introduction, a student who wants to jump to a particular piece of knowledge in the video cannot quickly and accurately locate the target information point. Especially for long videos, searching for the target position costs the student significant time.
At present, teaching videos are segmented and their information points extracted manually. Manual segmentation and extraction not only consumes a large amount of manpower and material resources and is inefficient, but also yields inconsistent segmentation and information points, because different people understand the same video differently. The traditional approach to teaching video segmentation and information point extraction is therefore time-consuming, labor-intensive, and inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a teaching video segmentation and information point extraction method, apparatus, electronic device, and medium capable of improving working efficiency.
In a first aspect of the present application, a method for segmenting a teaching video and extracting information points is provided, which includes:
acquiring a teaching video, and reading image information in the teaching video;
extracting text information in the teaching video;
segmenting the teaching video according to the text information and the image information to generate a segmented video;
and extracting information points of the segmented video according to the text information and the image information corresponding to the segmented video, and determining the information points.
In one embodiment, the text information is an audio text, and the extracting the text information in the teaching video includes:
and extracting audio content in the teaching video, and correspondingly storing text content and time line in the audio content according to an audio-to-text technology to obtain an audio text.
In one embodiment, the segmenting the teaching video according to the text information and the image information to generate a segmented video includes:
determining a preliminary segmentation point according to the text information;
determining secondary segmentation points according to the image information;
determining a final segmentation point according to the preliminary segmentation point and the secondary segmentation point;
and according to the final segmentation point, carrying out segmentation processing on the teaching video to generate a segmented video.
In one embodiment, the determining a preliminary segmentation point according to the text information includes:
and extracting time points of which the text time interval is larger than a preset interval threshold value in the text information according to the text information, and determining a preliminary segmentation point.
In one embodiment, the determining a secondary segmentation point according to the image information comprises:
extracting images according to the image information and preset time intervals, and calculating the similarity of adjacent images;
and if the similarity is smaller than a first preset similarity threshold, determining the time point between the corresponding adjacent images as a secondary segmentation point.
In one embodiment, the extracting images at preset time intervals according to the image information and calculating the similarity of adjacent images includes:
extracting images according to the image information and preset time intervals;
converting the extracted image into a four-level gray image, and vectorizing the four-level gray image to obtain an image vector;
carrying out standardization processing on the image vector to obtain a standardized vector;
and calculating the similarity of the adjacent images according to the normalized vector.
In one embodiment, the extracting information points from the segmented video and determining information points includes:
determining a first candidate information point according to text information corresponding to the segmented video;
determining a second candidate information point according to the image information in the segmented video;
and determining the information points of the segmented video according to the first candidate information points and the second candidate information points.
The second aspect of the present application provides a teaching video segmentation and information point extraction device, including:
the information extraction module is used for acquiring a teaching video and reading image information in the teaching video; extracting text information in the teaching video;
the segmented video generation module is used for carrying out segmented processing on the teaching video according to the text information and the image information to generate a segmented video;
and the information point extraction module is used for extracting the information points of the segmented video according to the text information and the image information corresponding to the segmented video.
In a third aspect of the present application, an electronic device is provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method in the foregoing embodiments when executing the computer program.
In a fourth aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method in the above-described embodiments.
According to the above teaching video segmentation and information point extraction method, a teaching video is first acquired, its image information is read, and its text information is extracted; the teaching video is then segmented according to the text information and the image information to generate segmented videos; finally, information points are extracted from the segmented videos. The whole process of video segmentation and information point extraction is completed automatically without manual participation, which helps to improve working efficiency.
Drawings
FIG. 1 is a flow chart illustrating a method for teaching video segmentation and information point extraction according to an embodiment;
FIG. 2 is a schematic flowchart of a method for segmenting a teaching video and extracting information points according to another embodiment;
FIG. 3 is a flowchart illustrating segmentation of a teaching video according to text information and image information to generate a segmented video according to an embodiment;
fig. 4 is a schematic flow chart illustrating a process of extracting information points from a segmented video according to text information and image information corresponding to the segmented video and determining the information points in one embodiment;
FIG. 5 is a block diagram of an apparatus for segmentation and information point extraction of a teaching video according to another embodiment;
FIG. 6 is a diagram illustrating an internal structure of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in fig. 1, a teaching video segmentation and information point extraction method is provided. In this embodiment, the method is described as applied to a terminal by way of example; it is understood that the method may also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes steps S200 to S800.
Step S200: and acquiring a teaching video and reading image information in the teaching video.
A teaching video is a video containing images and audio. Its content may be teaching material for school subjects such as Chinese, mathematics, or English, or other types of instruction such as cooking or flower arranging; in short, the embodiments of the present application do not limit the specific content or subject type of the teaching video. The image information in the teaching video refers to the set of frame images of the teaching video. Specifically, after the teaching video is acquired, the image information in it is read.
Step S400: and extracting text information in the teaching video.
The text information in the teaching video comprises subtitle text and audio text. In general, the video file and the subtitle file of a teaching video are stored separately, and not all teaching videos contain subtitles. Preferably, after the teaching video is obtained, it is first determined whether a subtitle file is attached to the teaching video; if so, the subtitle file is read to obtain the subtitle text, and if not, the audio text is extracted directly.
In one embodiment, the text information is an audio text, and the specific process of extracting the audio text in the teaching video is as follows: and extracting audio content in the teaching video, and correspondingly storing text content and time line in the audio content according to an audio-to-text technology to obtain an audio text.
Specifically, the audio-to-text technology is adopted to convert the audio content, and the text content and the timeline are stored in a one-to-one correspondence manner, so that the format of the obtained audio text is shown in the following table.
[Table: recognized text content stored in one-to-one correspondence with its timeline, i.e., each sentence with its start and end time]
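As a hedged illustration (the patent does not name a specific speech recognition engine), the audio-text structure described above — text content stored in one-to-one correspondence with the timeline — might be sketched as:

```python
# Minimal sketch of the audio-text structure: each recognized sentence is kept
# together with its start and end time so later steps can compute the pauses
# between sentences. transcribe() is a stand-in for any audio-to-text engine.

def transcribe(audio_path):
    # Placeholder: a real implementation would call a speech recognition API.
    return [
        ("Welcome to today's lesson.", 0.0, 2.4),
        ("First, let us review the previous chapter.", 4.1, 7.8),
    ]

def build_audio_text(audio_path):
    # Store text content and timeline in one-to-one correspondence.
    return [{"text": t, "start": s, "end": e} for t, s, e in transcribe(audio_path)]

audio_text = build_audio_text("lecture.wav")
print(audio_text[0])  # {'text': "Welcome to today's lesson.", 'start': 0.0, 'end': 2.4}
```

The dictionary record format is an assumption; any representation that keeps each sentence paired with its timestamps serves the same purpose.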
Step S600: and carrying out segmentation processing on the teaching video according to the text information and the image information to generate a segmented video.
The segmentation processing divides the teaching video into a plurality of segmented videos along the timeline according to certain criteria. Specifically, the specific content of the teaching video can be obtained from the text information and the image information acquired in the above steps; segmentation points are then determined from this content, and the teaching video is split into a plurality of segmented videos.
Step S800: and extracting information points of the segmented video according to the text information and the image information corresponding to the segmented video, and determining the information points.
The information points refer to keywords or key sentences that can be used to represent the corresponding segmented video. Specifically, according to text information and image information corresponding to the segmented video, candidate information points in the video can be extracted, and then the candidate information points are screened according to a preset algorithm, so that the final information point can be determined.
Furthermore, after the information points are determined, the segmented video, the segmented points and the information points can be correspondingly stored, so that the segmented video is convenient to search. The segmentation point may refer to a start point or an end point of the corresponding segmented video.
According to the above teaching video segmentation and information point extraction method, a teaching video is first acquired, its image information is read, and its text information is extracted; the teaching video is then segmented according to the text information and the image information to generate segmented videos; finally, information points are extracted from the segmented videos. The whole process of video segmentation and information point extraction is completed automatically without manual participation, which helps to improve working efficiency.
In one embodiment, referring to fig. 3, step S600 includes steps S620 to S680.
Step S620: and determining a preliminary segmentation point according to the text information.
Specifically, a set of interval words may be preset; the interval words occurring in the text information are extracted, and the preliminary segmentation points are determined according to the time points of those interval words. The interval words include, but are not limited to, "next section", "now", "start", "complete", and the like.
In addition, the preliminary segmentation point may be determined according to the interval time, that is, the pause between sentences during the teacher's lecture; the text time interval obtained from the text information in the above steps is exactly this interval time. Preferably, the time points in the text information at which the text time interval is greater than a preset interval threshold are extracted and determined as preliminary segmentation points. The starting position of the first sentence in the text information is the first preliminary segmentation point, and the ending position of the last sentence is the last preliminary segmentation point.
Specifically, the start point or the end point of the text interval may be determined as a preliminary segmentation point, or the middle time point of the interval may be used. For example, let the sentence sequence be S_1, S_2, …, S_i, S_{i+1}, …, S_n, with subscript s denoting the start of a sentence and e denoting its end, and let the set of intervals between adjacent sentences be {T_1, T_2, …, T_i, …, T_{n-1}}, where T_i = S_{(i+1)s} − S_{ie}. The preliminary segmentation point may then be D_i = S_{ie}, D_i = S_{(i+1)s}, or D_i = S_{ie} + T_i/2.
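The interval-based rule above can be sketched as follows, using the mid-gap variant D_i = S_{ie} + T_i/2 (the sentence times are illustrative):

```python
# Sketch: derive preliminary segmentation points from inter-sentence pauses.
# sentences is a list of (start, end) times; a gap T_i = S_(i+1)s - S_ie greater
# than the threshold yields a point, placed here at mid-gap: D_i = S_ie + T_i/2.

def preliminary_points(sentences, gap_threshold):
    points = [sentences[0][0]]                 # start of the first sentence
    for (_, prev_end), (next_start, _) in zip(sentences, sentences[1:]):
        gap = next_start - prev_end
        if gap > gap_threshold:
            points.append(prev_end + gap / 2)  # mid-gap variant of D_i
    points.append(sentences[-1][1])            # end of the last sentence
    return points

pts = preliminary_points([(0.0, 2.4), (4.1, 7.8), (8.0, 12.5)], gap_threshold=1.0)
print(pts)  # ≈ [0.0, 3.25, 12.5]
```

The start-point or end-point variants differ only in which endpoint of the gap is appended.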
Step S640: determining secondary segmentation points according to the image information.
Specifically, the teaching video generally includes a presentation document. According to the image information, the image information can be processed first, the demonstration document information in the image information is extracted, and the time node corresponding to the demonstration document information is marked. And determining a time node for switching to the next title according to the title text in the demonstration document information, and determining the time node as a secondary segmentation point.
In addition, secondary segmentation points can be determined according to the similarity of adjacent images. Specifically, images may be extracted from the image information at preset time intervals and the similarity between adjacent images calculated; if the similarity is smaller than a first preset similarity threshold, the time point between the corresponding adjacent images is determined as a secondary segmentation point. The preset time interval may be 2 s, 3 s, or another interval.
Further, in an embodiment, the process of extracting images according to the image information and the preset time interval and calculating the similarity between adjacent images includes: extracting images according to preset time intervals according to the image information; converting the extracted image into a four-level gray image, and vectorizing the four-level gray image to obtain an image vector; carrying out standardization processing on the image vector to obtain a standardized vector; and calculating the similarity of the adjacent images according to the normalized vector.
A grayscale digital image is an image in which each pixel has a single sample value; such images are typically displayed as shades of gray ranging from the darkest black to the brightest white. A grayscale image differs from a black-and-white image: a black-and-white image has only the two colors black and white, whereas a grayscale image also has many transition levels between black and white. A four-level gray image is a gray image with two transition levels added between white and black. Converting the image into a four-level gray image reduces the influence of non-key factors such as background color and brightness on the one hand, and on the other hand preserves the content information in the image, such as the text content and text attributes. Specifically, two adjacent four-level gray images are vectorized to obtain image vectors X and Y, and their 2-norms are computed according to the formulas
L_X = sqrt(Σ_i x_i²) and L_Y = sqrt(Σ_i y_i²).
After normalization, the normalized vectors Norm_X = X/L_X and Norm_Y = Y/L_Y are obtained. Finally, the dot product S = Norm_X · Norm_Y of the normalized vectors is computed, yielding the cosine similarity of the two images, i.e., the similarity of the adjacent images.
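A small self-contained sketch of this similarity computation; the images here are plain nested lists of 0–255 gray values, whereas a real pipeline would first decode and grayscale the video frames:

```python
import math

# Quantize each image to four gray levels, flatten it to a vector, compute the
# 2-norms L_X and L_Y, and take the dot product of the normalized vectors,
# i.e. the cosine similarity of the adjacent images.

def four_level(img):
    # Map 0-255 gray values onto 4 levels (0..3).
    return [min(p // 64, 3) for row in img for p in row]

def image_similarity(img_a, img_b):
    x, y = four_level(img_a), four_level(img_b)
    lx = math.sqrt(sum(v * v for v in x))  # L_X, the 2-norm of X
    ly = math.sqrt(sum(v * v for v in y))  # L_Y, the 2-norm of Y
    if lx == 0 or ly == 0:
        return 0.0
    # Dot product of Norm_X = X/L_X and Norm_Y = Y/L_Y.
    return sum(a * b for a, b in zip(x, y)) / (lx * ly)

frame = [[0, 64], [128, 255]]
print(round(image_similarity(frame, frame), 6))  # 1.0 (identical frames)
```

The 64-per-level quantization is an assumption; the patent only requires four levels between black and white.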
Step S660: and determining a final segmentation point according to the primary segmentation point and the secondary segmentation point.
Specifically, the preliminary segmentation points and the secondary segmentation points are merged into one set along the timeline. In the merged set, the time interval between adjacent segmentation points may be too short. To avoid this, a Viterbi algorithm is used to search the merged set and screen out the optimal segmentation points, as follows:
Each preliminary segmentation point carries an interval-time attribute, and each secondary segmentation point carries a similarity-value attribute. Because the two data types differ, both are first converted to a common measurement scale and normalized. The Viterbi algorithm is then used to solve for the segmentation-point sequence with the highest score, with a constraint ensuring that the time interval between adjacent segmentation points is greater than a minimum duration threshold; this yields the final set of segmentation points. The minimum duration threshold may be 5 minutes, 10 minutes, or any other duration.
Step S680: and according to the final segmentation point, carrying out segmentation processing on the teaching video to generate a segmented video.
In the above embodiment, the preliminary segmentation points are determined according to the text information, the secondary segmentation points are determined according to the image information, the two kinds of segmentation points are fused by a preset algorithm to determine the final segmentation points, and the teaching video is segmented accordingly to generate the segmented videos. This improves the accuracy of segmentation and, in turn, the accuracy of information point extraction.
In one embodiment, referring to fig. 4, step S800 includes steps S820 to S860.
Step S820: and determining a first candidate information point according to the text information corresponding to the segmented video.
Specifically, for the subtitle text or the audio text corresponding to each segmented video, a preset algorithm may be used to obtain the probability distribution of the keyword information points, and the keyword information points with the probability distribution greater than a preset probability threshold are used as the first candidate information points.
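As an illustration only — the patent leaves the "preset algorithm" unspecified — relative word frequency can serve as a stand-in probability distribution over keywords:

```python
from collections import Counter

# Score candidate keywords of a segment's text by relative frequency and keep
# those whose probability exceeds the preset threshold as first candidates.

def first_candidates(words, prob_threshold):
    counts = Counter(words)
    total = sum(counts.values())
    return {w for w, c in counts.items() if c / total > prob_threshold}

words = ["fourier", "transform", "fourier", "series", "fourier", "transform"]
print(sorted(first_candidates(words, prob_threshold=0.25)))  # ['fourier', 'transform']
```

A production system would substitute a proper keyword-extraction algorithm; the threshold comparison against a probability value is the part the text specifies.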
Step S840: and determining a second candidate information point according to the image information corresponding to the segmented video.
Specifically, the text content and the attributes of that text content can be extracted from those images in the segmented video whose similarity value is smaller than a second preset similarity threshold; this avoids repeatedly recognizing highly similar images and improves efficiency without losing content. The text content that meets a preset attribute condition is taken as the second candidate information points.
The attribute of the text content may be the text size, and the preset attribute condition may be that the font size is larger than a set size. The attributes of the text content may further include the font, whether it is bold, the gray value, and the like; similarly, different preset attribute conditions can be set according to the specific attribute. For example, according to the format of the presentation document, the text size and font of the primary and secondary directories can be determined, a preset attribute condition set accordingly, and the second candidate information points determined by that condition. The text content and its attributes may be extracted by recognizing the text in the image with OCR while retaining the text attributes. Because presentation documents are used extensively in teaching videos, in order to extract information points more accurately this embodiment retains the attribute most indicative of information points: the text size.
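A sketch of the attribute filter described above; the (text, font_size) record format is an assumption about what an OCR engine would supply:

```python
# Keep recognized text whose font size exceeds a set threshold as second
# candidate information points (larger text on a slide tends to be a heading).

def second_candidates(ocr_items, min_font_size):
    return [text for text, size in ocr_items if size > min_font_size]

items = [("Chapter 3: Fourier Transform", 36), ("page 12", 10), ("Definition", 28)]
print(second_candidates(items, min_font_size=24))
# ['Chapter 3: Fourier Transform', 'Definition']
```

Other attribute conditions (font, bold, gray value) would add further fields to each record and further predicates to the filter.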
Step S860: and determining the information points of the segmented video according to the first candidate information points and the second candidate information points.
Specifically, the information points with high text similarity between the first candidate information points and the second candidate information points are extracted using cosine similarity, thereby determining the information points of the segmented video.
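A minimal sketch of this matching step using character-level cosine similarity; the threshold value and the lower-casing are assumptions, not specified by the text:

```python
from collections import Counter
import math

# A first candidate is kept as an information point of the segment when its
# cosine similarity to some second candidate exceeds the threshold.

def text_cosine(a, b):
    va, vb = Counter(a.lower()), Counter(b.lower())
    dot = sum(va[ch] * vb[ch] for ch in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def final_points(first, second, threshold=0.6):
    return [p for p in first if any(text_cosine(p, q) > threshold for q in second)]

print(final_points(["fourier transform", "homework"], ["Fourier Transform"]))
# ['fourier transform']
```

A word- or embedding-level similarity would work the same way; only the vectorization of the candidate strings changes.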
It should be understood that, although the steps in the flowcharts involved in the above embodiments are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, there is no strict ordering restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, a teaching video segmentation and information point extraction apparatus is provided, comprising: the information extraction module 100, configured to acquire a teaching video, read the image information in the teaching video, and extract the text information in the teaching video; the segmented video generation module 200, configured to segment the teaching video according to the text information and the image information to generate segmented videos; and the information point extraction module 300, configured to extract the information points of each segmented video according to its corresponding text information and image information.
In one embodiment, please refer to fig. 5, the apparatus further includes a storage module 400 for correspondingly storing the segmented video and the segmentation points and the information points thereof.
In one embodiment, the information extraction module 100 is specifically configured to: and extracting audio content in the teaching video, and correspondingly storing text content and time line in the audio content according to an audio-to-text technology to obtain an audio text.
In one embodiment, the segmented video generation module 200 includes: a preliminary segmentation point determination unit for determining a preliminary segmentation point according to the text information; a secondary segmentation point determining unit, configured to determine a secondary segmentation point according to the image information; a final segmentation point determining unit, configured to determine a final segmentation point according to the primary segmentation point and the secondary segmentation point; and the segmented video generating unit is used for carrying out segmented processing on the teaching video according to the final segmentation point to generate a segmented video.
In an embodiment, the preliminary segmentation point determination unit is specifically configured to: and extracting time points of which the text time interval is larger than a preset interval threshold value in the text information according to the text information, and determining a preliminary segmentation point.
In an embodiment, the secondary segmentation point determination unit is specifically configured to: extracting images according to the image information and preset time intervals, and calculating the similarity of adjacent images; and if the similarity is smaller than a first preset similarity threshold, determining the time point between the corresponding adjacent images as a secondary segmentation point.
In an embodiment, the secondary segmentation point determination unit is further specifically configured to: extract images from the image information at preset time intervals; convert each extracted image into a four-level gray image and vectorize it to obtain an image vector; normalize the image vector to obtain a normalized vector; and calculate the similarity of adjacent images from the normalized vectors.
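The quantization, vectorization, normalization and similarity steps can be sketched as follows. This is a pure-Python illustration under stated assumptions: the 4x4 sample frames stand in for decoded grayscale video frames, and cosine similarity of zero-mean vectors is one common choice the patent does not mandate:

```python
import math

def to_four_level_vector(gray_image):
    """Quantize a grayscale image (rows of 0-255 pixel values) to four
    gray levels (0-3) and flatten it into one image vector."""
    return [min(pixel // 64, 3) for row in gray_image for pixel in row]

def normalize(vec):
    """Zero-mean, unit-norm standardization of an image vector."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else centered

def frame_similarity(img_a, img_b):
    """Cosine similarity of the normalized four-level vectors of two
    frames. A time point between frames whose similarity falls below a
    preset threshold becomes a secondary segmentation point."""
    va = normalize(to_four_level_vector(img_a))
    vb = normalize(to_four_level_vector(img_b))
    return sum(x * y for x, y in zip(va, vb))

slide = [[0, 64, 128, 255]] * 4           # a 4x4 "slide" with a gradient
same_slide = [row[:] for row in slide]
same_slide[0][0] = 10                     # tiny change, same gray level
new_slide = [row[::-1] for row in slide]  # reversed gradient: new content
print(frame_similarity(slide, same_slide) > frame_similarity(slide, new_slide))  # True
```

Coarse four-level quantization deliberately discards fine detail (noise, compression artifacts, the lecturer's small movements) so that only substantial changes such as a slide switch drive the similarity down.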
In an embodiment, the information point extraction module 300 is specifically configured to: determine first candidate information points from the text information corresponding to the segmented video; determine second candidate information points from the image information in the segmented video; and determine the information points of the segmented video from the first and second candidate information points.
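The fusion rule for the two candidate sets is likewise not spelled out here. The sketch below assumes one simple hypothetical ranking, in which terms found by both modalities come first; all names and sample terms are illustrative:

```python
def merge_information_points(text_candidates, image_candidates, limit=5):
    """Hypothetical fusion rule: terms found in both the speech text and
    the on-screen text rank first, then the remaining text candidates,
    then the remaining image candidates, deduplicated, capped at `limit`."""
    both = [t for t in text_candidates if t in image_candidates]
    rest = [t for t in text_candidates + image_candidates if t not in both]
    seen, merged = set(), []
    for term in both + rest:
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return merged[:limit]

print(merge_information_points(
    ["binary search", "sorted array", "loop invariant"],  # from audio text
    ["binary search", "complexity"],                      # from slide OCR
))
```

Ranking cross-modal agreement first reflects the idea that a term both spoken aloud and shown on a slide is a strong information-point candidate for that segment.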
For the specific definition of the teaching video segmentation and information point extraction apparatus, reference may be made to the definition of the teaching video segmentation and information point extraction method above, which is not repeated here. The modules of the apparatus may be implemented wholly or partially in software, in hardware, or in a combination of the two. The modules may be embedded, in hardware form, in a processor of the computer device or be independent of it, or may be stored, in software form, in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, an electronic device is provided, which may be a terminal; its internal structure may be as shown in fig. 6. The electronic device comprises a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the electronic device is used for wired or wireless communication with an external terminal; the wireless communication can be implemented through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements the teaching video segmentation and information point extraction method. The display screen of the electronic device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the electronic device, or an external keyboard, touchpad or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 6 is a block diagram of only part of the structure relevant to the present application and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory or optical storage. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application; their description is specific and detailed but should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A teaching video segmentation and information point extraction method is characterized by comprising the following steps:
acquiring a teaching video, and reading image information in the teaching video;
extracting text information in the teaching video;
segmenting the teaching video according to the text information and the image information to generate a segmented video;
and extracting information points of the segmented video according to the text information and the image information corresponding to the segmented video, and determining the information points.
2. The method of claim 1, wherein the text information is audio text, and the extracting the text information from the teaching video comprises:
extracting audio content from the teaching video, converting the audio content to text using an audio-to-text technique, and storing the text content together with its timeline to obtain an audio text.
3. The method for segmenting and extracting the information points in the teaching video according to claim 1, wherein the step of segmenting the teaching video according to the text information and the image information to generate a segmented video comprises:
determining a preliminary segmentation point according to the text information;
determining secondary segmentation points according to the image information;
determining a final segmentation point according to the primary segmentation point and the secondary segmentation point;
and according to the final segmentation point, carrying out segmentation processing on the teaching video to generate a segmented video.
4. The method as claimed in claim 3, wherein the step of determining the preliminary segmentation points according to the text information comprises:
extracting, from the text information, time points at which the interval between adjacent text segments exceeds a preset interval threshold, and determining these as preliminary segmentation points.
5. The method as claimed in claim 3, wherein said determining secondary segmentation points according to the image information comprises:
extracting images according to the image information and preset time intervals, and calculating the similarity of adjacent images;
and if the similarity is smaller than a first preset similarity threshold, determining a time point between corresponding adjacent images as a secondary segmentation point.
6. The teaching video segmentation and information point extraction method as claimed in claim 5, wherein the extracting images according to the image information and the preset time interval and calculating the similarity of adjacent images comprises:
extracting images according to the image information and preset time intervals;
converting the extracted image into a four-level gray image, and vectorizing the four-level gray image to obtain an image vector;
standardizing the image vector to obtain a standardized vector;
and calculating the similarity of the adjacent images according to the normalized vector.
7. The teaching video segmentation and information point extraction method according to claim 1, wherein the extracting information points from the segmented video and determining information points comprises:
determining a first candidate information point according to the text information corresponding to the segmented video;
determining a second candidate information point according to the image information in the segmented video;
and determining the information points of the segmented video according to the first candidate information points and the second candidate information points.
8. A teaching video segmentation and information point extraction apparatus, characterized by comprising:
the information extraction module is used for acquiring a teaching video and reading image information in the teaching video; extracting text information in the teaching video;
the segmented video generating module is used for carrying out segmented processing on the teaching video according to the text information and the image information to generate a segmented video;
and the information point extraction module is used for extracting the information points of the segmented video according to the text information and the image information corresponding to the segmented video.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110223138.8A 2021-03-01 2021-03-01 Teaching video segmentation and information point extraction method, device, electronic equipment and medium Pending CN114996510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110223138.8A CN114996510A (en) 2021-03-01 2021-03-01 Teaching video segmentation and information point extraction method, device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114996510A 2022-09-02

Family

ID=83018866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110223138.8A Pending CN114996510A (en) 2021-03-01 2021-03-01 Teaching video segmentation and information point extraction method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114996510A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226453A (en) * 2023-05-10 2023-06-06 北京小糖科技有限责任公司 Method, device and terminal equipment for identifying dancing teaching video clips
CN116226453B (en) * 2023-05-10 2023-09-26 北京小糖科技有限责任公司 Method, device and terminal equipment for identifying dancing teaching video clips


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination