CN112668561B - Teaching video segmentation determination method and device - Google Patents

Teaching video segmentation determination method and device

Info

Publication number
CN112668561B
Authority
CN
China
Prior art keywords
frame image
image
segmentation
similarity
determining
Prior art date
Legal status
Active
Application number
CN202110278404.7A
Other languages
Chinese (zh)
Other versions
CN112668561A
Inventor
王鑫龙
卢波
王凯夫
彭守业
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202110278404.7A
Publication of CN112668561A
Application granted
Publication of CN112668561B


Abstract

The application provides a segmentation determination method and device for teaching videos. The method includes the following steps: extracting frame images of the teaching video according to a first period to form an ordered image set; comparing adjacent frame images in the ordered image set to determine temporary segmentation points; determining a first frame image and a second frame image according to each temporary segmentation point, the first frame image being the frame image before the temporary segmentation point and the second frame image being the frame image after it; comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; deleting the corresponding temporary segmentation points according to the first similarity; and adopting the remaining temporary segmentation points as actual segmentation points of the teaching video. Because the content of the blackboard writing area is directly related to the teaching topic, the temporary segmentation points can be accurately screened using that content, and the teaching video can be segmented according to its content characteristics.

Description

Teaching video segmentation determination method and device
Technical Field
The application relates to the technical field of video processing, in particular to a segmentation determination method and device for teaching videos.
Background
In order to quickly locate specific content in a video and meet the requirements of fast search and targeted extraction, methods have been proposed that identify the content of videos such as movies and films and determine segmentation according to that content.
These methods, applied to movies and documentaries to determine index points, mainly determine segmentation indexes from large-scale changes between image frames: such videos feature scene changes, and the scenes before and after a change have clearly distinguishable characteristics.
Because the scene characteristics of a teaching video remain essentially unchanged, the segmentation determination methods suited to movies and documentaries do not apply to it; teaching videos therefore still have to be indexed manually.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, the present application provides a segmentation method and a segmentation apparatus for teaching videos.
On one hand, the application provides a segmentation method of teaching videos, which comprises the following steps:
extracting frame images of the teaching video according to a first period to form an ordered image set;
comparing the adjacent frame images in the ordered image set to determine a temporary segmentation point;
comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; the first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point;
deleting the corresponding temporary segmentation points according to the first similarity;
and adopting the remaining temporary segmentation points as actual segmentation points of the teaching video.
Optionally, comparing the blackboard-writing areas of the first frame image and the second frame image, and determining a first similarity, includes:
comparing the blackboard writing areas of the first frame image and the second frame image by using an SSIM algorithm, and determining the first similarity; and/or,
comparing the blackboard writing areas of the first frame image and the second frame image by adopting a cosine distance method, and determining the first similarity; and/or,
text content recognition is carried out on the blackboard writing areas of the first frame image and the second frame image, and two recognition texts are obtained;
and determining the first similarity according to the two recognition texts.
Optionally, comparing the two recognized texts to determine the first similarity includes:
comparing the two identification texts to obtain an editing distance;
and determining the first similarity according to the editing distance and the length of one recognition text.
Optionally, in the case that the teaching video contains person images,
comparing adjacent frame images in the ordered image set to determine temporary segmentation points comprises: comparing the regions of the frame images that do not include the person images, and determining the temporary segmentation points.
Optionally, comparing adjacent frame images of the ordered image set to determine a temporary segmentation point comprises:
comparing the adjacent frame images by using an SSIM algorithm to obtain a second similarity;
and determining to set the temporary dividing point between the adjacent frame images in the case that the second similarity is smaller than a second threshold value.
Optionally, the method further comprises: comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity;
deleting the temporary segmentation point corresponding to the third similarity under the condition that the third similarity is larger than a third threshold; and/or,
obtaining a residual image according to the first frame image and the second frame image;
and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
Optionally, obtaining a residual image according to the first frame image and the second frame image includes:
calculating a first residual image and a second residual image; the first residual image is a difference image between the first frame image and a second frame image, and the second residual image is a difference image between the second frame image and the first frame image;
determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image, including:
calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image;
determining the larger value and the smaller value of the fourth similarity and the fifth similarity;
and deleting the corresponding temporary dividing point under the condition that the larger value is smaller than a fourth threshold value or the smaller value is smaller than a fifth threshold value.
Optionally, the method further comprises: determining the sound intensity in each second period in the teaching video;
setting the corresponding audio identifier of the second period as a mute identifier under the condition that the sound intensity is smaller than a preset intensity threshold; or, setting the corresponding audio identifier of the second period as a sound identifier when the sound intensity is greater than the preset intensity threshold;
determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods;
adopting the remaining temporary segmentation points as actual segmentation points of the teaching video, comprising: and adopting the temporary dividing point and the audio dividing point as the actual dividing point.
Optionally, determining an audio dividing point according to the variation characteristic of the audio identifier corresponding to each second period includes:
judging whether the audio identifiers of a specific number of consecutive second periods are all mute identifiers;
if yes, keeping the audio identifiers of the consecutive specific number of second periods unchanged;
if not, modifying the mute identifiers among the audio identifiers of the consecutive specific number of second periods into sound identifiers;
and setting the audio dividing point at the position where the audio identification changes.
Optionally, the method further comprises: and extracting a video clip of the teaching video between two adjacent actual segmentation points to serve as an extracted clip.
Optionally, in a case that the teaching type video has a character image, extracting a video segment of the teaching type video between two adjacent actual segmentation points includes:
determining the number of people in each frame image in the ordered image set;
counting the number of frame images in the ordered image set that fall within the time period determined by two adjacent actual segmentation points and contain more than a preset number of people;
and under the condition that the number of the frame images is less than the preset number, extracting a video clip positioned between two adjacent actual segmentation points.
Optionally, the method further comprises: processing the extracted segments, or processing the frame images corresponding to the extracted segments in the ordered image set, and determining segment topics;
and indexing the extracted segments with the segment topics.
In another aspect, the present application provides a segmentation device for teaching videos, comprising:
the extraction unit is used for extracting frame images of the teaching video according to a first period to form an ordered image set;
a segmentation point primary selection unit, configured to compare adjacent frame images in the ordered image set, and determine a temporary segmentation point;
a cut point deleting unit for determining a first frame image and a second frame image according to the temporary cut points; comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; deleting the corresponding temporary segmentation points according to the first similarity; the first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point;
and the actual segmentation point determining unit is used for adopting the remaining temporary segmentation points as the actual segmentation points of the teaching video.
Optionally, the cut point deleting unit compares the blackboard writing areas of the first frame image and the second frame image using an SSIM algorithm, and determines the first similarity; and/or,
compares the blackboard writing areas of the first frame image and the second frame image using a cosine distance method, and determines the first similarity; and/or,
text content recognition is carried out on the blackboard writing areas of the first frame image and the second frame image, and two recognition texts are obtained; and determining the first similarity according to the two recognition texts.
Optionally, the cut point deleting unit is further configured to:
comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity; and deleting the temporary segmentation point corresponding to the third similarity when the third similarity is larger than a third threshold; and/or,
obtaining a residual image according to the first frame image and the second frame image; and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
Optionally, the method further comprises: the sound intensity determining unit is used for determining the sound intensity in each second period in the teaching video;
a sound identifier determining unit, configured to set the corresponding audio identifier of the second period as a mute identifier when the sound intensity is smaller than a preset intensity threshold; or, when the sound intensity is greater than a preset intensity threshold, setting the corresponding audio identifier of the second period as an audio identifier;
the audio dividing point determining unit is used for determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods;
the actual segmentation point determining unit uses the temporary segmentation point and the audio segmentation point as the actual segmentation point.
Optionally, the method further comprises: and the extraction unit is used for extracting a video clip of the teaching video between two adjacent actual segmentation points as an extraction clip.
Optionally, the method further comprises: the theme determining unit is used for processing the extracted segments, or processing the frame images corresponding to the extracted segments in the ordered image set, to determine the segment theme; and indexing the extracted segment with the segment theme.
According to the teaching video segmentation method and device provided by the application, after the temporary segmentation points are determined, the contents of the blackboard writing areas of the frames adjacent to each temporary segmentation point are compared to perform the deletion operation, eliminating the temporary segmentation points that are not suitable as actual segmentation points according to the content of the blackboard writing area. Because the content of the blackboard writing area is directly related to the teaching topic, the method can segment the teaching video accurately according to the content characteristics of the teaching video.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art to obtain other drawings without inventive labor;
fig. 1 is a flowchart of a segmentation method for teaching-type video according to an embodiment of the present application;
FIG. 2 is a flow chart of determining audio cut points according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a teaching-type video segmentation apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
wherein: 11-extraction unit, 12-segmentation point initial selection unit, 13-segmentation point deletion unit, 14-actual segmentation point determination unit, 21-processor, 22-memory, 23-communication interface, 24-bus system.
Detailed Description
In order that the above-mentioned objects, features and advantages of the present application may be more clearly understood, the solution of the present application will be further described below. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein; it is to be understood that the embodiments described in this specification are only some embodiments of the present application and not all embodiments.
The embodiment of the application provides a segmentation method for teaching videos, which selects a specific segmentation strategy based on the characteristics of teaching videos to automatically determine segmentation points, with optional subsequent label addition and clip extraction.
It should be noted that the teaching video in the embodiments of the present application is a specific kind of video, characterized by having a certain writing area. The writing area described here should not be construed narrowly as a handwritten content area; it is any area that presents teaching content to trainees through characters, graphics, or pictures, and may be a handwritten content area or an area shown on a display or by projection.
It should be noted that the content of the blackboard-writing area gradually changes as the teaching content advances, including: (1) within the time corresponding to a specific teaching content, the blackboard writing content can be gradually increased until the teaching content is completely displayed; (2) when switching from one teaching content to the next teaching content, the display content in the blackboard-writing area is cleared.
Fig. 1 is a flowchart of a segmentation method for teaching videos according to an embodiment of the present application. As shown in fig. 1, the segmentation method for teaching videos in the embodiment of the present application includes steps S101 to S105.
S101: and extracting frame images of the teaching video according to a first period to form an ordered image set.
In order to form smooth video content, the frame rate of a teaching video is above 24 frames/second (in practical applications it may be 30 frames/second, 50 frames/second, or 60 frames/second). Because the time interval between adjacent image frames is small, the image content of two adjacent video frames changes very little (the content may also change little over a somewhat longer set period), so it is difficult to decide whether a dividing point should be placed between two adjacent video frames.
To overcome the problem described in the previous paragraph, the embodiment of the present application extracts the frame images of the teaching video according to the first period, following a periodic sampling method, and forms an ordered image set for subsequent processing.
In the embodiment of the application, the first period may be determined according to the length of the teaching video, the video content type of the teaching video, and the computer processing capability available for the subsequent processing. In practical applications the first period is typically set to 1 s; of course, if the teaching progress of the teaching video is slow, or the computer processing capability for the subsequent processing is limited, the first period may be set to other values such as 2 s or 5 s.
It should also be noted that the ordered image set may be an image set including only two frame images, or may be an image set including more frame images, and the embodiment of the present application is not particularly limited.
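As a rough illustration of the periodic sampling in step S101, a minimal Python sketch is given below. OpenCV, the function name, and the 1 s default period are assumptions made for illustration only; the patent does not prescribe any particular library.

import cv2

def extract_ordered_image_set(video_path, period_s=1.0):
    # Sample one frame per period to build the ordered image set.
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0    # fall back if FPS is unreported
    step = max(1, int(round(fps * period_s)))  # frames per sampling period
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames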
S102: and comparing adjacent frame images in the ordered image set to determine a temporary segmentation point.
The adjacent frame images in the ordered image set are the images extracted in adjacent first periods in step S101.
A temporary segmentation point is a time point that may be used to segment the teaching video. In practical applications, once a temporary segmentation point is confirmed, the two adjacent first periods around it belong to two different teaching video clips.
In step S102, there are various methods for comparing the adjacent frame images and determining the temporary segmentation points; for convenience, specific implementations of step S102 are described later. It suffices to understand that a temporary segmentation point determined in step S102 is a candidate, determined with a lower criterion, that may be used to segment the teaching video; consequently, the number of temporary segmentation points is larger than the number of actual segmentation points.
S103: and determining a first frame image and a second frame image according to the temporary segmentation points, comparing blackboard writing areas of the first frame image and the second frame image, and determining a first similarity.
The first frame image is a frame image of a first period before the corresponding temporal segmentation point in the ordered image set, and the second frame image is a frame image of a first period after the corresponding temporal segmentation point in the ordered image set.
As described above, the teaching videos considered in the embodiment of the present application have a blackboard-writing area. In step S103, the blackboard-writing areas in the first frame image and the second frame image need to be recognized and extracted, and the blackboard-writing areas are then processed to obtain the first similarity.
In the specific application of the embodiment of the application, there are several methods for identifying and extracting the blackboard-writing area from the first frame of image and the second frame of image.
(1) Deep learning or adaptive recognition
The deep learning or adaptive recognition method is preferably used when the blackboard-writing region contrasts significantly with the other regions in the frame image, or when the edge features of the blackboard-writing region are prominent. It is also preferred when the camera capturing the teaching video may change its viewing range.
In this case, if there is no blackboard-writing area in the first frame image or the second frame image, a null area is obtained in place of the corresponding blackboard-writing area; in practice, determining the first similarity by comparing the blackboard-writing areas may then mean comparing one blackboard-writing area with one null area, or comparing two null areas.
(2) Extracting a specific location area
The method of extracting a specific area is mostly used when the teaching video is shot with a fixed camera viewing range. Its characteristic is that the position of the blackboard-writing area within the frame image is determined before the teaching video is processed, and that position area is then treated as the blackboard-writing area when each frame image is processed.
It should be noted that, in the case of extracting a specific area, even if the blackboard-writing area is blocked by other obstacles (for example, blocked by a trainee or a teacher), the area is still regarded as the blackboard-writing area.
In this embodiment of the application, after determining the blackboard writing area, the method for determining the first similarity according to the blackboard writing areas of the first frame image and the second frame image may be selected as follows.
(1) Using the SSIM algorithm to compare the blackboard-writing areas of the first frame image and the second frame image and determine the first similarity. In the embodiment of the present application, SSIM is short for the structural similarity (Structural SIMilarity) method.
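A minimal sketch of such an SSIM comparison of two board crops is shown below; scikit-image's structural_similarity and OpenCV are assumptions for illustration, since the patent does not name a library.

import cv2
from skimage.metrics import structural_similarity

def board_ssim(board_a, board_b):
    # Compare two blackboard-writing crops of identical size.
    gray_a = cv2.cvtColor(board_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(board_b, cv2.COLOR_BGR2GRAY)
    # structural_similarity returns a value in [-1, 1]; higher means more similar.
    return structural_similarity(gray_a, gray_b)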
(2) Using the cosine distance to compare the blackboard-writing areas of the first frame image and the second frame image and determine the first similarity. For example, if the pixel gray levels of the writing area of the first frame image are $x_1, x_2, \ldots, x_n$ and those of the second frame image are $y_1, y_2, \ldots, y_n$, the cosine distance is

$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
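The formula above can be computed directly. The following sketch, an illustration assuming the two crops are numpy arrays of equal size, flattens the gray levels into vectors and evaluates the cosine value:

import numpy as np

def board_cosine(board_a, board_b):
    # Flatten the gray levels of the two crops into vectors x and y.
    x = board_a.astype(np.float64).ravel()
    y = board_b.astype(np.float64).ravel()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(np.dot(x, y) / denom) if denom else 0.0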
(3) And determining the first similarity by adopting a text comparison method. Specifically, steps S1031 to S1032 are included.
S1031: and performing text content identification on the blackboard writing areas of the first frame image and the second frame image to obtain two texts.
In the specific application of the embodiment of the present application, the method can be adoptedText content recognition of the content of the blackboard writing to obtain recognized text is carried out by various methods known in the art, and the text recognition method is not repeated here (Optical character ReorganizationOCR) method, and may be embodied in the relevant technical literature or engineering practice.
In other applications of the embodiments of the present application, other methods may be used to determine the first similarity.
S1032: a first similarity is determined based on the lengths of the two recognized texts.
In a specific application of the embodiment of the present application, determining the first similarity according to the lengths of the two recognition texts includes: and comparing the two recognition texts to obtain an editing distance, and taking the ratio of the editing distance to the length of one recognition text as a first similarity.
For example, if the text content length of the blackboard-writing area in the first frame image is $s_m$, the text content length of the blackboard-writing area in the second frame image is $s_n$, and the edit distance is $l$, then the first similarity may be $l/s_m$ or $l/s_n$.
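A sketch of this text-comparison route is given below. The dynamic-programming Levenshtein implementation and the empty-text convention are illustrative assumptions; the patent only specifies the ratio of the edit distance to one text length.

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def text_first_similarity(text_a, text_b):
    if not text_a:
        return 1.0  # convention for an empty reference text (an assumption)
    return edit_distance(text_a, text_b) / len(text_a)  # l / s_m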
After the execution of step S103 is completed, the execution of steps S104 to S105 may be continued.
S104: and deleting the temporary segmentation points corresponding to the first similarity according to the first similarity.
In step S104, how the corresponding temporary segmentation points are deleted according to the first similarity depends on the similarity determination method adopted in step S103.
For example, when the first similarity is calculated with the SSIM algorithm, the corresponding temporary segmentation point is deleted if the first similarity is larger than the corresponding threshold; when the cosine distance method or the text comparison method is adopted, the corresponding temporary segmentation point is deleted if the first similarity is smaller than the corresponding threshold.
S105: and adopting the residual temporary segmentation points as actual segmentation points of the teaching video.
After the deletion of the temporary segmentation points that are not suitable as actual segmentation points is completed in step S104, the temporary segmentation points that were not deleted are used in step S105 as the actual segmentation points of the teaching video, for segment extraction or label addition.
According to the teaching video segmentation method provided by the embodiment of the application, after the temporary segmentation points are determined, the contents of the blackboard-writing areas of the adjacent frames are compared to perform the deletion operation, so that temporary segmentation points unsuitable as actual segmentation points are excluded according to the blackboard-writing content. Because the writing content is directly related to the teaching topic, this method can segment the teaching video accurately according to its content characteristics.
The educational videos processed by some embodiments of the present application may contain images of people such as instructors and trainees, in particular images of the instructor. During teaching, the instructor exhibits various body movements as the teaching content develops, such as checking the teaching plan, facing the trainees (facing the camera), or facing the blackboard-writing area. These movements may cause step S102 to determine an excessive number of temporary segmentation points.
To solve the foregoing problem, in some embodiments of the present application, step S102 may be executed as follows: compare the regions of the frame images that do not include the person images, and determine the temporary segmentation points. That is, in step S102, the person image portions of the adjacent frame images are first identified and removed, and feature comparison is then performed on the remaining content.
In the embodiment of the present application, the method for determining the temporary segmentation point when performing step S102 may include the following methods: comparing the adjacent frame images by adopting an SSIM method to obtain a second similarity; then judging whether the second similarity is smaller than a second threshold value; if the second similarity is smaller than a second threshold value, setting a temporary segmentation point between adjacent frame images; if the second similarity is greater than the second threshold, a temporary cut point is not set between adjacent frame images.
In practical applications, because the SSIM algorithm is a statistics-based method that offers fast image similarity evaluation and global comparison, it is preferably adopted as the method for determining the temporary segmentation points. The number of temporary segmentation points obtained can be adjusted by setting the size of the second threshold.
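Tying the above together, a minimal sketch of this thresholding pass might look as follows; ssim_fn stands for any comparison like the one sketched earlier, and the 0.85 default for the second threshold is an invented placeholder, not a value from the patent.

def temporary_cut_points(frames, ssim_fn, second_threshold=0.85):
    # Place a temporary cut point between adjacent sampled frames whose
    # similarity falls below the second threshold.
    cut_points = []
    for i in range(len(frames) - 1):
        if ssim_fn(frames[i], frames[i + 1]) < second_threshold:
            cut_points.append(i + 1)  # the cut falls between frames i and i+1
    return cut_points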
Of course, in other embodiments of the present application, other processing methods may be used to determine the temporary segmentation point.
In some applications of the embodiment of the present application, in addition to the step of deleting the temporary dividing points in steps S103 and S104, other steps may be further provided for deleting the temporary dividing points, and the following methods may be adopted.
(1) Method for using cosine distance
The method using cosine distance comparison includes steps S106-S108.
S106: comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity;
s107: judging whether the third similarity is smaller than a third threshold value; if yes, go to step S108.
S108: and deleting the temporary segmentation point corresponding to the third similarity.
The cosine distance calculation mentioned in step S106 is as described above and is not repeated here. It should be noted that, as the above cosine distance formula shows, the method compares corresponding pixels: if the gray levels of corresponding pixels of the two images differ little, the cosine distance between the two images is large, which makes it convenient to determine whether some temporary segmentation points can be deleted.
(2) Method using residual comparison
The method using the residual comparison includes steps S108 to S109.
S108: and obtaining a residual image according to the first frame image and the second frame image.
In a specific application, the residual images calculated in step S108 include a first residual image and a second residual image; the first residual image is the difference image of the first frame image relative to the second frame image, and the second residual image is the difference image of the second frame image relative to the first frame image.
S109: and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
In a specific application of some embodiments of the present application, step S109 can be subdivided into S1091-S1093.
S1091: and calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image.
In a specific embodiment, the fourth similarity and the fifth similarity may both be calculated with the SSIM method, or obtained by other algorithms known in the image processing field.
S1092: and determining the larger value and the smaller value of the fourth similarity and the fifth similarity.
S1093: and deleting the corresponding temporary dividing point under the condition that the larger value is smaller than the fourth threshold value or the smaller value is smaller than the fifth threshold value.
It should be noted that in specific applications of the embodiment of the present application, the actual segmentation points can be obtained by using any of the foregoing methods of deleting temporary segmentation points.
In practical applications, a teaching video includes audio content in addition to image content; the audio content is strongly correlated with the teaching content, and with high probability there are distinct time intervals between different teaching topics. Based on the foregoing analysis, in the embodiment of the present application, besides determining actual segmentation points according to the frame image content, it is also possible to determine audio segmentation points according to the sound features and take the audio segmentation points as actual segmentation points.
Fig. 2 is a flowchart of determining audio segmentation points according to an embodiment of the present application. As shown in fig. 2, the step of determining the audio segmentation points includes S201-S205.
S201: and determining the sound intensity in each second period in the teaching video.
In the embodiment of the application, the second period is a time period determined according to factors such as the content progress of the teaching video and the like; in practical applications, the second period may be the same as the first period in the foregoing, or may be different from the first period.
S202: judging whether the sound intensity is smaller than a preset intensity threshold value or not; if yes, go to S203; if not, go to step S204.
S203: and setting the audio identifier of the corresponding second period as a mute identifier.
S204: and setting the audio identifier of the corresponding second period as a sound identifier.
S205: and determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods.
By adopting the foregoing steps S201 to S205, after the audio identifier of each second period is determined with the intensity threshold, the sound change characteristics in the teaching video can be determined from the distribution characteristics of the audio identifiers. According to the above analysis, there are with high probability distinct time intervals between different teaching topics, so the audio segmentation points can be determined according to the change characteristics of the audio identifiers.
In the embodiment of the present application, step S205 can be subdivided into steps S2051-S2054.
S2051: judging whether the audio identifiers of the second period of a certain number of continuous periods are mute identifiers or not; if yes, go to S2052; if not, go to S2053.
In the embodiment of the application, the specific number is set according to the teaching speed of the teaching video; in one particular application, the specific number may be set to 5-10.
S2052: the audio identification is maintained for a certain number of consecutive second periods.
S2053: and modifying the mute identifier in the audio identifier in a second period of a certain number of continuous periods into the sound identifier.
S2054: and setting an audio dividing point at the position where the audio identification changes.
After step S2053, the audio identifier sequence of the teaching video consists of runs of sound identifiers and runs of mute identifiers. A run of mute identifiers characterizes a longer silent region, which may correspond to the interval for switching from one teaching topic to another, so the position where the audio identifier changes is set as an audio segmentation point.
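The smoothing of S2051-S2053 and the cut placement of S2054 can be sketched as follows; the boolean per-period mute flags are an assumed input format, and the default run length of 5 follows the 5-10 range mentioned above.

def audio_cut_points(is_silent, specific_number=5):
    # is_silent: one boolean per second period (True = mute identifier).
    smoothed = list(is_silent)
    i = 0
    while i < len(smoothed):
        if smoothed[i]:
            j = i
            while j < len(smoothed) and smoothed[j]:
                j += 1
            if j - i < specific_number:  # run too short: change mute to sound
                for k in range(i, j):
                    smoothed[k] = False
            i = j
        else:
            i += 1
    # An audio cut point goes wherever the identifier changes between periods.
    return [i for i in range(1, len(smoothed)) if smoothed[i] != smoothed[i - 1]]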
Based on the foregoing steps S201 to S205, step S105 in some embodiments of the present application specifically includes: and adopting the temporary dividing point and the audio dividing point as actual dividing points.
In some applications of the embodiment of the application, after the actual segmentation points are determined, index points can be added at the corresponding positions in the teaching video, so that trainees can later quickly find the content they want to watch through the index points. In other applications of the embodiment of the present application, after the actual segmentation points are determined, it may be necessary to intercept the video segment located between two adjacent actual segmentation points as an extracted segment.
In the embodiment of the present application, when obtaining extracted segments, some segments that are not teaching content need to be deleted. In order to remove segments that do not include teaching content, the method provided by the embodiment of the present application may include steps S301-S305.
S301: the number of people in each frame image in the ordered image set is determined.
S302: and counting the number of the frame images in the ordered image set, wherein the frame images are positioned in the two adjacent actual segmentation points to determine the time period and contain more than the preset number of people.
S303: judging whether the number of frame images containing more than a preset number of people is larger than a preset number; if yes, go to step S304; if not, go to S305.
S304: the segment between the two actual cut points is discarded.
S305: and extracting the video segment between two adjacent actual segmentation points as an extracted segment.
In some teaching videos processed by applications of the embodiment of the present application, only the teacher generally appears in the scene during normal teaching; when a student answers a question, the scene may briefly contain two or more people, but not for long. If the scene contains more than the set number of people for a long time, it is most likely not normal teaching. Based on this, in some embodiments of the present application, some video segments are discarded through steps S301-S305, and only the remaining video segments are taken as extracted segments.
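A sketch of this filter is given below; count_people is a hypothetical stand-in for any person detector, and both preset numbers are invented placeholders rather than values from the patent.

def should_extract(frames_in_range, count_people,
                   preset_people=1, preset_frames=3):
    # Count the sampled frames in this range that show more people than expected.
    crowded = sum(1 for f in frames_in_range if count_people(f) > preset_people)
    # Extract the clip only if few frames are crowded; otherwise discard it.
    return crowded < preset_frames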
In addition to the foregoing method, in some applications of the embodiments of the present application, some video segments that cannot be extracted may be removed by the following method.
(1) Discard the video segment between two adjacent actual segmentation points whose audio identifiers are all mute identifiers.
(2) Discard video segments whose length between two adjacent actual segmentation points is smaller than a specific length, where the specific length is determined according to the teaching and training content.
(3) Discard the video segment between two adjacent actual segmentation points when its video frames contain certain image content or audio content that identifies non-teaching time.
In this embodiment of the present application, after the extracted segment is determined, the method provided in this embodiment may further include adding a theme index to the extracted segment, so that the corresponding teaching segment can be found conveniently and quickly later.
In the embodiment of the present application, the process of adding the theme index may include steps S401 to S402.
S401: and processing the extracted segment, or processing the frame image corresponding to the extracted segment in the ordered image set, and determining the segment theme.
In the embodiment of the application, the segment theme can be determined by extracting content from some specific frame images of the extracted segment, or of the corresponding ordered image set. For example, in an application of the embodiment of the present application, the frame images at the start, 1/4, 1/2, and 3/4 positions of the extracted segment may be processed to obtain the segment theme.
S402: and extracting the fragments by adopting the fragment topic indexing.
Step S402 is to add the clip topic as a title or attribute content to the extracted clip.
In addition, in some embodiments of the application, when no extracted segments are produced, a segment theme may also be determined according to the frame images between the actual segmentation points of the teaching video, and the segment theme may be used as a label for the corresponding segment of the teaching video.
Besides the teaching video segmentation method, the embodiment of the application also provides a teaching video segmentation device which has the same inventive concept as the teaching video segmentation method.
Fig. 3 is a schematic structural diagram of a teaching-type video segmentation apparatus according to an embodiment of the present application; as shown in fig. 3, in some embodiments, the segmentation apparatus for teaching-type video includes an extraction unit 11, a segmentation point initial selection unit 12, a segmentation point deletion unit 13, and an actual segmentation point determination unit 14.
The extraction unit 11 is used for extracting frame images of the teaching video according to a first period to form an ordered image set;
in order to ensure the formation of smooth video content, the frame frequency of teaching video is more than 24 frames/second; because the time interval between the adjacent image frames is small, the image content change rate of the two adjacent video frames is not large. To overcome this problem, in the embodiment of the present application, the extraction unit 11 extracts the frame images of the teaching class according to the first cycle according to a periodic sampling method, and forms an ordered image set for subsequent processing.
The segmentation point primary selection unit 12 is configured to compare adjacent frame images in the ordered image set, and determine a temporary segmentation point.
In the embodiment of the present application, the segmentation point primary selection unit may use the SSIM method to compare the adjacent frame images and obtain a second similarity; then judge whether the second similarity is smaller than a second threshold; if the second similarity is smaller than the second threshold, set a temporary segmentation point between the adjacent frame images; if the second similarity is greater than the second threshold, do not set a temporary segmentation point between the adjacent frame images.
A cut point deletion unit 13 for determining a first frame image and a second frame image from the provisional cut points; comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; and deleting the corresponding temporary segmentation points according to the first similarity.
The first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point.
In the embodiment of the present application, the cut point deletion unit 13 may process the blackboard writing areas in the first frame image and the second frame image with one or more of the SSIM algorithm, the cosine distance algorithm, and the text distance algorithm to determine the first similarity, and determines whether to delete the corresponding temporary segmentation point according to the first similarity.
The text distance method comprises the following steps: text content recognition is carried out on the blackboard writing areas of the first frame image and the second frame image to obtain two recognition texts; comparing the two identification texts to obtain an editing distance; a first similarity is then determined based on the edit distance and a length of a recognized text.
And an actual segmentation point determining unit 14, configured to use the remaining temporary segmentation points as actual segmentation points of the teaching video.
The teaching video segmentation device provided by the embodiment of the application exploits the fact that teaching videos have blackboard writing areas whose content is highly similar over short time spans: after the temporary segmentation points are determined, the contents of the blackboard writing areas of the adjacent frames are used to perform the deletion operation, eliminating the temporary segmentation points that are not suitable as actual segmentation points according to the blackboard writing content. Because the blackboard writing content is directly related to the teaching theme, the device can segment the teaching video accurately according to the content characteristics of the teaching video.
In some applications of the embodiment of the application, in order to eliminate image differences caused by human movements in the teaching video and avoid determining too many temporary segmentation points, the temporary segmentation points may be determined by comparing the regions of the frame images that do not include person images.
In the embodiment of the present application, the segmentation point primary selection unit may use the SSIM algorithm to compare the adjacent frame images to obtain a second similarity, and a temporary segmentation point is set between the adjacent frame images in the case that the second similarity is smaller than the second threshold.
In the embodiment of the present application, the cutting point deletion unit 13 may delete the temporary cutting points by the following method in addition to deleting the temporary cutting points by comparing the blackboard writing areas.
(1) Comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity; and deleting the temporary dividing point corresponding to the third similarity under the condition that the third similarity is larger than the third threshold.
(2) Obtaining a residual image according to the first frame image and the second frame image; and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
In a specific application, the residual image may include a first residual image and a second residual image. The first residual image is a difference image between the first frame image and the second frame image, and the second residual image is a difference image between the second frame image and the first frame image.
The step of determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image comprises: calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image; determining a larger value and a smaller value of the fourth similarity and the fifth similarity; and deleting the corresponding temporary dividing point under the condition that the larger value is smaller than the fourth threshold or the smaller value is smaller than the fifth threshold.
In some embodiments of the present application, the video segmentation apparatus for teaching type may further include a sound intensity determination unit, a sound identification determination unit, and an audio segmentation point determination unit.
The sound intensity determining unit is used for determining the sound intensity in each second period in the teaching video.
The sound identifier determining unit is used for setting the corresponding audio identifier of the second period as a mute identifier under the condition that the sound intensity is smaller than a preset intensity threshold value; or, when the sound intensity is greater than the preset intensity threshold, setting the audio identifier of the corresponding second period as a sound identifier.
The audio dividing point determining unit is used for determining the audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods.
In the case of including the aforementioned sound intensity determination unit, sound identification determination unit, and audio cut point determination unit, the actual cut point determination unit 14 employs the provisional cut point and the audio cut point as the actual cut point.
In some applications of the embodiments of the present application, the audio segmentation point determination unit determines the audio segmentation point by using the following method: (1) judging whether the audio identifiers of the second period of a certain number of continuous periods are mute identifiers or not; (2) if yes, keeping the audio identifiers of a second period of continuous specific number unchanged; if not, modifying the mute identification in the audio identification in the second period of continuous specific number into the sound identification; (3) and setting an audio dividing point at the position where the audio identification changes.
In some applications of the embodiments of the present application, a clip extraction unit is further included, which is configured to extract the video segment of the teaching video between two adjacent actual segmentation points as an extracted segment. In one particular application, the method for obtaining the extracted segment comprises the following steps: (1) determining the number of people in each frame image in the ordered image set; (2) counting the number of frame images in the ordered image set that fall within the time period determined by two adjacent actual segmentation points and contain more than a preset number of people; (3) extracting the video clip located between the two adjacent actual segmentation points as an extracted segment in the case that the number of such frame images is less than a preset number.
In some applications of the application, the segmentation device for teaching videos further comprises a theme determining unit; the theme determining unit is used for processing the extracted segments, or processing the frame images corresponding to the extracted segments in the ordered image set, to determine the segment theme, and for indexing the extracted segments with the segment theme.
Based on the same inventive concept, the application also provides an electronic device. Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 4, the electronic device comprises at least one processor 21, at least one memory 22 and at least one communication interface 23; the communication interface 23 is used for information transmission with external devices.
The various components in the electronic device are coupled together by a bus system 24. Understandably, the bus system 24 is used to enable connection and communication between these components. In addition to a data bus, the bus system 24 also includes a power bus, a control bus, and a status signal bus. For clarity of illustration, the various buses are all labeled as the bus system 24 in fig. 4.
It will be appreciated that the memory 22 in this embodiment may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. In some embodiments, memory 22 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, and is used for implementing various basic tasks and processing hardware-based tasks. The application programs include various applications, such as a media player (MediaPlayer), a browser (Browser), etc., and are used for implementing various application tasks. A program implementing the teaching video segmentation method provided by the embodiments of the present disclosure may be included in an application program.
In the embodiment of the present disclosure, the processor 21 is configured to call a program or an instruction stored in the memory 22, specifically, the program or the instruction stored in the application program, and the processor 21 is configured to execute each step of the teaching video segmentation method provided by the embodiment of the present disclosure.
The teaching video segmentation method provided by the embodiments of the present disclosure may be applied to the processor 21, or implemented by the processor 21. The processor 21 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 21 or by instructions in the form of software. The processor 21 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
The steps of the teaching video segmentation method provided by the embodiments of the present disclosure may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of the hardware in a decoding processor and software units. The software units may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 22; the processor 21 reads the information in the memory 22 and completes the steps of the method in combination with its hardware.
The embodiments of the present disclosure further provide a non-transitory computer-readable storage medium storing a program or instructions that cause a computer to execute the steps of the teaching video segmentation method in each of the above embodiments; to avoid repetition, the details are not described again here.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (18)

1. A segmentation method for teaching videos is characterized by comprising the following steps:
extracting frame images of the teaching video according to a first period to form an ordered image set; the first period is determined according to the length of the teaching video, the video content type of the teaching video and the computer processing capacity for executing subsequent processing;
comparing the adjacent frame images in the ordered image set to determine a temporary segmentation point;
comparing the blackboard writing areas of the first frame image and the second frame image to determine a first similarity; the first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point; wherein the determining comprises: comparing the blackboard writing areas of the first frame image and the second frame image by adopting an SSIM algorithm to determine the first similarity; and/or comparing the blackboard writing areas of the first frame image and the second frame image by adopting a cosine distance method to determine the first similarity; and/or performing text content recognition on the blackboard writing areas of the first frame image and the second frame image to obtain two recognized texts, and determining the first similarity according to the two recognized texts;
deleting the corresponding temporary segmentation points according to the first similarity;
and adopting the remaining temporary segmentation points as actual segmentation points of the teaching video.
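By way of illustration, the following is a minimal Python sketch of the SSIM branch of the writing-area comparison in claim 1. The fixed rectangle for the blackboard writing area, the grayscale conversion, and the use of OpenCV and scikit-image are assumptions made for the example, not details taken from the patent:

```python
# Hedged sketch of claim 1's SSIM comparison of blackboard writing areas.
# Assumes the writing area is a fixed, pre-determined rectangle; a real
# system might locate it per video instead.
import cv2
from skimage.metrics import structural_similarity

def writing_area_similarity(first_frame, second_frame, region):
    """First similarity between the writing areas of two frames."""
    x, y, w, h = region  # hypothetical fixed (x, y, width, height)
    crop1 = cv2.cvtColor(first_frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    crop2 = cv2.cvtColor(second_frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # SSIM lies in [-1, 1]; a high score means the writing barely changed,
    # so the temporary segmentation point between the frames can be deleted.
    return structural_similarity(crop1, crop2)
```

A similarity above a chosen first threshold would then trigger the deleting step of claim 1.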
2. The segmentation method for teaching videos according to claim 1, wherein determining the first similarity according to the two recognized texts comprises:
comparing the two recognized texts to obtain an edit distance;
and determining the first similarity according to the edit distance and the length of one of the recognized texts.
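A minimal sketch of the edit-distance computation behind claim 2, in pure Python. Normalizing by the longer of the two texts is one possible choice of "the length of one recognized text"; the claim does not fix which text is used:

```python
# Levenshtein edit distance between the two recognized texts (claim 2).
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def first_similarity(text1: str, text2: str) -> float:
    # Normalizing by the longer text keeps the score in [0, 1].
    longest = max(len(text1), len(text2)) or 1
    return 1.0 - edit_distance(text1, text2) / longest
```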
3. The segmentation method for teaching videos as claimed in claim 1, wherein, in a case that the teaching video contains person images,
comparing adjacent frame images in the ordered image set to determine the temporary segmentation points comprises: comparing the regions of the frame images that do not include the person images, and determining the temporary segmentation points.
4. The segmentation method for teaching videos according to any one of claims 1 to 3, wherein comparing adjacent frame images in the ordered image set to determine the temporary segmentation points comprises:
by usingSSIMComparing the adjacent frame images by an algorithm to obtain a second similarity;
and determining to set the temporary segmentation point between the adjacent frame images in a case that the second similarity is smaller than a second threshold.
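The scan over the ordered image set described in claim 4 might look like the following sketch; the 0.85 second threshold is an arbitrary placeholder, not a value from the patent:

```python
# Sketch of claim 4: place a temporary segmentation point wherever two
# adjacent frames of the ordered image set fall below the second threshold.
import cv2
from skimage.metrics import structural_similarity

def temporary_points(frames, second_threshold=0.85):
    """frames: list of BGR images sampled at the first period."""
    points = []
    for i in range(len(frames) - 1):
        g1 = cv2.cvtColor(frames[i], cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(frames[i + 1], cv2.COLOR_BGR2GRAY)
        if structural_similarity(g1, g2) < second_threshold:
            points.append(i + 1)  # tentative split before frame i + 1
    return points
```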
5. The segmentation method for teaching videos according to any one of claims 1 to 3, further comprising:
comparing the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity;
deleting the temporary segmentation point corresponding to the third similarity in a case that the third similarity is larger than a third threshold; and/or,
obtaining a residual image according to the first frame image and the second frame image;
and determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
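One plausible reading of the cosine distance method of claim 5 is to flatten each frame to a vector and take the cosine of the angle between them; the flattening and the small epsilon are implementation choices for this sketch:

```python
# Sketch of claim 5's third similarity via cosine comparison of whole frames.
import numpy as np

def third_similarity(first_frame, second_frame) -> float:
    v1 = first_frame.astype(np.float32).ravel()
    v2 = second_frame.astype(np.float32).ravel()
    denom = np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8  # avoid divide-by-zero
    return float(np.dot(v1, v2) / denom)  # near 1.0 means the frames match
```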
6. The segmentation method for teaching videos as claimed in claim 5, wherein
obtaining a residual image according to the first frame image and the second frame image, including:
calculating a first residual image and a second residual image; the first residual image is a difference image between the first frame image and the second frame image, and the second residual image is a difference image between the second frame image and the first frame image;
determining whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image, including:
calculating a fourth similarity according to the first residual image and the first frame image, and calculating a fifth similarity according to the second residual image and the second frame image;
determining a larger value and a smaller value of the fourth similarity and the fifth similarity;
and deleting the corresponding temporary segmentation point in a case that the larger value is smaller than a fourth threshold or the smaller value is smaller than a fifth threshold.
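A sketch of the residual test of claim 6, using saturating grayscale subtraction for the difference images and SSIM for the fourth and fifth similarities; both thresholds are placeholders assumed for the example:

```python
# Sketch of claim 6: residual images in both directions, each compared
# against its source frame. A residual that looks unlike its source frame
# (low SSIM) suggests the frames barely differ, so the temporary point
# between them should be deleted; True means delete.
import cv2
from skimage.metrics import structural_similarity

def delete_by_residual(first, second, fourth_threshold=0.6, fifth_threshold=0.3):
    g1 = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(second, cv2.COLOR_BGR2GRAY)
    res1 = cv2.subtract(g1, g2)              # first residual image
    res2 = cv2.subtract(g2, g1)              # second residual image
    s4 = structural_similarity(res1, g1)     # fourth similarity
    s5 = structural_similarity(res2, g2)     # fifth similarity
    larger, smaller = max(s4, s5), min(s4, s5)
    return larger < fourth_threshold or smaller < fifth_threshold
```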
7. The segmentation method for teaching video according to claim 1, further comprising:
determining the sound intensity in each second period of the teaching video;
setting the audio identifier corresponding to the second period as a mute identifier in a case that the sound intensity is smaller than a preset intensity threshold; or setting the audio identifier corresponding to the second period as a sound identifier in a case that the sound intensity is greater than the preset intensity threshold;
determining audio segmentation points according to the change characteristics of the audio identifiers corresponding to the second periods;
wherein adopting the remaining temporary segmentation points as the actual segmentation points of the teaching video comprises: adopting the remaining temporary segmentation points and the audio segmentation points as the actual segmentation points.
8. The method of claim 7, wherein determining the audio segmentation points according to the change characteristics of the audio identifiers corresponding to the second periods comprises:
judging whether the audio identifiers of a specific number of consecutive second periods are all mute identifiers;
if yes, keeping the audio identifiers of the specific number of consecutive second periods unchanged;
if not, modifying the mute identifiers among the audio identifiers of the specific number of consecutive second periods into sound identifiers;
and setting an audio segmentation point at each position where the audio identifier changes.
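Claims 7 and 8 together describe thresholding per-period sound intensity and then smoothing short silent runs before emitting split points. A compact sketch, where min_run stands for the claimed "specific number" of consecutive periods:

```python
# Sketch of claims 7-8: mute/sound flags per second period, short silent
# runs relabeled as sound, audio segmentation points at flag changes.
def audio_segmentation_points(intensities, intensity_threshold, min_run):
    flags = [x >= intensity_threshold for x in intensities]  # True = sound
    i = 0
    while i < len(flags):
        if not flags[i]:                       # start of a silent run
            j = i
            while j < len(flags) and not flags[j]:
                j += 1
            if j - i < min_run:                # run shorter than the
                flags[i:j] = [True] * (j - i)  # specific number: relabel
            i = j
        else:
            i += 1
    return [k for k in range(1, len(flags)) if flags[k] != flags[k - 1]]
```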
9. The segmentation method for teaching video according to claim 1 or 7, further comprising:
and extracting a video clip of the teaching video between two adjacent actual segmentation points to serve as an extracted clip.
10. The segmentation method for teaching videos according to claim 1 or 7, wherein, in a case that the teaching video contains person images, extracting a video clip of the teaching video between two adjacent actual segmentation points comprises:
determining the number of people in each frame image in the ordered image set;
counting, among the frame images of the ordered image set that fall within the time period determined by two adjacent actual segmentation points, the number of frame images containing more than a preset number of people;
and extracting the video clip located between the two adjacent actual segmentation points in a case that the counted number of frame images is smaller than a preset quantity.
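For claim 10, a sketch of the crowd filter over one candidate clip. The per-frame person counter is a hypothetical stand-in (any person detector could back it), and both limits are assumed values:

```python
# Sketch of claim 10: extract a clip only when few of its sampled frames
# contain more than the preset number of people.
def should_extract(frames_in_clip, count_persons, max_people=1, max_crowded=3):
    crowded = sum(1 for f in frames_in_clip if count_persons(f) > max_people)
    return crowded < max_crowded  # True: extract the clip between the points
```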
11. The segmentation method for teaching videos according to claim 9, further comprising:
processing the extracted clip, or processing the frame images corresponding to the extracted clip in the ordered image set, to determine a clip topic;
and indexing the extracted clip by using the clip topic.
12. A segmentation apparatus for teaching videos, comprising:
the extraction unit is used for extracting frame images of the teaching video according to a first period to form an ordered image set; the first period is determined according to the length of the teaching video, the video content type of the teaching video and the computer processing capacity for executing subsequent processing;
a segmentation point primary selection unit, configured to compare adjacent frame images in the ordered image set, and determine a temporary segmentation point;
the segmentation point deleting unit is used for comparing the blackboard writing areas of the first frame image and the second frame image and determining a first similarity; deleting the corresponding temporary segmentation points according to the first similarity; the first frame image is a frame image before the temporary segmentation point, and the second frame image is a frame image after the temporary segmentation point; the said cutting point deleting unit adoptsSSIMComparing blackboard writing areas of the first frame image and the second frame image by using an algorithm, and determining the first similarity; and/or comparing blackboard writing areas of the first frame image and the second frame image by adopting a cosine distance method, and determining the first similarity; and/or performing text content identification on the blackboard writing areas of the first frame image and the second frame image to obtain two identification texts; determining the first similarity according to the two recognition texts;
and the actual segmentation point determining unit is used for adopting the residual temporary segmentation points as the actual segmentation points of the teaching video.
13. The segmentation apparatus for teaching videos as claimed in claim 12, wherein the segmentation point deleting unit is further configured to:
compare the first frame image with the second frame image by adopting a cosine distance method to obtain a third similarity, and delete the temporary segmentation point corresponding to the third similarity in a case that the third similarity is larger than a third threshold; and/or,
obtain a residual image according to the first frame image and the second frame image, and determine whether to delete the corresponding temporary segmentation point according to the residual image, the first frame image and the second frame image.
14. The segmentation apparatus for teaching videos as claimed in claim 12, further comprising:
the sound intensity determining unit is used for determining the sound intensity in each second period in the teaching video;
a sound identifier determining unit, configured to set the corresponding audio identifier of the second period as a mute identifier when the sound intensity is smaller than a preset intensity threshold; or, when the sound intensity is greater than a preset intensity threshold, setting the corresponding audio identifier of the second period as an audio identifier;
the audio dividing point determining unit is used for determining audio dividing points according to the change characteristics of the audio identifiers corresponding to the second periods;
the actual segmentation point determining unit uses the temporary segmentation point and the audio segmentation point as the actual segmentation point.
15. The segmentation apparatus for teaching videos according to any one of claims 12 to 14, further comprising:
and the extraction unit is used for extracting a video clip of the teaching video between two adjacent actual segmentation points as an extraction clip.
16. The segmentation apparatus for teaching videos as claimed in claim 15, further comprising:
the theme determining unit is used for processing the extracted fragments or processing the frame images corresponding to the extracted fragments in the ordered image set to determine the theme of the fragments; and the number of the first and second groups,
and indexing the extracted fragment by adopting the fragment theme.
17. An electronic device comprising a processor and a memory;
the processor is configured to execute the steps of the segmentation method for teaching videos according to any one of claims 1 to 11 by calling a program or instructions stored in the memory.
18. A computer-readable storage medium, characterized in that it stores a program or instructions for causing a computer to execute the steps of the segmentation method for teaching videos according to any one of claims 1 to 11.
CN202110278404.7A 2021-03-16 2021-03-16 Teaching video segmentation determination method and device Active CN112668561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110278404.7A CN112668561B (en) 2021-03-16 2021-03-16 Teaching video segmentation determination method and device

Publications (2)

Publication Number Publication Date
CN112668561A (en) 2021-04-16
CN112668561B (en) 2022-03-29

Family

ID=75399421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110278404.7A Active CN112668561B (en) 2021-03-16 2021-03-16 Teaching video segmentation determination method and device

Country Status (1)

Country Link
CN (1) CN112668561B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114245229B * 2022-01-29 2024-02-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Short video production method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902289A (en) * 2019-01-23 2019-06-18 汕头大学 A kind of news video topic division method towards fuzzy text mining
US20190287415A1 (en) * 2018-03-14 2019-09-19 At&T Intellectual Property I, L.P. Content curation for course generation
CN112261477A (en) * 2020-10-22 2021-01-22 新东方教育科技集团有限公司 Video processing method and device, training method and storage medium
CN112289321A (en) * 2020-12-29 2021-01-29 平安科技(深圳)有限公司 Explanation synchronization video highlight processing method and device, computer equipment and medium
CN112287914A (en) * 2020-12-27 2021-01-29 平安科技(深圳)有限公司 PPT video segment extraction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant