CN113033662A - Multi-video association method and device

Info

Publication number: CN113033662A
Application number: CN202110319988.8A
Authority: CN (China)
Prior art keywords: video, subfile, video subfile, feature, file
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 王泽晶, 刘一剑, 王雅辉, 沈哲吉, 沈来信
Current Assignee: Beijing Thunisoft Information Technology Co., Ltd.
Original Assignee: Beijing Thunisoft Information Technology Co., Ltd.
Application filed by Beijing Thunisoft Information Technology Co., Ltd.
Priority to CN202110319988.8A
Publication of CN113033662A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a multi-video association method and apparatus for realizing association matching among multiple long videos. The multi-video association method comprises the following steps: acquiring at least two video files; segmenting each video file according to the color feature similarity of its video frames, generating at least a plurality of video subfiles; generating feature indexes of the video subfiles according to a preset data structure, so that the information of the video subfiles can be retrieved; comparing the feature index similarity of the video subfiles; and determining the video subfiles whose feature index similarity exceeds a second preset threshold, and generating the association information.

Description

Multi-video association method and device
Technical Field
The present application relates to the field of media file processing technologies, and in particular, to a multi-video association method and apparatus.
Background
At present, video retrieval is widely applied on large internet video websites and in short-video apps.
Video retrieval technology can be understood as finding useful or required data in videos. It mainly performs operations such as video scene boundary detection, video shot detection, key frame extraction, index building and index storage on input video files, and finally associates video clips whose similarity matches.
In the process of implementing the prior art, the inventors found that:
at present, video retrieval performs similarity matching calculation only for short videos and cannot associate multiple videos.
Therefore, it is desirable to provide a multi-video association method and apparatus for implementing association matching of several long videos.
Disclosure of Invention
The embodiment of the application provides a multi-video association method and device, which are used for realizing association matching of a plurality of long videos.
The multi-video association method provided by the application comprises the following steps:
acquiring at least two video files;
according to the color feature similarity of the video frames of the first video file, segmenting the first video file to generate at least a third video subfile and a fourth video subfile;
according to the color feature similarity of the video frames of the second video file, the second video file is segmented, and at least a fifth video subfile and a sixth video subfile are generated;
respectively generating feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile according to a preset data structure so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile;
determining at least one video subfile of a fourth video subfile, a fifth video subfile and a sixth video subfile, wherein the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile exceeds a second preset threshold value relative to the third video subfile, and using the video subfile as an intermediate video subfile;
and associating the intermediate video subfile with the third video subfile to generate association information.
Further, according to the color feature similarity of the video frames of the first video file, the first video file is segmented, and at least a third video subfile and a fourth video subfile are generated, which specifically includes:
comparing the gray value of the video frame in the first video file frame by frame;
selecting the current frame as a first video frame and the next frame as a second video frame when the gray value similarity of the next frame and the current frame exceeds a first preset threshold;
taking the first video frame as a newly added video end endpoint of the first video file;
taking the second video frame as a newly added video starting endpoint of the first video file;
and the first video frame and the second video frame divide the first video file to generate at least a third video subfile and a fourth video subfile.
Further, comparing the gray values of the video frames in the first video file frame by frame specifically includes:
reducing the gray scale level of a single color of a video frame in the first video file from 256 levels to 16 levels;
storing any two color values in the three colors according to 4 bytes, wherein the two color values are used for representing the video frame gray value of the first video file;
and when the gray value similarity of the video frames in the first video file is calculated by using a Bhattacharyya coefficient calculation formula, obtaining the color feature similarity of the video frames of the first video file after the optimization algorithm.
Further, with a preset data structure, feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile are respectively generated so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile, and the method specifically includes:
extracting at least one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
respectively storing the video characteristics of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile in a preset data structure, and respectively generating characteristic indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
the head data of the preset data structure is a summary field, and each following group of bits stores one feature vector: a color feature, a texture feature, a target feature, a motion feature and an audio feature respectively.
Further, comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile with respect to the third video subfile specifically includes:
respectively comparing the summary field similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile; or
And respectively comparing at least one feature vector similarity of the color feature, the texture feature, the target feature, the motion feature and the audio feature of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
The present application provides a multi-video association apparatus, comprising:
the acquisition module is used for acquiring at least two video files;
the segmentation module is used for segmenting the first video file according to the color feature similarity of the video frames of the first video file to generate at least a third video subfile and a fourth video subfile;
according to the color feature similarity of the video frames of the second video file, the second video file is segmented, and at least a fifth video subfile and a sixth video subfile are generated;
the index module is used for respectively generating feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile according to a preset data structure so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
the comparison module is used for comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile;
the association module is used for determining at least one video subfile from a fourth video subfile, a fifth video subfile and a sixth video subfile, wherein the similarity of the feature index relative to the third video subfile exceeds a second preset threshold value, and the video subfile is used as an intermediate video subfile;
and associating the intermediate video subfile with the third video subfile to generate association information.
Further, the segmentation module is configured to segment the first video file according to the color feature similarity of the video frames of the first video file, and generate at least a third video subfile and a fourth video subfile, and specifically configured to:
comparing the gray value of the video frame in the first video file frame by frame;
selecting the current frame as a first video frame and the next frame as a second video frame when the gray value similarity of the next frame and the current frame exceeds a first preset threshold;
taking the first video frame as a newly added video end endpoint of the first video file;
taking the second video frame as a newly added video starting endpoint of the first video file;
and the first video frame and the second video frame divide the first video file to generate at least a third video subfile and a fourth video subfile.
Further, the segmentation module further comprises an optimization submodule;
the optimization submodule is specifically configured to:
reducing the gray scale level of a single color of a video frame in the first video file from 256 levels to 16 levels;
storing any two color values in the three colors according to 4 bytes, wherein the two color values are used for representing the video frame gray value of the first video file;
and when the gray value similarity of the video frames in the first video file is calculated by using a Bhattacharyya coefficient calculation formula, obtaining the color feature similarity of the video frames of the first video file after the optimization algorithm.
Further, the indexing module is configured to generate feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile respectively according to a preset data structure, so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile, and is specifically configured to:
extracting at least one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
respectively storing the video characteristics of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile in a preset data structure, and respectively generating characteristic indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
the head data of the preset data structure is a summary field, and each following group of bits stores one feature vector: a color feature, a texture feature, a target feature, a motion feature and an audio feature respectively.
Further, the comparison module is configured to compare feature index similarities of the fourth video subfile, the fifth video subfile, and the sixth video subfile with respect to the third video subfile, and specifically configured to:
respectively comparing the summary field similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile; or
And respectively comparing at least one feature vector similarity of the color feature, the texture feature, the target feature, the motion feature and the audio feature of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
The embodiment provided by the application has at least the following beneficial effects:
the method and the device can be used for realizing the relevance matching of a plurality of long videos.
Through the optimized calculation method, the calculation amount can be saved by about 50%, and the calculation efficiency is improved.
The designed vector storage format facilitates both overall calculation and per-component vector calculation. For example, similarity calculation can be performed for the whole video, or separately for different features. This facilitates retrieval and association operations performed by operators.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a method for implementing multi-video association according to an embodiment of the present disclosure.
Fig. 2 is a block diagram schematically illustrating a structure of a device for implementing multiple video association according to an embodiment of the present application.
Reference numerals:
100 multi-video association apparatus
11 acquisition module
12 segmentation module
121 optimization submodule
13 index module
14 comparison module
15 association module
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, video retrieval is widely applied to large internet video websites or short video APPs. In addition, video retrieval is also widely applied in the field of surveillance videos.
Video retrieval technology can be understood as finding useful or required data in videos. It mainly performs operations such as video scene boundary detection, video shot detection, key frame extraction, index building and index storage on input video files, and finally associates video clips whose similarity matches.
However, at present, video retrieval performs similarity matching calculation only on short videos, can process only two video files at a time, and cannot associate multiple videos.
Therefore, it is desirable to provide a multi-video association method and apparatus for matching associated segments in a plurality of long videos.
Referring to fig. 1, the present application provides a multi-video association method, including the following steps:
s110: at least two video files are obtained.
It should be noted that, unlike the prior art in which only two short video files can be acquired and sent to the system for similarity matching calculation, the multi-video association method provided by the present application can input a plurality of long video files for association.
Specifically, the video file may be a multimedia file containing real-time audio and video information. The file format of the video file may be MPEG, AVI, ASF, MOV, 3GP, WMV, RM, RMVB, FLV/F4V.
In a specific application scenario, the video file may be a number of videos that need to be associated, a number of monitoring videos that need to be associated, a number of interview videos that need to be associated, and the like, which are acquired from a recorder.
Such videos are recorded for a long time and may have multiple locations for recording. Therefore, when associating, the system often acquires at least two long video files.
S120: and segmenting the first video file according to the color feature similarity of the video frames of the first video file to generate at least a third video subfile and a fourth video subfile.
It is understood that the system may acquire more than two video files. For brevity, only the case in which the system acquires two video files is described here, and this should not be construed as limiting the scope of the invention.
For convenience of representation, one of the two video files acquired by the system is referred to herein as the first video file.
It should be noted that, considering that the video files are obtained from a recorder and may be several videos, several monitoring videos or several interview videos that need to be associated, the scenes of each recording differ greatly, and the videos captured by the cameras contain shots from many different viewing angles. Therefore, the system needs to segment the first video file wherever the shot switches.
The most direct method for the system to determine the shot-to-shot switching of the first video file is to compare the color feature similarity of the video frames of the first video file.
The color feature similarity may be a gray value similarity calculated for the video frame of the first video file.
Specifically, the system may calculate a gray value of each video frame of the first video file, and then the system calculates a similarity of the gray values of adjacent video frames in the first video file.
When the gray value similarity of the adjacent video frames exceeds a first preset threshold, the system selects the two adjacent frames with the gray value similarity exceeding the first preset threshold as a newly added video ending end point and a newly added video starting end point respectively. When the similarity of the gray values of the adjacent video frames exceeds a first preset threshold, the difference between the color features of the current frame and the next frame in the first video file is relatively large, and therefore it can be judged that the shot in the first video file is switched.
Specifically, the gray value of a video frame of the first video file needs to consider the three RGB channels. Therefore, the gray space of a video frame of the first video file has 256³ = 16,777,216 values.
When calculating the gray values of the video frames of the first video file, the system calculates the respective gray levels for each video frame. Since this is computationally intensive, the gray levels of the three colors are usually reduced from 256 to 16, and the RGB values are then stored in 4 bytes as follows:
int value=(R/16)*16*15+(G/16)*15+(B/16)。
And the gray-value similarity of the video frames of the first video file is then calculated according to the Bhattacharyya coefficient calculation formula. For normalized histograms, the formula is:

$$d(p, p') = \sqrt{1 - \sum_i \sqrt{p(i)\,p'(i)}}$$

In the formula, p and p' respectively represent the image data (histograms) of the source and the candidate; the product of each pair of data points with the same index i is square-rooted and the results are summed, giving the gray-value similarity value, i.e. the Bhattacharyya coefficient value. The value ranges from 0 to 1, and the closer the value is to 0, the more similar the images.
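For concreteness, a minimal NumPy sketch of this conventional calculation is given below; the histogram index formula is copied verbatim from the text above, while the function names and the assumption of normalized histograms are illustrative only, not prescribed by the patent.

```python
import numpy as np

def rgb_histogram(frame):
    """Normalized histogram of an HxWx3 uint8 RGB frame, with each
    channel first reduced from 256 to 16 gray levels."""
    r = frame[..., 0].astype(np.int32) // 16
    g = frame[..., 1].astype(np.int32) // 16
    b = frame[..., 2].astype(np.int32) // 16
    value = r * 16 * 15 + g * 15 + b  # index formula copied verbatim from the text
    hist = np.bincount(value.ravel(), minlength=3841)  # max index 15*240+15*15+15 = 3840
    return hist / hist.sum()

def bhattacharyya_distance(p, q):
    """Distance form of the Bhattacharyya measure for normalized
    histograms: 0 means identical, values near 1 mean dissimilar."""
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q)))))
```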
It is emphasized that the above-described calculation method is the conventional method of calculating the gray values of the video frames of the first video file. Considering that most of the distinctive features of a video are concentrated in the central 2/3 of the picture, and that the R value has very little influence on the features, the present application optimizes the calculation of the gray value and of the Bhattacharyya coefficient.
Specifically, the optimized result is as follows:
short value=(G/16)*15+B/16
$$d(p, p') = \sqrt{1 - \sum_i \sqrt{p(i)\,p'(i)}}, \qquad p,\ p' \text{ computed only over the central } 2/3 \text{ of each frame}$$
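A minimal sketch of the optimized variant follows. How the "central 2/3" is cropped (here, trimming one sixth of the height and width on each side) is an assumption of this sketch; the index formula is copied verbatim from the line above.

```python
import numpy as np

def optimized_histogram(frame):
    """Normalized histogram over the central 2/3 of an RGB frame,
    using only the G and B channels (the R channel is dropped)."""
    h, w = frame.shape[:2]
    crop = frame[h // 6 : h - h // 6, w // 6 : w - w // 6]  # assumed central-2/3 crop
    g = crop[..., 1].astype(np.int32) // 16
    b = crop[..., 2].astype(np.int32) // 16
    value = g * 15 + b  # index formula copied verbatim from the text
    hist = np.bincount(value.ravel(), minlength=241)  # max index 15*15+15 = 240
    return hist / hist.sum()
```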
It is particularly emphasized that the first video file may be segmented into multiple subfiles. For brevity, only the two video subfiles generated by splitting the first video file are described herein, and this should not be construed as limiting the scope of the invention.
For convenience of representation, two adjacent frames in the first video file whose similarity of gray-level values exceeds a first preset threshold are respectively referred to as a first video frame and a second video frame.
The system takes the first video frame as a new video adding end point of the first video file, and takes the second video frame as a new video adding start end point of the first video file.
For convenience of representation, after the first video file is divided by the first video frame and the second video frame, the two generated video subfiles are respectively recorded as a third video subfile and a fourth video subfile.
It is understood that there may be multiple groups of two adjacent frames with the above-mentioned similarity of gray-level values exceeding the first preset threshold, and the first video file may also generate multiple video subfiles. For the sake of brevity, no description of possible combinations of the embodiments is provided, and no limitation of the scope of the invention is thereby intended.
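Putting the pieces together, here is a hedged sketch of the whole segmentation step. It reuses optimized_histogram() and bhattacharyya_distance() from the sketches above, reads frames with OpenCV (one possible decoder, an assumption), and follows the text's convention that a similarity value closer to 0 means more similar, so a value exceeding the first preset threshold marks a shot cut.

```python
import cv2  # OpenCV is assumed here only as a convenient frame decoder

def split_points(path, first_threshold):
    """Return the frame indices at which a new video subfile starts,
    i.e. where adjacent frames differ enough to indicate a shot cut.
    Reuses optimized_histogram() and bhattacharyya_distance() above."""
    cap = cv2.VideoCapture(path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes to BGR
        hist = optimized_histogram(rgb)
        # A similarity value above the first preset threshold means the
        # color features changed sharply: the previous frame ends the old
        # subfile and this frame starts a new one.
        if prev_hist is not None and \
                bhattacharyya_distance(prev_hist, hist) > first_threshold:
            cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```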
S130: and segmenting the second video file according to the color feature similarity of the video frames of the second video file to generate at least a fifth video subfile and a sixth video subfile.
It is understood that the system may acquire more than two video files. For brevity, only the case in which the system acquires two video files is described here, and this should not be construed as limiting the scope of the invention.
For convenience of representation, the other of the two video files acquired by the system except the first video file is referred to herein as the second video file.
It should be noted that, considering that the video files are obtained from a recorder and may be several videos, several monitoring videos or several interview videos that need to be associated, the scenes of each recording differ greatly, and the videos captured by the cameras contain shots from many different viewing angles. Therefore, the system needs to segment the second video file wherever the shot switches, and the most direct way for the system to detect shot switching in the second video file is to compare the color feature similarity of the video frames of the second video file.
The color feature similarity may be a gray value similarity calculated for the video frame of the second video file.
Specifically, the system calculates the gray value of each video frame of the second video file, and then the system calculates the similarity of the gray values of the adjacent video frames in the second video file.
The method for calculating the gray value of each video frame of the second video file by the system is the same as that described above, and details are not repeated here.
When the gray value similarity of the adjacent video frames exceeds a first preset threshold, the system selects the two adjacent frames with the gray value similarity exceeding the first preset threshold as a newly added video ending end point and a newly added video starting end point respectively. When the similarity of the gray values of the adjacent video frames exceeds a first preset threshold, the difference between the color features of the current frame and the next frame in the second video file is relatively large, and therefore it can be judged that the shot in the second video file is switched.
It is particularly emphasized that the second video file may be segmented into multiple subfiles. For brevity, only the two video subfiles generated by splitting the second video file are described herein, and this should not be construed as limiting the scope of the invention.
For convenience of representation, two adjacent frames in the second video file whose similarity of gray-level values exceeds the first preset threshold are respectively referred to as a third video frame and a fourth video frame.
The system takes the third video frame as a new video adding end point of the second video file, and takes the fourth video frame as a new video adding start end point of the second video file.
For convenience of representation, after the second video file is divided by the third video frame and the fourth video frame, the two generated video subfiles are respectively recorded as a fifth video subfile and a sixth video subfile.
It is understood that there may be multiple groups of the two adjacent frames with the above-mentioned similarity of gray-level values exceeding the first preset threshold, and the second video file may also generate multiple video subfiles. For the sake of brevity, no description of possible combinations of the embodiments is provided, and no limitation of the scope of the invention is thereby intended.
S140: and respectively generating feature indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile according to a preset data structure so as to retrieve the information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
It should be noted that, in order to perform association between videos for a plurality of video subfiles, the system needs to perform feature extraction on the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
After the system extracts the feature information of the third, fourth, fifth and sixth video subfiles, in order to facilitate retrieval, comparison and association of these subfiles, the system also needs to store the extracted features of the third, fourth, fifth and sixth video subfiles as a plurality of attribute feature values in a preset storage format.
Specifically, the system extracts at least one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
In the case where the system extracts the color features of the third, fourth, fifth and sixth video subfiles, the color feature may be obtained by calculating the gray values of each video frame of the video subfiles.
In the case where the system extracts the texture features of the third, fourth, fifth and sixth video subfiles, the texture feature may be calculated using the LBP (local binary pattern) algorithm.
In the case where the system extracts the target features of the third, fourth, fifth and sixth video subfiles, the target feature may be extracted using a trained YOLOv3 deep-learning model.
In the case where the system extracts the motion features of the third, fourth, fifth and sixth video subfiles, the motion feature may be extracted from human poses based on OpenPose pose recognition.
In the case where the system extracts the audio features of the third, fourth, fifth and sixth video subfiles, the audio feature may be obtained by computing the FFT of each speech frame to form a spectrogram and then taking the absolute values.
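As a concrete illustration of the audio feature alone, a minimal sketch follows; the frame length, hop size and mean pooling are assumptions of this sketch, not values prescribed by the text.

```python
import numpy as np

def audio_feature(samples, frame_len=512, hop=256):
    """Spectrogram-style descriptor: the absolute FFT values of each
    audio frame, averaged into one feature vector for the subfile."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    mags = [np.abs(np.fft.rfft(f)) for f in frames]  # |FFT| per frame
    return np.mean(mags, axis=0)
```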
And then the system respectively stores the video attribute characteristic values of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile in a preset data structure, and respectively generates characteristic indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
Specifically, the header data of the preset data structure is a summary field, and each following group of bits stores one feature vector: a color feature, a texture feature, a target feature, a motion feature and an audio feature respectively.
In a specific embodiment provided by the present application, the preset data structure may be designed as a storage structure in which 16 bits of header data are a Summary (Summary) field, and every 16 bits later store one feature vector.
The Summary field is calculated as follows:
$$\mathrm{Summary} = \sum_i w_i\, x_i$$

where $w_i$ is the weight value of each different vector and $x_i$ is each different feature value.
The Summary vector can directly carry out cosine similarity calculation, and the calculation formula is as follows:
$$\cos\theta = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$

where $x_i$ represents each feature value in the first vector and $y_i$ represents each feature value in the second vector.
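A minimal sketch of the index layout and of both similarity calculations follows; the concrete weights, the assumption that feature values lie in [0, 1], and the 16-bit quantization step are illustrative only.

```python
import numpy as np

def summary_value(features, weights):
    """Summary field: weighted sum of the feature values, per the formula above."""
    return float(np.dot(weights, features))

def cosine_similarity(x, y):
    """Cosine similarity between two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical 96-bit index: one 16-bit summary followed by five 16-bit
# values (color, texture, target, motion, audio), assuming values in [0, 1].
feats = np.array([0.8, 0.3, 0.5, 0.1, 0.6])
summary = summary_value(feats, np.full(5, 0.2))  # assumed equal weights
index = np.round(np.append(summary, feats) * 65535).astype(np.uint16)
```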
S150: and comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
It should be noted that the system reads the feature indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile respectively according to the data structures storing the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
It should be noted that the system may use any one of the third, fourth, fifth and sixth video subfiles as the comparison object. For brevity, only the third video subfile is described herein as the comparison object, and this should not be construed as limiting the scope of the invention.
And when the third video subfile is taken as a comparison object, respectively calculating the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
It is emphasized that the feature indexes of the third, fourth, fifth and sixth video subfiles may be the summary fields of the third, fourth, fifth and sixth video subfiles.
Or the feature index of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile can be a feature vector of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile, wherein the feature vector is one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
It is to be understood that the system may use any one of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile as a comparison object. Therefore, the system may also compare the feature index similarities of the third, fifth, and sixth video subfiles relative to the fourth video subfile.
Or the system may compare the feature index similarity of the third video subfile, the fourth video subfile, and the sixth video subfile with respect to the fifth video subfile.
Or the system may compare the feature index similarity of the third video subfile, the fourth video subfile, and the fifth video subfile with respect to the sixth video subfile.
S160: and determining at least one of the fourth video subfile, the fifth video subfile and the sixth video subfile with the characteristic index similarity exceeding a second preset threshold relative to the third video subfile as an intermediate video subfile.
It should be noted that the system determines at least one of the fourth video subfile, the fifth video subfile and the sixth video subfile, of which the similarity of the feature index relative to the third video subfile exceeds a second threshold value, as an intermediate video subfile according to the similarity of the feature index of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
It is emphasized that the intermediate video subfile may be a file having a single element if there is only one video subfile with a similarity exceeding the second threshold relative to the feature index of the third video subfile. The intermediate video subfile may also be a collection of video subfiles if there are multiple video subfiles with a feature index similarity exceeding a second threshold relative to a third video subfile. It is to be understood that any one of the fourth video subfile, the fifth video subfile and the sixth video subfile is considered as an intermediate video subfile as long as the similarity of the feature index of the fourth video subfile, the fifth video subfile and the sixth video subfile with respect to the third video subfile exceeds the second threshold.
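A short sketch of this selection step follows, reusing cosine_similarity() from the sketch above; representing each subfile as a dictionary with an "index" entry, and comparing whole index vectors, are assumptions of the sketch.

```python
def intermediate_subfiles(candidates, reference, second_threshold):
    """Keep every candidate video subfile whose feature-index similarity
    to the reference subfile exceeds the second preset threshold; the
    result may hold one subfile or several, as described above."""
    return [c for c in candidates
            if cosine_similarity(c["index"], reference["index"]) > second_threshold]
```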
S170: and associating the intermediate video subfile with the third video subfile to generate association information.
It should be noted that the system associates the intermediate video subfile with the third video subfile to generate the association information.
It should be further noted that the system may further display the intermediate video subfile and the third video subfile according to the association information.
In a specific implementation provided by the application, there are two recorded videos that are continuous over a long period and contain large motion amplitudes.
After the system obtains two recorded videos which are continuous for a long time and have large action amplitude, the two recorded videos are respectively recorded as a first video file and a second video file.
The system can then calculate the gray value of each video frame of the first video file, and determine that two adjacent frames in the first video file whose gray value similarity exceeds a first preset threshold are respectively marked as a first video frame and a second video frame. And taking the first video frame as a new video adding end point of the first video file, and taking the second video frame as a new video adding start end point of the first video file.
And after the first video file is divided by the first video frame and the second video frame, the generated two video subfiles are respectively recorded as a third video subfile and a fourth video subfile.
And then the system calculates the gray value of each video frame of the second video file, and determines that two adjacent frames of which the similarity of the gray value in the second video file exceeds a first preset threshold are respectively marked as a third video frame and a fourth video frame. And taking the third video frame as a new video adding end point of the second video file, and taking the fourth video frame as a new video adding start end point of the second video file.
And after the second video file is divided by the third video frame and the fourth video frame, the generated two video subfiles are respectively recorded as a fifth video subfile and a sixth video subfile.
Then, the system uses 16 bits as the Summary field, stores one feature vector in each following group of 16 bits, and uses a 96-bit data structure in total to store the color features, texture features, target features, motion features and audio features of the third, fourth, fifth and sixth video subfiles respectively as the feature indexes of the third, fourth, fifth and sixth video subfiles.
Then, the system respectively calculates the summary field similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile by taking the third video subfile as a comparison object.
Then, the system takes the sixth video subfile whose feature index similarity with respect to the third video subfile exceeds a second threshold as an intermediate video subfile. And associating the intermediate video subfile with a third video subfile to generate associated information.
And finally, the system displays a sixth video subfile and a third video subfile according to the association information.
To support a multi-video association method, the present application provides a multi-video association apparatus 100.
Referring to fig. 2, the present application provides a multi-video association apparatus 100, including:
the obtaining module 11 is configured to obtain at least two video files.
It should be noted that, unlike the prior art that only two short video files can be obtained for similarity matching calculation, the multi-video association apparatus 100 provided in the present application can obtain a plurality of long video files for association through the obtaining module 11.
Specifically, the video file may be a multimedia file containing real-time audio and video information. The file format of the video file may be MPEG, AVI, ASF, MOV, 3GP, WMV, RM, RMVB, FLV/F4V.
In a specific application scenario, the video file may be a number of videos that need to be associated, a number of monitoring videos that need to be associated, a number of interview videos that need to be associated, and the like, which are acquired from a recorder.
Such videos are recorded for a long time and may have multiple locations for recording. Therefore, when associating, the obtaining module 11 will often obtain at least two long video files.
The segmentation module 12 is configured to segment the first video file according to the color feature similarity of the video frames of the first video file, and generate at least a third video subfile and a fourth video subfile;
and segmenting the second video file according to the color feature similarity of the video frames of the second video file to generate at least a fifth video subfile and a sixth video subfile.
It should be noted that, although the obtaining module 11 can obtain a plurality of long video files, for brevity only the case of two long video files obtained by the obtaining module 11 is described here, and this should not be construed as limiting the scope of the claims.
For convenience of illustration, the two long video files acquired by the acquisition module 11 are referred to as a first video file and a second video file, respectively.
It should be noted that, considering that the video files are obtained from a recorder and may be several videos, several monitoring videos or several interview videos that need to be associated, the scenes of each recording differ greatly, and the videos captured by the cameras contain shots from many different viewing angles. Therefore, the multi-video association apparatus 100 needs to divide the first video file and the second video file through the segmentation module 12 wherever the shot switches.
The most direct method for the segmentation module 12 to determine the shot-to-shot switching of the first video file is to compare the color feature similarity of the video frames of the first video file.
The segmentation module 12 segments the first video file according to the similarity of the color features of the video frames of the first video file.
It should be noted that the color feature similarity may be a gray value similarity calculated for the video frame of the first video file.
Specifically, the segmentation module 12 may calculate a gray value of each video frame of the first video file, and then the segmentation module 12 calculates a similarity of gray values of adjacent video frames in the first video file.
In the specific embodiment provided in the present application, the segmentation module 12 may calculate the gray value of the video frame of the first video file using the conventional Bhattacharyya coefficient calculation formula.
Considering that most of a video's distinctive features are concentrated in the central 2/3 of the picture, the segmentation module 12 also includes an optimization submodule 121.
The optimization sub-module 121 may calculate the gray value of the video frame of the first video file using the improved Bhattacharyya coefficient calculation formula.
Specifically, the optimization submodule 121 reduces the gray scale level of a single color of a video frame in the first video file from 256 levels to 16 levels.
The optimization submodule 121 then stores only any two color values of the three colors in 4 bytes, which are used to represent the video frame gray values of the first video file.
The optimization sub-module 121 will then calculate the video frame gray-value similarity within the first video file using the improved Bhattacharyya coefficient calculation formula.
Specifically, the improved Bhattacharyya coefficient calculation formula is as follows:
short value=(G/16)*15+B/16

$$d(p, p') = \sqrt{1 - \sum_i \sqrt{p(i)\,p'(i)}}, \qquad p,\ p' \text{ computed only over the central } 2/3 \text{ of each frame}$$

As can be seen from the formula, the improved Bhattacharyya coefficient calculation is performed only over the central 2/3 of the image data of each video frame of the first video file.
And finally, the optimization submodule 121 obtains the gray value similarity of the video frame of the first video file after the optimization algorithm.
When the gray value similarity of the adjacent video frames exceeds a first preset threshold, the segmentation module 12 selects two adjacent frames with the gray value similarity exceeding the first preset threshold as a newly added video ending end point and a newly added video starting end point respectively. When the similarity of the gray values of the adjacent video frames exceeds a first preset threshold, it indicates that the difference between the color features of the current frame and the next frame in the first video file is relatively large, so that the segmentation module 12 can determine that the shot in the first video file is switched.
It is particularly emphasized that the segmentation module 12 may segment the first video file into multiple segments. For brevity, only the two video subfiles generated by the segmentation module 12 splitting the first video file are described here, and this should not be construed as limiting the scope of the invention.
For convenience of representation, two adjacent frames with a gray-level similarity exceeding a first preset threshold are respectively referred to as a first video frame and a second video frame.
The segmentation module 12 will use the first video frame as the end point of the newly added video of the first video file, and the segmentation module 12 will use the second video frame as the start point of the newly added video of the first video file.
For convenience of representation, after the first video file is divided by the first video frame and the second video frame, the two generated video subfiles are respectively recorded as a third video subfile and a fourth video subfile.
It should be noted that the most direct method for the segmentation module 12 to determine shot switching of the second video file is to compare the color feature similarity of the video frames of the second video file. And the color feature similarity may be a gray value similarity for calculating a video frame of the second video file.
The segmentation module 12 calculates the gray value of the video frame of the second video file, and the calculation method is the same as that described above, which will not be described herein again.
It should be noted that the segmentation module 12 may perform multi-segment segmentation on the second video file. For brevity, only the two video subfiles generated by the segmentation module 12 splitting the second video file are described here, and this should not be construed as limiting the scope of the invention.
Specifically, when the gray-level value similarity of the adjacent video frames exceeds a first preset threshold, the segmentation module 12 selects two adjacent frames with the gray-level value similarity exceeding the first preset threshold as an end point of the newly added video and an initial end point of the newly added video respectively. When the similarity of the gray values of the adjacent video frames exceeds a first preset threshold, the difference between the color features of the current frame and the next frame in the second video file is relatively large, and therefore it can be judged that the shot in the second video file is switched.
For convenience of representation, two adjacent frames with a gray-level similarity exceeding a first preset threshold are respectively referred to as a third video frame and a fourth video frame.
The segmentation module 12 will use the third video frame as the end point of the newly added video of the second video file, and the segmentation module 12 will use the fourth video frame as the start point of the newly added video of the second video file.
For convenience of representation, after the second video file is divided by the third video frame and the fourth video frame, the two generated video subfiles are respectively recorded as a fifth video subfile and a sixth video subfile.
And the indexing module 13 is configured to generate feature indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile respectively according to a preset data structure, so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
It should be noted that after the first video file and the second video file are divided by the segmentation module 12 to generate the third, fourth, fifth and sixth video subfiles, in order to perform video association on the plurality of video subfiles, the multi-video association apparatus 100 further needs to perform feature extraction on each video subfile through the indexing module 13.
After the indexing module 13 extracts the feature information of the third, fourth, fifth and sixth video subfiles, in order to facilitate retrieval, comparison and association of these subfiles, the indexing module 13 further needs to store the extracted features of the third, fourth, fifth and sixth video subfiles in a preset storage format as a plurality of attribute feature values.
Specifically, the indexing module 13 extracts at least one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
Then, the indexing module 13 stores the video features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile respectively in a preset data structure, and generates feature indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile respectively.
The header data of the preset data structure is a summary field, and each following group of bits stores one feature vector: a color feature, a texture feature, a target feature, a motion feature and an audio feature respectively.
And the comparison module 14 is used for comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
It should be noted that the comparison module 14 reads the feature indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile according to the data structures storing the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
It should be noted that the comparison module 14 may use any one of the third, fourth, fifth and sixth video subfiles as the comparison object. For brevity, only the third video subfile is described here as the comparison object, and this should not be construed as limiting the scope of the invention.
When the comparison module 14 takes the third video subfile as a comparison object, the comparison module 14 will calculate the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile with respect to the third video subfile.
It is emphasized that the feature indexes of the third, fourth, fifth and sixth video subfiles may be the summary fields of the third, fourth, fifth and sixth video subfiles.
Or the feature index of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile can be a feature vector of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile, wherein the feature vector is one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile.
It is to be understood that the comparison module 14 may use any one of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile as a comparison object. Accordingly, the comparison module 14 may also compare the feature index similarities of the third, fifth, and sixth video subfiles relative to the fourth video subfile.
Or the comparison module 14 may compare the similarity of the feature indexes of the third video subfile, the fourth video subfile and the sixth video subfile with respect to the fifth video subfile.
Or the comparison module 14 may compare the similarity of the feature indexes of the third video subfile, the fourth video subfile and the fifth video subfile with respect to the sixth video subfile.
The association module 15 is configured to determine at least one video subfile from among a fourth video subfile, a fifth video subfile and a sixth video subfile, which have a feature index similarity exceeding a second preset threshold with respect to the third video subfile, as an intermediate video subfile;
and associating the intermediate video subfile with the third video subfile to generate association information.
It should be noted that the association module 15 determines, as the intermediate video subfile, at least one of the fourth video subfile, the fifth video subfile and the sixth video subfile, of which the similarity of the feature index with respect to the third video subfile exceeds a second threshold, according to the similarity of the feature index of the fourth video subfile, the fifth video subfile and the sixth video subfile with respect to the third video subfile.
It is emphasized that the intermediate video subfile may be a file having a single element if there is only one video subfile with a similarity exceeding the second threshold relative to the feature index of the third video subfile. The intermediate video subfile may also be a collection of video subfiles if there are multiple video subfiles with a feature index similarity exceeding a second threshold relative to a third video subfile. It is to be understood that any one of the fourth video subfile, the fifth video subfile and the sixth video subfile is considered as an intermediate video subfile as long as the similarity of the feature index of the fourth video subfile, the fifth video subfile and the sixth video subfile with respect to the third video subfile exceeds the second threshold.
The association module 15 then associates the intermediate video subfile with the third video subfile, generating association information.
It should be further noted that the multi-video association apparatus 100 may further present the intermediate video subfile and the third video subfile according to the association information.
In a specific implementation provided by the application, there are two recorded videos that are continuous over a long period and contain large motion amplitudes.
The obtaining module 11 records the two videos as a first video file and a second video file after obtaining the recorded videos with long-time continuity and large motion amplitude.
Then, the segmentation module 12 calculates the gray value of each video frame of the first video file, finds two adjacent frames in the first video file whose gray value similarity exceeds the first preset threshold, and records them as the first video frame and the second video frame respectively. The segmentation module 12 takes the first video frame as the newly added video end endpoint of the first video file and the second video frame as the newly added video start endpoint of the first video file.
After the segmentation module 12 segments the first video file at the first video frame and the second video frame, the two generated video subfiles are recorded as a third video subfile and a fourth video subfile respectively.
Then, the segmentation module 12 calculates the gray value of each video frame of the second video file, finds two adjacent frames in the second video file whose gray value similarity exceeds the first preset threshold, and records them as a third video frame and a fourth video frame respectively. The segmentation module 12 takes the third video frame as the newly added video end endpoint of the second video file and the fourth video frame as the newly added video start endpoint of the second video file.
After the segmentation module 12 segments the second video file at the third video frame and the fourth video frame, the two generated video subfiles are recorded as a fifth video subfile and a sixth video subfile respectively.
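As an editorial illustration of this segmentation step, the sketch below scans a video with OpenCV, builds a 16-bin gray histogram per frame, and reports the adjacent frame pairs that satisfy the similarity condition stated above. The use of the Bhattacharyya coefficient mirrors claims 3 and 8, while the bin count, the threshold value and all function names are assumptions.

```python
import cv2
import numpy as np

FIRST_THRESHOLD = 0.98  # "first preset threshold"; the value is an assumption


def gray_histogram(frame, bins=16):
    # Reduce the 256 gray levels to 16 bins, mirroring the optimization
    # recited in claims 3 and 8.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
    return hist / hist.sum()  # normalize so the Bhattacharyya coefficient applies


def bhattacharyya(p, q):
    # BC(p, q) = sum_i sqrt(p_i * q_i); 1.0 means identical distributions.
    return float(np.sum(np.sqrt(p * q)))


def split_points(video_path):
    """Yield indexes i where frame i ends one subfile and frame i+1 starts
    the next, i.e. where adjacent frames satisfy the similarity condition."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev_hist, idx = gray_histogram(prev), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        hist = gray_histogram(frame)
        if bhattacharyya(prev_hist, hist) > FIRST_THRESHOLD:
            yield idx - 1  # the earlier frame of the pair is the end endpoint
        prev_hist = hist
    cap.release()
```

Iterating split_points("first_video.mp4") (a hypothetical path) would list the end endpoints at which the first video file is divided into the third and fourth video subfiles.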
Then, the indexing module 13 takes 16 bits as a summary field, stores one feature vector in each following group of 16 bits, and thereby stores the color feature, texture feature, target feature, motion feature and audio feature of the third, fourth, fifth and sixth video subfiles in a 96-bit data structure, which serves as the feature index of each of the third, fourth, fifth and sixth video subfiles.
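This 96-bit layout lends itself to a fixed binary record: one 16-bit summary field followed by five 16-bit feature fields. The sketch below packs and unpacks such a record with Python's struct module; the big-endian byte order and the example values are assumptions, since the application only fixes the field widths.

```python
import struct

FEATURE_ORDER = ("color", "texture", "target", "motion", "audio")


def pack_feature_index(summary, features):
    # ">6H" = six big-endian unsigned 16-bit fields: one 16-bit summary
    # field plus five 16-bit feature vectors, exactly 96 bits (12 bytes).
    values = [summary] + [features[name] for name in FEATURE_ORDER]
    return struct.pack(">6H", *values)


def unpack_feature_index(blob):
    summary, *vals = struct.unpack(">6H", blob)
    return summary, dict(zip(FEATURE_ORDER, vals))


index = pack_feature_index(0xA1B2, {"color": 17, "texture": 42,
                                    "target": 7, "motion": 99, "audio": 3})
assert len(index) * 8 == 96  # the 96-bit data structure from the description
print(unpack_feature_index(index))
```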
Then, the comparison module 14 takes the fifth video subfile as the comparison object and calculates the color feature similarity of the third video subfile, the fourth video subfile and the sixth video subfile relative to the fifth video subfile.
Then, the association module 15 determines the third video subfile and the fourth video subfile, whose color feature similarity relative to the fifth video subfile exceeds the second preset threshold, as intermediate video subfiles.
The association module 15 associates the third video subfile, the fourth video subfile and the fifth video subfile to generate association information.
Finally, the multi-video association apparatus 100 displays the third video subfile, the fourth video subfile and the fifth video subfile according to the association information.
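To make the color-only comparison concrete, the sketch below treats the 16-bit color feature from each feature index as a small 4-bin histogram (4 bits per bin) and scores candidates against the fifth subfile with the Bhattacharyya coefficient. The internal layout of the color feature and the hexadecimal example values are hypothetical; the application leaves both unspecified.

```python
import numpy as np


def color_bins(color_field, bins=4, bits=4):
    # Unpack a 16-bit color feature into `bins` values of `bits` bits each
    # and normalize them into a small histogram.
    mask = (1 << bits) - 1
    vals = [(color_field >> (bits * i)) & mask for i in range(bins)]
    total = sum(vals) or 1
    return np.array(vals, dtype=float) / total


def color_similarity(a_field, b_field):
    # Bhattacharyya coefficient over the unpacked color histograms.
    p, q = color_bins(a_field), color_bins(b_field)
    return float(np.sum(np.sqrt(p * q)))


fifth = 0x1234  # hypothetical 16-bit color feature of the fifth video subfile
for name, field in {"third": 0x1235, "fourth": 0x2234, "sixth": 0x9ABC}.items():
    print(name, round(color_similarity(fifth, field), 3))
```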
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A multi-video association method, comprising the steps of:
acquiring at least two video files;
according to the color feature similarity of the video frames of the first video file, segmenting the first video file to generate at least a third video subfile and a fourth video subfile;
segmenting the second video file according to the color feature similarity of the video frames of the second video file to generate at least a fifth video subfile and a sixth video subfile;
respectively generating feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile according to a preset data structure so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile;
determining, from among the fourth video subfile, the fifth video subfile and the sixth video subfile, at least one video subfile whose feature index similarity relative to the third video subfile exceeds a second preset threshold, and using the determined video subfile as an intermediate video subfile;
and associating the intermediate video subfile with the third video subfile to generate association information.
2. The multi-video association method according to claim 1, wherein the segmenting the first video file according to the color feature similarity of the video frames of the first video file to generate at least a third video subfile and a fourth video subfile includes:
comparing the gray value of the video frame in the first video file frame by frame;
selecting the current frame as a first video frame and the next frame as a second video frame when the gray value similarity of the next frame and the current frame exceeds a first preset threshold;
taking the first video frame as a newly added video end endpoint of the first video file;
taking the second video frame as a newly added video starting endpoint of the first video file;
and segmenting the first video file at the first video frame and the second video frame to generate at least a third video subfile and a fourth video subfile.
3. The multi-video association method of claim 1, wherein the color feature similarity of the video frames of the first video file is calculated by:
reducing the gray scale level of a single color of a video frame in the first video file from 256 levels to 16 levels;
storing any two color values of the three colors in 4 bytes, wherein the two color values are used for representing the gray values of the video frames of the first video file;
and calculating the gray value similarity of the video frames in the first video file by using the Bhattacharyya coefficient formula, thereby obtaining the color feature similarity of the video frames of the first video file under the optimized algorithm.
4. The multi-video association method according to claim 1, wherein feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile are respectively generated in a preset data structure, so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile, and the method specifically comprises:
extracting at least one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
respectively storing the video characteristics of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile in a preset data structure, and respectively generating characteristic indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
wherein the head data of the preset data structure is a summary field, and each following group of bits stores one feature vector among the color feature, the texture feature, the target feature, the motion feature and the audio feature.
5. The multi-video association method of claim 1, wherein comparing the feature index similarities of the fourth video subfile, the fifth video subfile, and the sixth video subfile with respect to the third video subfile comprises:
respectively comparing the summary field similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile; or
respectively comparing the similarity of at least one feature vector among the color feature, the texture feature, the target feature, the motion feature and the audio feature of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
6. A multi-video association apparatus, comprising:
the acquisition module is used for acquiring at least two video files;
the segmentation module is used for segmenting the first video file according to the color feature similarity of the video frames of the first video file to generate at least a third video subfile and a fourth video subfile;
and for segmenting the second video file according to the color feature similarity of the video frames of the second video file to generate at least a fifth video subfile and a sixth video subfile;
the index module is used for respectively generating feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile according to a preset data structure so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
the comparison module is used for comparing the feature index similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile;
the association module is used for determining at least one video subfile from a fourth video subfile, a fifth video subfile and a sixth video subfile, wherein the similarity of the feature index relative to the third video subfile exceeds a second preset threshold value, and the video subfile is used as an intermediate video subfile;
and associating the intermediate video subfile with the third video subfile to generate association information.
7. The multi-video association apparatus as claimed in claim 6, wherein the segmentation module is configured to segment the first video file according to the similarity of color features of the video frames of the first video file to generate at least a third video subfile and a fourth video subfile, and is specifically configured to:
comparing the gray value of the video frame in the first video file frame by frame;
selecting the current frame as a first video frame and the next frame as a second video frame when the gray value similarity of the next frame and the current frame exceeds a first preset threshold;
taking the first video frame as a newly added video end endpoint of the first video file;
taking the second video frame as a newly added video starting endpoint of the first video file;
and segmenting the first video file at the first video frame and the second video frame to generate at least a third video subfile and a fourth video subfile.
8. The multi-video correlation apparatus of claim 7, wherein the partitioning module further comprises an optimization sub-module;
the optimization submodule is specifically configured to:
reducing the gray scale level of a single color of a video frame in the first video file from 256 levels to 16 levels;
storing any two color values of the three colors in 4 bytes, wherein the two color values are used for representing the gray values of the video frames of the first video file;
and calculating the gray value similarity of the video frames in the first video file by using the Bhattacharyya coefficient formula, thereby obtaining the color feature similarity of the video frames of the first video file under the optimized algorithm.
9. The multi-video association apparatus as claimed in claim 6, wherein the indexing module is configured to generate feature indexes of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile in a preset data structure, so as to retrieve information of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile, and is specifically configured to:
extracting at least one of color features, texture features, target features, motion features and audio features of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
respectively storing the video characteristics of a third video subfile, a fourth video subfile, a fifth video subfile and a sixth video subfile in a preset data structure, and respectively generating characteristic indexes of the third video subfile, the fourth video subfile, the fifth video subfile and the sixth video subfile;
wherein the head data of the preset data structure is a summary field, and each following group of bits stores one feature vector among the color feature, the texture feature, the target feature, the motion feature and the audio feature.
10. The multi-video association apparatus as claimed in claim 6, wherein the comparing module is configured to compare feature index similarities of the fourth video subfile, the fifth video subfile, and the sixth video subfile with respect to the third video subfile, and is specifically configured to:
respectively comparing the summary field similarity of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile; or
respectively comparing the similarity of at least one feature vector among the color feature, the texture feature, the target feature, the motion feature and the audio feature of the fourth video subfile, the fifth video subfile and the sixth video subfile relative to the third video subfile.
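Two further editorial sketches illustrate details recited in the claims above. The first corresponds to the optimization of claims 3 and 8, assuming that reducing a single color from 256 to 16 levels means 4 bits per channel and that the blue and green channels are the two retained color values; the exact packing and the channel choice are assumptions the claims leave open.

```python
import numpy as np


def quantize_two_channels(frame_bgr):
    # 256 gray levels -> 16 levels: integer-divide each channel value by 16.
    q = (frame_bgr // 16).astype(np.uint8)
    # Keep any two of the three color channels (here: blue and green).
    return q[:, :, 0], q[:, :, 1]


def channel_histogram(channel):
    hist = np.bincount(channel.ravel(), minlength=16).astype(float)
    return hist / hist.sum()


def frame_similarity(f1, f2):
    # Average the Bhattacharyya coefficients BC(p, q) = sum_i sqrt(p_i * q_i)
    # of the two retained channels.
    sims = [float(np.sum(np.sqrt(channel_histogram(c1) * channel_histogram(c2))))
            for c1, c2 in zip(quantize_two_channels(f1), quantize_two_channels(f2))]
    return float(np.mean(sims))


# Two random 8x8 BGR frames stand in for real video frames.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 256, (8, 8, 3), dtype=np.uint8)
f2 = rng.integers(0, 256, (8, 8, 3), dtype=np.uint8)
print(frame_similarity(f1, f2))
```

The second corresponds to the summary field comparison of claims 5 and 10, assuming the 16-bit summary field acts as a bit-level fingerprint compared by Hamming similarity; the claims do not fix the metric, so this choice is illustrative.

```python
def summary_similarity(a, b, bits=16):
    # Fraction of matching bits between two 16-bit summary fields.
    matching = bits - bin((a ^ b) & ((1 << bits) - 1)).count("1")
    return matching / bits  # 1.0 when the summary fields are identical


print(summary_similarity(0xA1B2, 0xA1B0))  # one differing bit -> 0.9375
```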
CN202110319988.8A 2021-03-25 2021-03-25 Multi-video association method and device Pending CN113033662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319988.8A CN113033662A (en) 2021-03-25 2021-03-25 Multi-video association method and device


Publications (1)

Publication Number Publication Date
CN113033662A true CN113033662A (en) 2021-06-25

Family

ID=76473771




Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103690A (en) * 2011-03-09 2011-06-22 南京邮电大学 Method for automatically portioning hair area
CN102890700A (en) * 2012-07-04 2013-01-23 北京航空航天大学 Method for retrieving similar video clips based on sports competition videos
CN102831409A (en) * 2012-08-30 2012-12-19 苏州大学 Method and system for automatically tracking moving pedestrian video based on particle filtering
CN105321188A (en) * 2014-08-04 2016-02-10 江南大学 Foreground probability based target tracking method
CN106030535A (en) * 2014-08-20 2016-10-12 华为技术有限公司 Application program switch method, apparatus and electronic terminal
CN106919652A (en) * 2017-01-20 2017-07-04 东北石油大学 Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning
CN110557678A (en) * 2018-05-31 2019-12-10 北京百度网讯科技有限公司 Video processing method, device and equipment
US20200372292A1 (en) * 2019-05-23 2020-11-26 Webkontrol, Inc. Video Content Indexing and Searching
CN110533693A (en) * 2019-08-29 2019-12-03 北京精英路通科技有限公司 A kind of method for tracking target and target tracker
CN112464814A (en) * 2020-11-27 2021-03-09 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GIS视觉: "Histogram matching in image processing: principle and implementation of the Bhattacharyya coefficient" (图像处理直方图匹配-巴氏系数原理及实现), CSDN blog, online: HTTPS://BLOG.CSDN.NET/JAMESCHEN9051/ARTICLE/DETAILS/95895256 *
Meng Leilei (孟雷雷) et al.: "Analysis and quantitative testing of multi-module correlation in video coding optimization" (视频编码优化中多模块关联度的分析与定量测试), Journal of China Jiliang University (中国计量学院学报) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625