CN112291634A - Video processing method and device

Info

Publication number
CN112291634A
Authority
CN
China
Prior art keywords
video
frame image
sample frame
target
matching
Prior art date
Legal status
Granted
Application number
CN201910678001.4A
Other languages
Chinese (zh)
Other versions
CN112291634B (en)
Inventor
胡东方
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910678001.4A
Publication of CN112291634A
Application granted
Publication of CN112291634B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/251 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/254 Management at additional data server, e.g. shopping server, rights management server
    • H04N 21/2541 Rights Management
    • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel

Abstract

The invention provides a video processing method and a video processing device. The method comprises the following steps: segmenting a target video to obtain a plurality of video segments of the target video; performing inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment; extracting sample frame images from the video segments based on the detection results; performing similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in a video to be matched to obtain a matching result; and determining an association relation between the target video and the video to be matched based on the matching result. By means of the method and the device, the accuracy and efficiency of matching between videos can be improved.

Description

Video processing method and device
Technical Field
The invention relates to internet technologies, and in particular to a video processing method and device.
Background
In video similarity applications, the similarity between different videos is usually judged by comparing the similarity of video frames at corresponding positions of the videos. However, if all video frames of the videos are extracted for comparison, a huge amount of data must be processed and the comparison is time-consuming. In fact, the content of many videos changes little over short periods, so frame-by-frame comparison is unnecessary; for this reason, the related art extracts frames from the videos at equal intervals to reduce the processing complexity.
However, for videos whose content changes quickly, the video frames extracted by this technique cannot represent the video content within a time interval well. For example, when two similar videos are offset by only a small segment, the frames extracted at equal intervals may differ greatly, so that the final similarity judgment reaches the erroneous conclusion that the two videos are dissimilar.
Disclosure of Invention
The embodiment of the invention provides a video processing method and device, which can improve the accuracy and efficiency of video matching.
The embodiment of the invention provides a video processing method, which comprises the following steps:
segmenting a target video to obtain a plurality of video segments of the target video;
performing inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment;
extracting sample frame images from the video segments based on the detection results;
performing similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in a video to be matched to obtain a matching result;
and determining an association relation between the target video and the video to be matched based on the matching result.
An embodiment of the present invention further provides a video processing apparatus, including:
the segmentation unit is configured to segment a target video to obtain a plurality of video segments of the target video;
the detection unit is configured to perform inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment;
the extraction unit is configured to extract sample frame images from the video segments based on the detection results;
the matching unit is configured to perform similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in a video to be matched to obtain a matching result;
and the determination unit is configured to determine an association relation between the target video and the video to be matched based on the matching result.
In the foregoing scheme, the segmenting unit is further configured to perform segmentation processing on the target video based on the frame rate of the target video to obtain a plurality of video segments.
In the above scheme, the detection unit is further configured to determine a difference between pixels between adjacent video frames in each of the video segments;
determining a plurality of characteristic values of each video segment based on the difference degree;
determining a variance of the feature value corresponding to each video segment based on a plurality of feature values of each video segment;
when the variance of the characteristic value exceeds a variance threshold value, determining that the content switching of the corresponding video segment occurs.
In the foregoing solution, the extracting unit is further configured to determine a content variation value of a video frame in each of the video segments based on a plurality of feature values of each of the video segments and the corresponding variance of the feature values;
determining the video frame with the largest content change value in each video segment based on the content change value of the video frame in each video segment;
determining a sample frame in each video segment based on the video frame with the largest content change value;
and respectively extracting the sample frames from the video clips to obtain a plurality of sample frame images.
In the foregoing solution, the extracting unit is further configured to extract sample frames from the video segments according to set positions respectively to obtain sample frame images when the variance of the feature values does not exceed a variance threshold.
In the above scheme, the matching unit is further configured to perform feature extraction on each sample frame image of the target video, respectively, to obtain sample frame image features of each sample frame image;
respectively extracting the characteristics of each video frame image of the video to be matched to obtain the video frame image characteristics of each video frame image;
and respectively carrying out similarity matching on the sample frame image characteristics of the sample frame image and the video frame image characteristics of the video frame image at the corresponding position to obtain a matching result.
In the foregoing solution, the determining unit is further configured to determine that the target video and the video to be matched are the same video when the matching result represents that the number of sample frame images meeting the matching condition in the plurality of sample frame images reaches a number threshold;
wherein, the sample frame image satisfying the matching condition is: and the similarity between the sample frame image characteristics and the video frame image characteristics of the video frame image at the corresponding position reaches a similarity threshold value.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the video processing method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the video processing method provided by the embodiment of the invention.
The application of the embodiment of the invention has the following beneficial effects:
inter-frame content switching detection is performed on each video segment to obtain detection results; sample frame images are extracted from the video segments based on the detection results; similarity matching is performed between the extracted sample frame images and the video frame images at corresponding positions in the video to be matched to obtain a matching result; and the association relation between the target video and the video to be matched is determined based on the matching result. Because changes in content between video frames are taken into account, the extracted sample frame images represent the video content of the corresponding video segments well, and matching the sample frame images against the video frame images at corresponding positions in the video to be matched improves both the accuracy and the efficiency of video matching.
Drawings
Fig. 1 is a schematic diagram of an architecture of a video processing system according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention;
fig. 3 is a schematic view of an implementation scenario of video copyright detection according to an embodiment of the present invention;
fig. 4 is a schematic view of an implementation scenario of redundant video management according to an embodiment of the present invention;
fig. 5 is a schematic view of an implementation scenario of video recommendation according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of a video processing method according to an embodiment of the present invention;
fig. 7 is a flowchart illustrating a video processing method according to an embodiment of the present invention;
fig. 8 is a flowchart illustrating a video processing method according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described below in further detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Fig. 1 is an alternative architecture diagram of a video processing system according to an embodiment of the present invention. Referring to fig. 1, to support an exemplary application, terminals (including a terminal 400-1 and a terminal 400-2) are connected to a server 200 through a network 300; the network 300 may be a wide area network, a local area network, or a combination of the two, and data transmission is implemented over wireless or wired links.
A terminal (e.g., terminal 400-1) configured to send a video processing request to the server 200, where the video processing request carries a target video;
the server 200 is configured to segment the target video based on the video processing request to obtain a plurality of video segments of the target video; perform inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment; extract sample frame images from the video segments based on the detection results; perform similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in the video to be matched to obtain a matching result; determine the association relation between the target video and the video to be matched based on the matching result; and return the processing result to the terminal;
here, in practical applications, the server 200 may be a single server configured to support various services, or may be a server cluster.
The terminal (terminal 400-1 and/or terminal 400-2) is further configured to display the processing result.
In practical applications, the terminal may be a smartphone, a tablet, a laptop, a wearable computing device, a Personal Digital Assistant (PDA), a desktop computer, a cellular phone, a media player, a navigation device, a game console, a television, etc., or a combination of any two or more of these or other data processing devices.
In some embodiments, the terminal is provided with a video playing client through which a user can play videos online, upload videos, download videos, and so on. For example, the user uploads a video (namely, a target video) through the video playing client, which sends an upload request carrying the target video to the server. The server parses the upload request to obtain the target video and segments it to obtain a plurality of video segments; performs inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment; extracts sample frame images from the video segments based on the detection results; performs similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in the video to be matched to obtain a matching result; determines the association relation between the target video and the video to be matched based on the matching result; and returns a corresponding processing result.
An electronic device implementing a video processing method according to an embodiment of the present invention is described below. In some embodiments, the electronic device may be a terminal and may also be a server. The embodiment of the invention takes the electronic equipment as an example of the server, and the hardware structure of the server is explained in detail.
Fig. 2 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention, and it is understood that fig. 2 only shows an exemplary structure of the server, and not a whole structure, and a part of or the whole structure shown in fig. 2 may be implemented as needed. Referring to fig. 2, a server provided in an embodiment of the present invention includes: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the server are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory.
The memory 202 in embodiments of the present invention is used to store various types of data to support the operation of the server. Examples of such data include any executable instructions for operating on the server; a program implementing the method of an embodiment of the invention may be contained in these executable instructions.
The video processing method disclosed by the embodiment of the invention can be realized by the processor 201. The processor 201 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the video processing method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 201. The Processor 201 may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 201 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 202, and the processor 201 reads the information in the memory 202, and performs the steps of the video processing method provided by the embodiment of the present invention in combination with the hardware thereof.
Based on the above description of the video processing system and the electronic device according to the embodiment of the present invention, an application scenario or a field of the video processing method according to the embodiment of the present invention is described below, and it should be noted that the video processing method according to the embodiment of the present invention is not limited to the following scenario or field:
1. video copyright detection
Fig. 3 is a schematic view of an implementation scene of video copyright detection according to an embodiment of the present invention, and a description is next given, with reference to fig. 1 and fig. 3, of a scene in which a video processing method according to an embodiment of the present invention is applied to video copyright detection.
Taking a terminal as a terminal 400-1 in fig. 1 as an example, a video playing client is arranged on the terminal, a background server corresponding to the video playing client is the server 200 in fig. 1, a user uploads a video (such as an a movie) through the video playing client, and the video playing client sends an upload request carrying the a movie to the background server;
A video library of the background server stores a plurality of videos each having a copyright attribution attribute (for example, the copyright belongs to the video publisher, namely a user of the video playing client, or to the playing platform). The background server segments film A based on the upload request to obtain a plurality of video segments of film A; performs inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment; extracts sample frame images from the video segments based on the detection results; performs similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in the video library to obtain a matching result; determines the association relation between film A and the videos in the video library based on the matching result; and returns a corresponding processing result to the terminal. For example, if film A is determined to be the same as a video in the video library, a video identical to film A has been found, and a message prohibiting the upload is returned to the terminal; if film A is determined to differ from the videos in the video library, no identical video has been found, and a message indicating a successful upload is returned to the terminal.
The video processing method of the embodiment of the invention can effectively provide copyright protection for the video and effectively maintain rights and interests of a video uploader and a playing platform.
2. Redundant video management
Fig. 4 is a schematic view of an implementation scenario of redundant video management according to an embodiment of the present invention, and a scenario in which a video processing method according to an embodiment of the present invention is applied to redundant video management is described with reference to fig. 1 and fig. 4.
Taking an example that a first terminal is a terminal 400-1 in fig. 1, a video playing client is arranged on the first terminal, a second terminal is a terminal 400-2 in fig. 1, and a background server corresponding to the video playing client is a server 200 in fig. 1, where the first terminal faces a video viewer, the second terminal faces a manager of the video playing client, and in some embodiments, the manager may also play a video through the video playing client arranged on the second terminal.
In actual implementation, the second terminal is provided with management software (such as a management client), and a manager can manage the resources of the video playing client stored on the background server through a user interface provided by the management software.
In some embodiments, the second terminal sends a duplicate-query request carrying the target video to the background server. The background server parses the request to obtain the target video and segments it to obtain a plurality of video segments; performs inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment; extracts sample frame images from the video segments based on the detection results; performs similarity matching between the extracted sample frame images of the target video and the video frame images at corresponding positions in the video to be matched to obtain a matching result; determines the association relation between the target video and the video to be matched based on the matching result; and returns a corresponding processing result. For example, if the target video is determined to be the same as a video to be matched, a video identical to the target video has been found, and the corresponding video information (such as a video identifier and a video name) is returned to the second terminal; if the target video is determined to differ from the videos to be matched, no identical video has been found, and a message that no identical video was found is returned to the second terminal;
the manager can perform corresponding processing based on the search result returned by the background server, for example, if the second terminal receives the video information which is returned by the background server and is the same as the target video, the video can be deleted based on the video information, so that the occupation of the storage space of the background server can be reduced, and the stock video of the video playing platform can be purified.
3. Video recommendation
Fig. 5 is a schematic view of an implementation scene of video recommendation provided in an embodiment of the present invention, and a description is next given, with reference to fig. 1 and fig. 5, of a scene in which a video processing method according to an embodiment of the present invention is applied to video recommendation.
Taking the terminal as the terminal 400-1 in fig. 1 as an example, a video playing client is arranged on the terminal, a background server corresponding to the video playing client is the server 200 in fig. 1, and a user can watch videos through the video playing client.
In some embodiments, the background server may recommend videos through the video playing client, and a video library is disposed on the background server, where videos recommended within a period of time are stored in the video library.
Before recommending a video, the background server segments the video to be recommended (namely, the target video) to obtain a plurality of video segments of the video to be recommended; performs inter-frame content switching detection on each video segment to obtain a detection result corresponding to each video segment; extracts sample frame images from the video segments based on the detection results; performs similarity matching between the extracted sample frame images of the video to be recommended and the video frame images at corresponding positions in the recommended videos stored in the video library to obtain a matching result; determines the association relation between the video to be recommended and the recommended videos based on the matching result; and decides whether to recommend the video based on the determination result. For example, if the video to be recommended is the same as a recommended video, a video identical to it has been found and the video is not recommended; in this way already-recommended videos are filtered out and repeated recommendation is avoided. If the video to be recommended differs from the recommended videos, that is, no identical video is found, the video is pushed to the video playing client for recommendation.
Next, a video processing method provided by an embodiment of the present invention is described. Fig. 6 is a flowchart illustrating the video processing method according to an embodiment of the present invention. In some embodiments, the method may be implemented by a server or a terminal, or by the server and the terminal in cooperation, for example by the server 200 in fig. 1. With reference to fig. 1 and fig. 6, the video processing method provided by the embodiment of the present invention includes:
step 601: and the server carries out segmentation processing on the target video to obtain a plurality of video segments of the target video.
In practical applications, the target video may be a complete video, such as a complete movie file, or a video segment, such as a segment excerpt of a movie.
In actual implementation, because video data consists of continuous images and is inconvenient to process directly, the server divides the target video into a plurality of relatively independent video segments before extracting frames from it. Moreover, because most videos are spliced together from several shots, and the frame images within a shot change continuously, the server can divide the target video into individual shots and then extract key frames from each shot to represent it.
In some embodiments, the server may also obtain multiple video clips of the target video by:
and carrying out segmentation processing on the target video based on the frame rate of the target video to obtain a plurality of video segments.
Here, assuming that the frame rate of the target video is 30 FPS, that is, 30 frames of images are displayed per second, the target video can be divided, based on its frame rate, into video segments each lasting 1 second and containing 30 frames of pictures.
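As a minimal sketch of this segmentation step (the function name and the use of OpenCV are illustrative assumptions, not part of the patent text), the following Python code splits a video into one-second segments of fps frames each:

```python
import cv2

def split_into_segments(video_path):
    """Split a video into one-second segments of fps frames each.

    A sketch of the frame-rate-based segmentation described above;
    returns a list of segments, each a list of decoded frames.
    """
    cap = cv2.VideoCapture(video_path)
    fps = int(round(cap.get(cv2.CAP_PROP_FPS))) or 30  # fall back if metadata is missing
    segments, current = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        current.append(frame)
        if len(current) == fps:   # one second of video collected
            segments.append(current)
            current = []
    if current:                   # keep a trailing partial segment, if any
        segments.append(current)
    cap.release()
    return segments
```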
Step 602: and respectively carrying out content switching detection among video frames on each video clip to obtain a detection result corresponding to each video clip.
In practical applications, when detecting content changes of video frames, the chrominance and saturation between adjacent frames change little while the luminance information changes considerably, and the HSV color space is more sensitive to luminance changes; therefore, the video frame images in each video segment can be converted from the RGB format to the HSV format before inter-frame content detection.
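A one-line sketch of this conversion for the frames of a segment, assuming frames decoded by OpenCV (which stores them in BGR order):

```python
import cv2

def to_hsv(frames):
    # OpenCV decodes frames as BGR, so COLOR_BGR2HSV performs the
    # RGB-to-HSV conversion described above for every frame.
    return [cv2.cvtColor(f, cv2.COLOR_BGR2HSV) for f in frames]
```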
In some embodiments, the server may perform content switching detection between video frames on each video segment in the following manner to obtain a detection result corresponding to each video segment:
respectively determining the difference degree of pixel points between adjacent video frames in each video clip; determining a plurality of characteristic values of each video clip based on the difference degree; determining a variance of the feature value corresponding to each video clip based on the plurality of feature values of each video clip; and when the variance of the characteristic values exceeds the variance threshold value, determining that the content switching of the corresponding video segment occurs.
In actual implementation, pixel points between adjacent video frames in a video segment are collected; using inter-frame pixel point matching, the differences between corresponding pixel points of adjacent frames are computed in each of the three HSV channels, and the per-channel differences are summed to obtain the pixel difference between the adjacent frames:

f_v_i = Σ_{c=0,1,2} Σ_j (C1_{c,j} - C2_{c,j}); (1)

where f_v_i denotes the feature value of the i-th frame image, C1 and C2 denote the pictures of two adjacent video frames, c indexes the three HSV channels, and j indexes the pixels of the picture.

In this way, a series of feature values of the video segment is obtained: f_v_1, f_v_2, ..., f_v_fps (fps is the frame rate), from which the mean value mean and the variance std of the feature values can be computed. The variance std reflects the fluctuation of the pixel features of the video segment well. When std exceeds the variance threshold T, the video content within the segment is considered to have changed greatly, and content switching is determined to have occurred in the corresponding video segment; when std does not exceed the variance threshold, the video content within the segment has not changed much, and it is determined that no content switching has occurred in the corresponding video segment.
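The following sketch implements formula (1) and the variance test on one segment of HSV frames; note that formula (1) as written sums signed differences, and the threshold T is a tunable parameter (both the function names and the use of NumPy are assumptions):

```python
import numpy as np

def segment_features(hsv_frames):
    """Feature value f_v_i for each pair of adjacent frames, per formula (1):
    per-pixel differences summed over pixels and the three HSV channels."""
    feats = []
    for c1, c2 in zip(hsv_frames[:-1], hsv_frames[1:]):
        diff = c1.astype(np.int32) - c2.astype(np.int32)  # avoid uint8 wrap-around
        feats.append(float(diff.sum()))
    return np.array(feats)

def content_switch_detected(feats, T):
    # Content switching is declared when the spread of the feature
    # values exceeds the threshold T, as described above.
    return feats.std() > T
```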
Step 603: based on the detection result, sample frame images are extracted from the respective video clips.
In practical applications, when it is determined that content switching occurs in a video segment, the video frame image that best represents the content of the segment can be extracted from it.
In some embodiments, the server may extract the sample frame images from the video clips respectively based on the detection results by:
determining a content change value for the video frames in each video segment based on the feature values of the segment and the corresponding feature value variance, where the content change value represents the content change between adjacent video frames; determining the video frame with the largest content change value in each segment based on those content change values; determining a sample frame in each segment based on the video frame with the largest content change value; and extracting the sample frames from the segments to obtain a plurality of sample frame images.
In actual implementation, based on the obtained feature values f_v_i of the video frames in the segment, the mean value mean of the feature values and the variance std, the content change value z_score of each video frame in the corresponding segment can be determined:

z_score = (f_v_i - mean)/std; (2)

The larger z_score is, the larger the content change value of the corresponding video frame. When z_score reaches its maximum, the corresponding video frame can be determined to be where content switching occurs, so the content switching moment K_Z can be expressed as:

K_Z = max{(f_v_i - mean)/std, i ∈ (0, fps)} and std > T; (3)

To obtain a sample frame image that represents the video segment well, the server extracts the video frame image at the middle moment of [K_Z, end] as the sample frame image, where end refers to the moment of the last video frame image in the segment and 0 refers to the moment of the first video frame image in the segment.
For each video segment in which content switching of the video frames occurs, this extraction method yields the corresponding sample frame images. In this way, sample frame images that represent the video content of their segments well are extracted, and matching the extracted sample frame images of the target video against the video images of the video to be matched improves the accuracy of video matching.
In some embodiments, the server may extract the sample frame image by:
and when the variance of the characteristic value does not exceed the variance threshold, respectively extracting the sample frames from each video segment according to the set position to obtain sample frame images.
Here, when the variance of the feature values does not exceed the variance threshold, no content switching occurs in the corresponding video segment, that is, the video frame images within the segment are all considered similar. The set position may be the middle position of [0, end], or positions obtained by dividing the segment at equal intervals according to its duration, and one or more sample frames may be extracted from each segment. Illustratively, the server may extract the video frame at the middle moment of [0, end] as the sample frame image, or randomly extract a video frame from [0, end] as the sample frame image.
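Combining formulas (2) and (3) with the fallback just described, a sketch of the per-segment sample frame choice (the +1 offset and the middle-of-interval rule are reasonable readings of the text, not verbatim from it):

```python
import numpy as np

def pick_sample_frame(frames, feats, T):
    """Pick one sample frame per one-second segment.

    If the feature-value spread exceeds T, take the frame at the middle
    of [K_Z, end], where K_Z maximises z_score = (f_v_i - mean)/std
    (formulas (2)-(3)); otherwise take the frame at the middle of [0, end].
    """
    end = len(frames) - 1
    mean, std = feats.mean(), feats.std()
    if std > T:
        z_scores = (feats - mean) / std
        k_z = int(z_scores.argmax()) + 1  # feats[i] compares frames i and i+1
        return frames[(k_z + end) // 2]
    return frames[end // 2]               # no switching: middle of [0, end]
```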
Step 604: and performing similarity matching on the extracted sample frame image of the target video and the video frame image at the corresponding position in the video to be matched to obtain a matching result.
Here, in practical applications, the video to be matched is stored on the server, and the video frame images of the video to be matched are extracted in the same manner as the sample frames of the target video.
In some embodiments, the server may obtain the matching result by:
performing feature extraction on each sample frame image of the target video to obtain the sample frame image features of each sample frame image; performing feature extraction on each video frame image of the video to be matched to obtain the video frame image features of each video frame image; and performing similarity matching between the sample frame image features of each sample frame image and the video frame image features of the video frame image at the corresponding position to obtain the matching result.
In some embodiments, feature extraction can be performed on each sample frame image of the target video and each video frame image of the video to be matched through a trained feature extraction model, yielding the sample frame image features of each sample frame image and the video frame image features of each video frame image. Similarity is then calculated between the sample frame image features and the corresponding video frame image features to obtain the similarity results: when the similarity reaches a preset similarity threshold, the matching succeeds; when it does not, the matching fails.
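The patent does not specify the feature extraction model; as an assumed stand-in, the sketch below uses a ResNet-18 backbone from torchvision with its classification head removed, turning each frame into a 512-dimensional feature vector:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Hypothetical feature extractor standing in for the trained model.
_backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
_backbone.fc = torch.nn.Identity()  # drop the classification head
_backbone.eval()

_preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_features(frame_bgr):
    """512-dim feature vector for one frame (a BGR ndarray, as from OpenCV)."""
    rgb = frame_bgr[:, :, ::-1].copy()  # BGR -> RGB for the ImageNet model
    batch = _preprocess(rgb).unsqueeze(0)
    return _backbone(batch).squeeze(0).numpy()
```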
In actual implementation, when feature matching is performed, the sample frame image features of a sample frame image must be matched against the video frame image features at the corresponding position in the feature library; for example, if the currently matched sample frame image of the target video is the 2nd sample frame image, it must be matched against the features of the 2nd video frame image of the video in the feature library.
In some embodiments, the server is provided with a video library and a feature library: the video library stores a plurality of videos to be matched, and the feature library stores the video frame image features corresponding to the video library; for a specific video in the video library, the feature library stores the sample frame image features corresponding to that video, whose sample frames are extracted in the same manner as the sample frames of the target video.
In actual implementation, the server may calculate the similarity between the sample frame image features of the target video and the corresponding sample frame image features in the feature library (for example, those of video frames with the same playing time) to obtain the corresponding similarity results. When the similarity reaches the preset similarity threshold, the matching succeeds; when it does not, the matching fails.
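A sketch of this position-wise matching, assuming the features are plain vectors and using cosine similarity as the measure (the patent does not fix a particular similarity function):

```python
import numpy as np

def match_features(target_feats, candidate_feats, sim_threshold):
    """Compare the target video's sample-frame features with the
    candidate's features at the same positions (the 2nd with the 2nd,
    and so on), returning one match flag per position."""
    flags = []
    for a, b in zip(target_feats, candidate_feats):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)  # cosine
        flags.append(sim >= sim_threshold)
    return flags
```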
Step 605: and determining the association relation between the target video and the video to be matched based on the matching result.
In some embodiments, the association relationship between the target video and the video to be matched may be determined as follows:
when the matching result indicates that the number of sample frame images satisfying the matching condition among the plurality of sample frame images reaches a number threshold, determining that the target video and the video to be matched are the same video; here, a sample frame image satisfying the matching condition is one whose sample frame image features reach the similarity threshold with the video frame image features at the corresponding position, namely a sample frame image that matches the video frame image at the corresponding position of the video to be matched.
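The decision rule then reduces to a count over the per-position match flags, as in this sketch (count_threshold is the assumed number threshold):

```python
def same_video(match_flags, count_threshold):
    # The two videos are judged to be the same video when the number of
    # sample frame images satisfying the matching condition reaches the
    # number threshold, as described above.
    return sum(match_flags) >= count_threshold
```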
By applying this embodiment of the invention, changes in content between video frames are taken into account: sample frame images are extracted according to the content switching between video frames within each video segment, so the extracted sample frame images represent the video content of the corresponding segments well, and matching them against the video frame images at corresponding positions in the video to be matched improves the accuracy and efficiency of video matching.
Next, taking video copyright detection as an example application scenario, the video processing method of the embodiment of the present invention is further described. Fig. 7 is a schematic flow chart of the video processing method according to an embodiment of the present invention. In some embodiments, the method may be implemented cooperatively by a terminal and a server, for example by the terminal 400-1 and the server 200 in fig. 1, where the terminal 400-1 is provided with a video playing client. With reference to fig. 1 and fig. 7, the video processing method provided by the embodiment of the present invention includes:
step 701: and the video playing client sends a video uploading request carrying the target video to the server.
In practical applications, a user can play, download, and upload videos through the video playing client provided on the terminal. When the user uploads a video, the video playing client receives a video upload instruction triggered by the user through the user interface, the instruction indicating that a target video is to be uploaded, and the client sends a video upload request carrying the target video to the server. In practical applications, the video upload request may also carry a sender identifier, such as a terminal identifier.
Step 702: and the server analyzes the video uploading request to obtain the target video to be uploaded.
Here, in practical application, the video upload request carries a target video to be uploaded, and the server receives the video upload request sent by the video playing client and analyzes the video upload request to obtain the target video to be uploaded.
Step 703: and carrying out segmentation processing on the target video to be uploaded to obtain a plurality of video segments of the target video.
In actual implementation, the server divides the target video into a plurality of video segments based on the frame rate of the target video; for example, assuming that the frame rate of the target video is 30 FPS, the target video can be divided into video segments each lasting 1 second and containing 30 frames of pictures.
Step 704: and respectively carrying out content switching detection among video frames on each video clip to obtain a detection result corresponding to each video clip.
Here, the difference degree of pixel points between adjacent video frames in each video segment is determined; a plurality of feature values of each segment are determined based on the difference degree; the variance of the feature values of each segment is determined based on those feature values; when the variance exceeds the variance threshold, content switching is determined to have occurred in the corresponding segment, and when it does not, no content switching is determined to have occurred. In this way, whether the content of the video frames in each segment has changed greatly can be detected.
Step 705: based on the detection result, sample frame images are extracted from the respective video clips.
Here, for a given video segment, when the detection result indicates that the content of the video frames in the segment has not changed, a sample frame is extracted from the segment at the set position (for example, its middle), yielding the sample frame image.
When the detection result indicates that the content of the video frames in the segment has changed, the average feature value of the segment is further obtained based on its feature values; then, based on the feature values of the segment, the average feature value, the corresponding feature value variance, and formulas (2) to (3), the moment K_Z at which the video frame content switches within the segment is obtained, and the video frame image at the middle moment of [K_Z, end] is extracted as the sample frame image.
With this extraction mode, the sample frame image that best represents the content of the video segment can be extracted from every second of video, which improves the accuracy of video matching.
Step 706: and sequentially inputting the sample frame images extracted from each video clip into the feature extraction model, and determining the corresponding sample frame image features.
Here, the server is provided with a feature extraction model trained in advance.
Step 707: and performing similarity matching on the obtained sample frame image characteristics of the target video and corresponding video frame image characteristics in the characteristic library to obtain a matching result.
The server is provided with a video library and a feature library: the videos in the video library are videos to be matched that have copyright attribution attributes, and the feature library is constructed in advance and stores the video frame image features of the videos in the video library; the sample frames of the videos to be matched are extracted in the same manner as the sample frames of the target video.
In actual implementation, when feature matching is performed, the sample frame image features of the target video must be matched against the video frame image features of the corresponding video to be matched in the feature library; for example, if the currently matched sample frame image of the target video is the 2nd sample frame image, it must be matched against the 2nd video frame image of the video to be matched in the feature library.
Step 708: searching a video identical to the target video in a video library, and executing a step 709 when the video identical to the target video is searched; when the video identical to the target video is not found, step 711 is executed.
Here, in practical applications, the server determines that the video to be matched in the video library that satisfies the following condition is the same video as the target video to be uploaded:
and in the matching result with the sample frame images of the target video, the number of the successfully matched sample frame images of the video to be matched reaches a preset number threshold.
Step 709: and sending a message for prohibiting uploading to the video playing client.
Step 710: and the video playing client displays the information of forbidding uploading through a user interface.
Step 711: and sending the message of successful uploading to the video playing client.
By applying this embodiment of the invention, changes in content between video frames are taken into account: sample frame images are extracted according to the content switching between video frames within each video segment, so the extracted sample frame images represent the video content of the corresponding segments well, and matching the sample frame image features against the corresponding to-be-matched video frame image features in the feature library improves the accuracy of video matching. When the video uploaded by a user is detected to be the same as a video to be matched in the video library, the upload is prohibited; this effectively prevents repeated uploads of the same video, provides copyright protection for videos, and safeguards the rights and interests of video uploaders and the playing platform.
The description of the video processing method of the embodiment of the present invention now continues. The method may be executed by the terminal, by the server, or by the terminal and the server in cooperation; the following description takes execution by the server as an example, for example by the server 200 in fig. 1.
In the video processing method, the target video is divided into a plurality of video segments based on its frame rate; taking the content switching between video frames within each segment into account, a sample frame image is extracted from each segment, yielding a plurality of sample frame images that reflect the video content of the corresponding segments well. Feature extraction is then performed on each sample frame image to obtain a plurality of sample frame image features, which are similarity-matched against the video frame image features of the video to be matched in the feature library to obtain a matching result. Finally, the association relation between the target video and the video to be matched is determined based on the matching result. The sample frames of the video to be matched are extracted in the same manner as the sample frames of the target video.
Referring to fig. 8, fig. 8 is a schematic flow chart of a video processing method according to an embodiment of the present invention. In the feature library construction stage, a to-be-matched video library is set on the server and stores a plurality of videos to be matched; frames are sampled from each video to be matched to obtain a plurality of sample frame images, feature extraction is performed on each sampled frame image using a trained feature extraction model, and the extracted features are added to the feature library.
In the feature matching stage, the same frame extraction and feature extraction steps are applied to the target video to be matched as in the feature library construction stage, and finally the extracted image features of the target video are matched against the features in the feature library.
Next, the frame extraction involved in both the feature library construction stage and the feature matching stage is described.
First, the server segments the target video based on the video frame rate to obtain a plurality of video segments of the target video. Considering that, when content changes of video frames are detected, the chrominance and saturation between frames change little while the luminance information changes considerably, and that the HSV color space is more sensitive to luminance changes than the RGB space, the video frame images in each segment are converted from the RGB format to the HSV format before inter-frame content detection.
Second, inter-frame content switching detection is performed on each video segment to obtain the detection results. This step mainly judges whether the video content of a segment changes abruptly. In some embodiments, the detection may be performed as follows: determining the difference degree of pixel points between adjacent video frames in each segment; determining a plurality of feature values of each segment based on the difference degree; determining the variance of the feature values of each segment based on those feature values; when the variance exceeds the variance threshold, determining that content switching occurs in the corresponding segment; and when it does not, determining that no content switching occurs in the corresponding segment.
The method for determining the difference degree of the pixel points between the adjacent video frames in each video clip can be realized by the following steps: the method comprises the steps of collecting pixel points between adjacent video frames in a video segment, respectively calculating the difference degree of the pixel points between the adjacent video frames in HSV three channels by adopting an inter-frame pixel point matching method, summing the difference degrees of the pixel points between the adjacent video frames in the three channels, and obtaining the difference degree of the pixel points between the adjacent video frames through a formula (1).
By this method, a series of feature values of the video segment can be obtained: f_v_1, f_v_2, ..., f_v_fps (fps is the frame rate), from which the mean value mean and the corresponding variance std of the feature values can be computed. The variance std reflects the fluctuation of the pixel features of the video segment well; when std is larger than the variance threshold T, the video content within the segment has changed greatly, and content switching is determined to have occurred in the corresponding segment.
Finally, sample frame images are extracted from each video segment based on the detection result. When a content switch is determined to occur in a segment, the video frame image that best represents the segment content is extracted, specifically: based on the feature values f_v_i of the video frames in the segment, a content change value z_score of each video frame is determined via formula (2). The larger the z_score, the greater the content change of the corresponding video frame; the frame with the maximum z_score is determined to be where the content switch occurs, and the switching time K_Z is then determined via formula (3). Having determined K_Z, the server extracts the video frame image at the middle time of [K_Z, end] as the sample frame image.
For each video segment in which a content switch occurs, this extraction scheme yields, every second, the sample frame image that best represents that second of content. Matching the extracted sample frame images of the target video against video frame images extracted from the to-be-matched video in the same way improves the accuracy of video matching.
When the variance of the feature values does not exceed the variance threshold, no content switch occurs in the corresponding segment, that is, all video frame images in the segment are considered similar. The server may extract the video frame at the middle time of [0, end] as the sample frame image; alternatively, a video frame may be randomly extracted from [0, end] as the sample frame image.
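Formulas (2) and (3) are likewise not reproduced in this excerpt. The sketch below assumes, for illustration only, that formula (2) is the standard z-score (f_v_i - mean) / std and that formula (3) takes K_Z as the time of the largest z-score; it covers both the content-switch branch and the no-switch branch described above:

    import numpy as np

    def select_sample_frame(segment, variance_threshold):
        """Pick one representative sample frame from a one-second segment
        of HSV frames (numpy arrays)."""
        # Feature values f_v_i, using the same assumed form of formula (1)
        # as in the previous sketch.
        f_v = [float(np.abs(a.astype(np.float32) - b.astype(np.float32))
                     .mean(axis=(0, 1)).sum())
               for a, b in zip(segment, segment[1:])]
        mean, std = float(np.mean(f_v)), float(np.std(f_v))
        end = len(segment) - 1
        if std > variance_threshold:
            # Formula (2) assumed: standard z-score of each feature value.
            z_scores = [(v - mean) / std for v in f_v]
            # Formula (3) assumed: K_Z is the first frame after the largest
            # content change.
            k_z = z_scores.index(max(z_scores)) + 1
            return segment[(k_z + end) // 2]  # middle frame of [K_Z, end]
        # No content switch: all frames are similar; take the middle frame
        # of [0, end] (a randomly chosen frame would also do).
        return segment[end // 2]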
This completes the frame extraction and sampling of the target video.
Next, the similarity determination between the target video and the to-be-matched video is described.
First, feature extraction is performed on the sample frame images extracted as above to obtain a series of sample frame image feature vectors, and on the video frame images extracted from the to-be-matched video to obtain a series of to-be-matched video frame image feature vectors.
Then, for the two videos, the similarity Dis between the sample frame image feature vector and the to-be-matched video frame image feature vector at each corresponding time is computed frame by frame in time order. For video segments of equal duration, when Dis exceeds a similarity threshold, the video frame image at that time is considered to contribute to the similarity of the videos, that is, the segment contributes to the similarity of the whole video. When the number of such similar video frame images exceeds a certain threshold, the target video is determined to be similar to the to-be-matched video; otherwise, they are determined to be dissimilar.
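The feature extraction model and the exact form of the similarity Dis are not specified in this excerpt; the sketch below assumes, for illustration, that frame features are vectors compared by cosine similarity, and the similarity and count thresholds are hypothetical placeholders:

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def videos_match(target_features, candidate_features,
                     similarity_threshold=0.9, count_threshold=10):
        """Frame-by-frame comparison of feature vectors at corresponding
        times, in time order; both thresholds are illustrative."""
        similar = 0
        for f_t, f_c in zip(target_features, candidate_features):
            dis = cosine_similarity(f_t, f_c)
            if dis > similarity_threshold:
                similar += 1  # this frame contributes to video similarity
        return similar >= count_threshold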
By applying the embodiment of the present invention, processing is fast and the recall rate for similar videos is improved. In the extreme case where two clipped videos are offset by only a very short time (for example, 0.5 second), the processing method of this embodiment improves the video clip recall rate by 10-15% compared with equal-interval frame extraction, and better handles the otherwise low recall of clips with fast-changing pictures in such extreme cases.
The description now turns to the video processing apparatus provided by the embodiment of the present invention. Fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention. Referring to fig. 9, the video processing apparatus 90 provided by the embodiment of the present invention includes:
a segmenting unit 91, configured to perform segmentation processing on a target video to obtain a plurality of video segments of the target video;
a detecting unit 92, configured to perform content switching detection between video frames on each video segment, respectively, to obtain a detection result corresponding to each video segment;
an extracting unit 93 configured to extract sample frame images from the respective video clips based on the detection result;
the matching unit 94 is configured to perform similarity matching on the extracted sample frame image of the target video and the video frame image at the corresponding position in the video to be matched to obtain a matching result;
a determining unit 95, configured to determine, based on the matching result, an association relationship between the target video and the video to be matched.
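For illustration only, the fragment below sketches one way the units 91 to 95 described above could be composed; the class and parameter names are hypothetical and do not limit the apparatus:

    class VideoProcessingApparatus:
        """Hypothetical composition of units 91-95 described above."""
        def __init__(self, segmenter, detector, extractor, matcher, determiner):
            self.segmenter = segmenter    # unit 91: split the target video
            self.detector = detector      # unit 92: detect content switches
            self.extractor = extractor    # unit 93: extract sample frames
            self.matcher = matcher        # unit 94: match against a candidate
            self.determiner = determiner  # unit 95: decide the relationship

        def process(self, target_video, candidate_video):
            segments = self.segmenter(target_video)
            detections = [self.detector(s) for s in segments]
            samples = self.extractor(segments, detections)
            result = self.matcher(samples, candidate_video)
            return self.determiner(result)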
In some embodiments, the segmenting unit is further configured to segment the target video based on the frame rate of the target video to obtain a plurality of video segments.
In some embodiments, the detection unit is further configured to determine a difference between pixels between adjacent video frames in each of the video segments respectively;
determining a plurality of characteristic values of each video segment based on the difference degree;
determining a variance of the feature value corresponding to each video segment based on a plurality of feature values of each video segment;
when the variance of the characteristic value exceeds a variance threshold value, determining that the content switching of the corresponding video segment occurs.
In some embodiments, the extracting unit is further configured to determine a content change value of a video frame in each of the video segments based on a plurality of feature values and corresponding feature value variances of each of the video segments, where the content change value represents a content change between adjacent video frames;
determining the video frame with the largest content change value in each video segment based on the content change value of the video frame in each video segment;
determining a sample frame in each video segment based on the video frame with the largest content change value;
and respectively extracting the sample frames from the video clips to obtain a plurality of sample frame images.
In some embodiments, the extracting unit is further configured to extract sample frames from the video segments respectively according to a set position when the variance of the feature values does not exceed a variance threshold, so as to obtain sample frame images.
In some embodiments, the matching unit is further configured to perform feature extraction on each sample frame image of the target video, so as to obtain sample frame image features of each sample frame image;
respectively extracting the characteristics of each video frame image of the video to be matched to obtain the video frame image characteristics of each video frame image;
and respectively carrying out similarity matching on the sample frame image characteristics of the sample frame image and the video frame image characteristics of the video frame image at the corresponding position to obtain a matching result.
In some embodiments, the determining unit is further configured to determine that the target video and the video to be matched are the same video when the matching result indicates that the number of sample frame images satisfying the matching condition in the plurality of sample frame images reaches a number threshold;
wherein, the sample frame image satisfying the matching condition is: and the similarity between the sample frame image characteristics and the video frame image characteristics of the video frame image at the corresponding position reaches a similarity threshold value.
It should be noted here that the above description of the apparatus is similar to the description of the method; for technical details not disclosed in the apparatus embodiment of the present invention, refer to the description of the method embodiment of the present invention.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the video processing method provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention also provides a storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions so as to realize the video processing method provided by the embodiment of the invention.
All or part of the steps of the embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Random Access Memory (RAM), a Read-Only Memory (ROM), a magnetic disk, and an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method of video processing, the method comprising:
carrying out segmentation processing on a target video to obtain a plurality of video segments of the target video;
respectively carrying out video interframe content switching detection on each video segment to obtain a detection result corresponding to each video segment;
extracting sample frame images from the video clips respectively based on the detection result;
carrying out similarity matching on the extracted sample frame image of the target video and the video frame image at the corresponding position in the video to be matched to obtain a matching result;
and determining the incidence relation between the target video and the video to be matched based on the matching result.
2. The method of claim 1, wherein the segmenting the target video into a plurality of video segments of the target video comprises:
and based on the frame rate of the target video, carrying out segmentation processing on the target video to obtain a plurality of video segments.
3. The method of claim 1, wherein the performing the inter-video content switching detection on each of the video segments to obtain the detection result corresponding to each of the video segments comprises:
respectively determining the difference degree of pixel points between adjacent video frames in each video segment;
determining a plurality of characteristic values of each video segment based on the difference degree;
determining a variance of the feature value corresponding to each video segment based on a plurality of feature values of each video segment;
when the variance of the characteristic value exceeds a variance threshold value, determining that the content switching of the corresponding video segment occurs.
4. The method as claimed in claim 3, wherein said extracting sample frame images from each of said video segments respectively based on said detection results comprises:
determining a content change value of a video frame in each video segment based on a plurality of characteristic values of each video segment and the corresponding characteristic value variance;
determining the video frame with the largest content change value in each video segment based on the content change value of the video frame in each video segment;
determining a sample frame in each video segment based on the video frame with the largest content change value;
and respectively extracting the sample frames from the video clips to obtain a plurality of sample frame images.
5. The method as claimed in claim 3, wherein said extracting sample frame images from each of said video segments respectively based on said detection results comprises:
and when the variance of the characteristic values does not exceed a variance threshold value, respectively extracting a sample frame from each video segment according to a set position to obtain a sample frame image.
6. The method of claim 1, wherein the matching the extracted sample frame image of the target video with the video frame image at the corresponding position in the video to be matched to obtain the matching result comprises:
respectively extracting the characteristics of each sample frame image of the target video to obtain the sample frame image characteristics of each sample frame image;
respectively extracting the characteristics of each video frame image of the video to be matched to obtain the video frame image characteristics of each video frame image;
and respectively carrying out similarity matching on the sample frame image characteristics of the sample frame image and the video frame image characteristics of the video frame image at the corresponding position to obtain a matching result.
7. The method according to claim 1, wherein the determining the association relationship between the target video and the video to be matched based on the matching result comprises:
when the matching result represents that the number of sample frame images meeting the matching condition in the plurality of sample frame images reaches a number threshold, determining that the target video and the video to be matched are the same video;
wherein, the sample frame image satisfying the matching condition is: and the similarity between the sample frame image characteristics and the video frame image characteristics of the video frame image at the corresponding position reaches a similarity threshold value.
8. A video processing apparatus, characterized in that the apparatus comprises:
the segmentation unit is used for carrying out segmentation processing on a target video to obtain a plurality of video segments of the target video;
the detection unit is used for respectively carrying out content switching detection among video frames on each video segment to obtain a detection result corresponding to each video segment;
an extracting unit configured to extract sample frame images from the respective video clips based on the detection result;
the matching unit is used for carrying out similarity matching on the extracted sample frame image of the target video and the video frame image at the corresponding position in the video to be matched to obtain a matching result;
and the determining unit is used for determining the incidence relation between the target video and the video to be matched based on the matching result.
9. The apparatus of claim 8,
the segmentation unit is further configured to perform segmentation processing on the target video based on the frame rate of the target video to obtain a plurality of video segments.
10. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the video processing method of any of claims 1 to 7 when executing executable instructions stored in the memory.
CN201910678001.4A 2019-07-25 2019-07-25 Video processing method and device Active CN112291634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910678001.4A CN112291634B (en) 2019-07-25 2019-07-25 Video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910678001.4A CN112291634B (en) 2019-07-25 2019-07-25 Video processing method and device

Publications (2)

Publication Number Publication Date
CN112291634A true CN112291634A (en) 2021-01-29
CN112291634B CN112291634B (en) 2022-11-29

Family

ID=74419347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910678001.4A Active CN112291634B (en) 2019-07-25 2019-07-25 Video processing method and device

Country Status (1)

Country Link
CN (1) CN112291634B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101382998A (en) * 2008-08-18 2009-03-11 华为技术有限公司 Testing device and method of switching of video scenes
US20130163957A1 (en) * 2011-12-22 2013-06-27 Broadcom Corporation System and Method for Fingerprinting Video
CN103177099A (en) * 2013-03-20 2013-06-26 深圳先进技术研究院 Video comparison method and video comparison system
CN103279473A (en) * 2013-04-10 2013-09-04 深圳康佳通信科技有限公司 Method, system and mobile terminal for searching massive amounts of video content
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
US20160094863A1 (en) * 2014-09-29 2016-03-31 Spotify Ab System and method for commercial detection in digital media environments
CN105744356A (en) * 2016-01-29 2016-07-06 杭州观通科技有限公司 Content-based video segmentation method
CN108882057A (en) * 2017-05-09 2018-11-23 北京小度互娱科技有限公司 Video abstraction generating method and device
CN109189991A (en) * 2018-08-17 2019-01-11 百度在线网络技术(北京)有限公司 Repeat video frequency identifying method, device, terminal and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Yao, "Research on Video Shot Segmentation and Key Frame Extraction Technology", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113438507A (en) * 2021-06-11 2021-09-24 上海连尚网络科技有限公司 Method, apparatus, medium, and program product for determining video infringement
CN113438507B (en) * 2021-06-11 2023-09-15 上海连尚网络科技有限公司 Method, equipment and medium for determining video infringement
CN114363672A (en) * 2021-12-17 2022-04-15 北京快乐茄信息技术有限公司 Similar video determination method, device, terminal and storage medium
CN114257864A (en) * 2022-02-24 2022-03-29 广州易方信息科技股份有限公司 Seek method and device of player in HLS format video source scene

Also Published As

Publication number Publication date
CN112291634B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN110309795B (en) Video detection method, device, electronic equipment and storage medium
CN111327945B (en) Method and apparatus for segmenting video
CN108353208B (en) Optimizing media fingerprint retention to improve system resource utilization
CN107534796B (en) Video processing system and digital video distribution system
CN1538351B (en) Method and computer for generating visually representative video thumbnails
CN110971929B (en) Cloud game video processing method, electronic equipment and storage medium
US9047376B2 (en) Augmenting video with facial recognition
CN112291634B (en) Video processing method and device
US20140044404A1 (en) Methods and Systems for Video Retargeting Using Motion Saliency
CN110121098B (en) Video playing method and device, storage medium and electronic device
TW201340690A (en) Video recommendation system and method thereof
US20180068188A1 (en) Video analyzing method and video processing apparatus thereof
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
US8761577B2 (en) System and method for continuous playing of moving picture between two devices
US20220172476A1 (en) Video similarity detection method, apparatus, and device
CN111095939A (en) Identifying previously streamed portions of a media item to avoid repeated playback
CN113779303B (en) Video set indexing method and device, storage medium and electronic equipment
CN115022679B (en) Video processing method, device, electronic equipment and medium
CN110248195B (en) Method and apparatus for outputting information
KR20170133618A (en) Method and program for setting thumbnail image
CN112929728A (en) Video rendering method, device and system, electronic equipment and storage medium
CN110769291B (en) Video processing method and device, electronic equipment and storage medium
CN115379290A (en) Video processing method, device, equipment and storage medium
CN110996173B (en) Image data processing method and device and storage medium
CN117014649A (en) Video processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant