CN114827665B

CN114827665B - Video analysis method, device, equipment and storage medium

Info

Publication number: CN114827665B
Application number: CN202210614424.1A
Authority: CN
Inventors: 王建明; 赵书礼; 高锋; 勇智雯
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2023-10-10
Anticipated expiration: 2042-05-31
Also published as: CN114827665A

Abstract

The application relates to a video analysis method, apparatus, device, and storage medium in the technical field of multimedia processing. The video analysis method comprises: acquiring a video to be analyzed; searching for a historical version video corresponding to the video to be analyzed; acquiring a first key start frame and a first key end frame of a similar content segment in the historical version video, and acquiring a second key start frame and a second key end frame of the similar content segment in the video to be analyzed; acquiring, according to the first key start frame and the first key end frame, a historical video analysis result of the similar content segment in the historical version video; and taking, according to the second key start frame and the second key end frame, the historical video analysis result as a target video analysis result of the similar content segment in the video to be analyzed. The application addresses the waste of AI computing power that arises when a changed video must undergo AI analysis again.

Description

Video analysis method, device, equipment and storage medium

Technical Field

The present application relates to the field of multimedia processing technologies, and in particular, to a video analysis method, apparatus, device, and storage medium.

Background

Currently, AI (Artificial Intelligence) analysis is commonly performed on videos: content such as scenes, objects, and actions in a video is analyzed to obtain a video analysis result.

In general, a popular video is changed frequently in the early period after it goes online. For example, the video content changes through adding or deleting advertisements, adding or deleting shots, and the like. After the video is changed, the video analysis result of the pre-change video becomes invalid, and AI analysis must be performed again on the changed video. Frequent changes to a popular video therefore require AI analysis of every version of the same video, which wastes AI computing resources such as GPUs (graphics processing units) and CPUs (central processing units) and increases the labor cost of auditing the video analysis results.

Disclosure of Invention

The application provides a video analysis method, apparatus, device, and storage medium, to solve the problem that AI computing power is wasted because a video must undergo AI analysis again after it is changed.

In a first aspect, the present application provides a video analysis method, including:

acquiring a video to be analyzed;

searching for a historical version video corresponding to the video to be analyzed, wherein the historical version video has already undergone video analysis;

acquiring a first key start frame and a first key end frame of a similar content segment in the historical version video, and acquiring a second key start frame and a second key end frame of the similar content segment in the video to be analyzed, wherein the similar content segment refers to a similar frame sequence between the video to be analyzed and the historical version video;

acquiring a historical video analysis result of the similar content segments in the historical version video according to the first key start frame and the first key stop frame;

and taking the historical video analysis result as a target video analysis result of the similar content fragments in the video to be analyzed according to the second key start frame and the second key end frame.

Optionally, the similarity between the first key start frame and the second key start frame is greater than or equal to a similarity threshold, the similarity between the previous frame of the first key start frame and the previous frame of the second key start frame is less than the similarity threshold, the similarity between the first key end frame and the second key end frame is greater than or equal to the similarity threshold, and the similarity between the next frame of the first key end frame and the next frame of the second key end frame is less than the similarity threshold.

Optionally, for any one of the similar content segments, the acquiring process of the first key start frame and the second key start frame includes:

acquiring the first key end frame and the second key end frame of the previous similar content segment, that is, the similar content segment immediately preceding the given similar content segment;

taking the next frame after the first key end frame of the previous similar content segment as a first initial frame, and taking the next frame after the second key end frame of the previous similar content segment as a second initial frame;

acquiring a first scene transition frame after the first initial frame and acquiring a second scene transition frame after the second initial frame;

acquiring a first similarity between the first initial frame and the second scene transition frame, and acquiring a second similarity between the first scene transition frame and the second initial frame;

if the first similarity is greater than or equal to the similarity threshold, the first initial frame is used as the first key start frame, and the second scene transition frame is used as the second key start frame;

and if the second similarity is greater than or equal to the similarity threshold, taking the first scene transition frame as the first key start frame and taking the second initial frame as the second key start frame.

Optionally, the similarity between the previous frame of the first scene transition frame and the first scene transition frame is smaller than the similarity threshold, and the similarity between the previous frame of the second scene transition frame and the second scene transition frame is smaller than the similarity threshold.

Optionally, for any one of the similar content segments, the acquiring process of the first key-terminated frame and the second key-terminated frame includes:

acquiring a first termination frame and a second termination frame, wherein the playing time length between the first termination frame and the first key start frame is equal to the playing time length between the second termination frame and the second key start frame;

taking the first key start frame as a first target start frame, taking the first stop frame as a first target stop frame, taking the second key start frame as a second target start frame, and taking the second stop frame as a second target stop frame;

acquiring a first intermediate frame between the first target starting frame and the first target ending frame and a second intermediate frame between the second target starting frame and the second target ending frame;

acquiring a third similarity between the first intermediate frame and the second intermediate frame, and comparing the third similarity with the similarity threshold;

if the third similarity is greater than or equal to the similarity threshold, taking the first intermediate frame as the first target starting frame, taking the second intermediate frame as the second target starting frame, and returning to execute the step of acquiring the first intermediate frame and the second intermediate frame;

if the third similarity is smaller than the similarity threshold, taking the first intermediate frame as the first target termination frame, taking the second intermediate frame as the second target termination frame, and returning to execute the step of acquiring the first intermediate frame and the second intermediate frame;

and taking the first intermediate frame as the first key termination frame and taking the second intermediate frame as the second key termination frame until the first intermediate frame is identical to the first target start frame or the first intermediate frame is identical to the first target termination frame.

Optionally, before obtaining the third similarity between the first intermediate frame and the second intermediate frame, the method further includes:

acquiring a first size of the first intermediate frame and a second size of the second intermediate frame;

comparing the first size and the second size;

and if the first size and the second size are inconsistent, adjusting the first size or the second size to enable the first size and the second size to be consistent.

Optionally, after acquiring the second key start frame and the second key end frame of the similar content segment in the video to be analyzed, the method further includes:

acquiring a target position information range corresponding to a newly added frame sequence except the similar content fragments in the video to be analyzed;

judging whether the newly added frame sequence is an advertisement or not according to the target position information range and a preset advertisement position information range;

if the newly added frame sequence is an advertisement, not carrying out video analysis on the newly added frame sequence;

if the newly added frame sequence is not an advertisement, video analysis is carried out on the newly added frame sequence, and a video analysis result of the newly added frame sequence is obtained.

In a second aspect, the present application provides a video analysis apparatus comprising:

the first acquisition module is used for acquiring a video to be analyzed;

the searching module is used for searching for the historical version video corresponding to the video to be analyzed, wherein the historical version video has already undergone video analysis;

the second acquisition module is used for acquiring a first key start frame and a first key end frame of a similar content segment in the historical version video and acquiring a second key start frame and a second key end frame of the similar content segment in the video to be analyzed, wherein the similar content segment refers to a similar frame sequence between the video to be analyzed and the historical version video;

the third acquisition module is used for acquiring a historical video analysis result of the similar content segments in the historical version video according to the first key start frame and the first key stop frame;

and the processing module is used for taking the historical video analysis result as a target video analysis result of the similar content fragments in the video to be analyzed according to the second key start frame and the second key stop frame.

In a third aspect, the present application provides an electronic device, comprising: the device comprises a processor, a memory and a communication bus, wherein the processor and the memory are communicated with each other through the communication bus; the memory is used for storing a computer program; the processor is configured to execute the program stored in the memory, and implement the video analysis method according to the first aspect.

In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements the video analysis method of the first aspect.

Compared with the prior art, the technical solution provided by the embodiments of the application has the following advantages. For a similar content segment shared by the video to be analyzed and the historical version video, the historical video analysis result of the segment is obtained directly according to its first key start frame and first key end frame in the historical version video, and is used as the target video analysis result of the segment in the video to be analyzed. The similar content segment in the video to be analyzed therefore does not need to be analyzed again; only the content outside the similar content segments requires video analysis. This saves a large amount of AI computing power, and because the reused results need no further manual audit, it also saves the cost and time of manual auditing. The problem that AI computing power is wasted by re-analyzing a video after it is changed is thereby solved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic flow chart of a method for video analysis according to an embodiment of the present application;

FIG. 2 is a flowchart of a method for acquiring a first key start frame and a second key start frame according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for acquiring a first key-terminated frame and a second key-terminated frame according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a video analysis device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

By analyzing how existing videos are modified, the inventors found that the modifications mainly fall into the following types: (1) adding or deleting advertisements, that is, adding or deleting an advertisement portion of the video, such as a pre-roll advertisement; (2) adding or deleting shots, that is, adding or deleting normal content in the video; (3) replacing the source material, where the picture content is unchanged and only the source file changes, for example, the original video was produced by recording a copyrighted video and the copyrighted video itself is obtained later and used as the new video, or the original video undergoes image quality enhancement.

In the embodiment of the application, a video analysis method is provided, which can be applied to a server, and of course, can also be applied to other electronic devices, such as terminals (mobile phones, tablet computers, etc.). In the embodiment of the present application, an example of applying the method to a server will be described.

In the embodiment of the present application, as shown in fig. 1, the video analysis method mainly includes:

Step 101, acquiring a video to be analyzed.

Step 102, searching a historical version video corresponding to the video to be analyzed.

Wherein the historical version of the video has completed the video analysis.

Searching for a historical version video corresponding to the video to be analyzed means searching for a video that has the same video identifier as the video to be analyzed but a different version, and for which video analysis has been completed. For example, the video identifier of the video to be analyzed is the fifth episode of TV drama A and its version is the second version; the video identifier of the historical version video is also the fifth episode of TV drama A and its version is the first version. The video to be analyzed merely adds an advertisement to the historical version video, and the historical version video has already undergone video analysis and has a video analysis result.
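The lookup in step 102 can be sketched as a filter over a catalog of video records. This is a minimal illustration only: the `VideoRecord` type, its field names, the identifier format, and the rule of preferring the highest version number are all assumptions made for the example and are not specified by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRecord:
    video_id: str   # hypothetical identifier format, e.g. "drama-A/episode-5"
    version: int
    analyzed: bool  # True once video analysis has completed


def find_historical_version(target: VideoRecord,
                            catalog: list[VideoRecord]) -> Optional[VideoRecord]:
    """Return an analyzed record with the same ID but a different version, if any."""
    candidates = [
        r for r in catalog
        if r.video_id == target.video_id
        and r.version != target.version
        and r.analyzed
    ]
    # Assumption: when several versions qualify, prefer the highest version number.
    return max(candidates, key=lambda r: r.version, default=None)


catalog = [
    VideoRecord("drama-A/episode-5", 1, True),
    VideoRecord("drama-A/episode-6", 1, True),
]
target = VideoRecord("drama-A/episode-5", 2, False)
match = find_historical_version(target, catalog)
```

If no record qualifies, the function returns `None`, in which case the whole video to be analyzed would need ordinary video analysis.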

Step 103, obtaining a first key start frame and a first key end frame of the similar content segment in the historical version video, and obtaining a second key start frame and a second key end frame of the similar content segment in the video to be analyzed.

Wherein, the similar content segments refer to similar frame sequences between the video to be analyzed and the historical version video.

There may be one similar content segment or at least two; the application does not limit the number of similar content segments.

For example, the playing time corresponding to the first key start frame in the historical version video is 3 minutes, the playing time corresponding to the first key stop frame is 13 minutes, the playing time corresponding to the second key start frame in the video to be analyzed is 5 minutes, the playing time corresponding to the second key stop frame is 15 minutes, and the similar content segment refers to a frame sequence between 3 minutes and 13 minutes of the playing time in the historical version video and also refers to a frame sequence between 5 minutes and 15 minutes of the playing time in the video to be analyzed.

In a specific embodiment, the similarity between the first key start frame and the second key start frame is greater than or equal to a similarity threshold, the similarity between the previous frame of the first key start frame and the previous frame of the second key start frame is less than the similarity threshold, the similarity between the first key stop frame and the second key stop frame is greater than or equal to the similarity threshold, and the similarity between the next frame of the first key stop frame and the next frame of the second key stop frame is less than the similarity threshold.

The similarity threshold is a preset value, may be an empirical value, or may be a numerical value obtained by multiple tests. The similarity between the first key start frame and the second key start frame is greater than or equal to a similarity threshold, and the similarity between the previous frame of the first key start frame and the previous frame of the second key start frame is less than the similarity threshold, which indicates that the frames in the historical version video and the video to be analyzed begin to be similar from the first key start frame and the second key start frame. The similarity between the first key-stop frame and the second key-stop frame is greater than or equal to a similarity threshold, and the similarity between the next frame of the first key-stop frame and the next frame of the second key-stop frame is less than the similarity threshold, indicating that the frames in the historical version video and the video to be analyzed are still similar until the first key-stop frame and the second key-stop frame.

The similarity can be calculated using SSIM (Structural Similarity) and PSNR (Peak Signal-to-Noise Ratio).
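As an illustration of one of the two metrics named above, PSNR between two equally sized grayscale frames can be computed as below. This is a sketch under stated assumptions: frames are represented as nested lists of 8-bit pixel values, and the 30 dB threshold in `is_similar` is an arbitrary placeholder, since the application only says the threshold is preset.

```python
import math

Frame = list[list[int]]  # assumed format: rows of 8-bit grayscale pixel values


def psnr(a: Frame, b: Frame, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinity when the frames are identical."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same size")
    n = sum(len(row) for row in a)
    # Mean squared error over all pixel pairs.
    mse = sum((pa - pb) ** 2
              for ra, rb in zip(a, b)
              for pa, pb in zip(ra, rb)) / n
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def is_similar(a: Frame, b: Frame, threshold_db: float = 30.0) -> bool:
    # The 30 dB default is a placeholder, not a value taken from the application.
    return psnr(a, b) >= threshold_db


frame_a = [[10, 20], [30, 40]]
frame_b = [[10, 20], [30, 44]]
score = psnr(frame_a, frame_b)
```

A higher PSNR means the frames are closer, so the comparison against the threshold has the same direction as in the text (greater than or equal to the threshold means similar).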

In one embodiment, as shown in fig. 2, for any similar content segment, the process of acquiring the first key start frame and the second key start frame includes:

Step 201, acquiring the first key end frame and the second key end frame of the previous similar content segment, that is, the segment immediately preceding the given similar content segment.

For example, the first key end frame of the previous similar content segment in historical version video A is frame Amid, and the second key end frame of the previous similar content segment in video B to be analyzed is frame Bmid.

Step 202, taking the next frame after the first key end frame of the previous similar content segment as the first initial frame, and taking the next frame after the second key end frame of the previous similar content segment as the second initial frame.

For example, the first initial frame is frame Amid+1, and the second initial frame is frame Bmid+1.

Step 203, a first scene transition frame after the first initial frame is acquired, and a second scene transition frame after the second initial frame is acquired.

For example, the first scene transition frame is a transformA frame and the second scene transition frame is a transformB frame.

In a specific embodiment, the similarity between the previous frame of the first scene transition frame and the first scene transition frame is less than a similarity threshold, and the similarity between the previous frame of the second scene transition frame and the second scene transition frame is less than a similarity threshold.

A similarity below the similarity threshold between the first scene transition frame and its previous frame indicates that the picture content changes substantially between the two frames. For example, the previous frame shows the scene outside a classroom door, and the first scene transition frame shows the scene inside the classroom after the protagonist enters. Similarly, a similarity below the similarity threshold between the second scene transition frame and its previous frame indicates that the picture content changes substantially between those two frames, that is, a scene transition has occurred.
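The scene transition condition described above can be sketched as a forward scan that stops at the first frame whose similarity to its predecessor drops below the threshold. The function name, the index-based interface, and the toy similarity (frames reduced to a single brightness value) are assumptions for illustration; a real system would compare decoded frames with a metric such as SSIM or PSNR.

```python
from typing import Callable, Optional, Sequence, TypeVar

F = TypeVar("F")


def find_scene_transition(frames: Sequence[F],
                          start: int,
                          similarity: Callable[[F, F], float],
                          threshold: float) -> Optional[int]:
    """Index of the first frame after `start` that differs sharply from its predecessor."""
    for i in range(start + 1, len(frames)):
        if similarity(frames[i - 1], frames[i]) < threshold:
            return i
    return None  # no scene transition before the end of the video


def brightness_similarity(a: float, b: float) -> float:
    # Toy stand-in: each "frame" is just its mean brightness in [0, 255].
    return 1.0 - abs(a - b) / 255.0


frames = [100, 101, 102, 200, 201]
transition = find_scene_transition(frames, 0, brightness_similarity, 0.9)
```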

Step 204, obtaining a first similarity between the first initial frame and the second scene transition frame, and obtaining a second similarity between the first scene transition frame and the second initial frame.

By acquiring the first similarity between the first initial frame and the second scene transition frame and acquiring the second similarity between the first scene transition frame and the second initial frame, the first key start frame and the second key start frame can be rapidly determined, and the similarity between each frame of the video to be analyzed and each frame of the historical version video does not need to be compared, so that the acquisition speed of the first key start frame and the second key start frame is greatly improved.

Step 205, if the first similarity is greater than or equal to the similarity threshold, taking the first initial frame as the first key start frame and taking the second scene transition frame as the second key start frame.

For example, the first initial frame is frame Amid+1, the second initial frame is frame Bmid+1, the first scene transition frame is frame transformA, and the second scene transition frame is frame transformB. If the first similarity between frame Amid+1 and frame transformB is greater than or equal to the similarity threshold, frame Amid+1 is taken as the first key start frame and frame transformB is taken as the second key start frame.

Step 206, if the second similarity is greater than or equal to the similarity threshold, taking the first scene transition frame as the first key start frame and taking the second initial frame as the second key start frame.

For example, with the same frames as above, if the second similarity between frame transformA and frame Bmid+1 is greater than or equal to the similarity threshold, frame transformA is taken as the first key start frame and frame Bmid+1 is taken as the second key start frame.
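Steps 204 to 206 amount to two cross-comparisons between the initial frames and the scene transition frames. The sketch below illustrates that decision with frame indices; the function name, the `None` return when neither comparison passes, and the toy brightness-based similarity are assumptions for the example, as the application does not state what happens when both comparisons fail.

```python
from typing import Callable, Optional, Sequence, Tuple


def pick_key_start_frames(frames_a: Sequence[float], frames_b: Sequence[float],
                          a_init: int, a_trans: int,
                          b_init: int, b_trans: int,
                          similarity: Callable[[float, float], float],
                          threshold: float) -> Optional[Tuple[int, int]]:
    """Return (first_key_start, second_key_start) as frame indices, or None."""
    # First similarity: historical initial frame vs. scene transition frame of the new video.
    if similarity(frames_a[a_init], frames_b[b_trans]) >= threshold:
        return a_init, b_trans
    # Second similarity: historical scene transition frame vs. initial frame of the new video.
    if similarity(frames_a[a_trans], frames_b[b_init]) >= threshold:
        return a_trans, b_init
    return None  # assumption: the caller handles the case where neither pair matches


def brightness_similarity(a: float, b: float) -> float:
    # Toy stand-in: each "frame" is just its mean brightness in [0, 255].
    return 1.0 - abs(a - b) / 255.0


# Content was inserted into the new video B before the shared content.
inserted = pick_key_start_frames([50, 51, 120, 121], [200, 201, 50, 51, 120],
                                 0, 2, 0, 2, brightness_similarity, 0.9)
# Content was deleted from the new video B, so history A has extra leading frames.
deleted = pick_key_start_frames([200, 201, 50, 51], [50, 51, 120],
                                0, 2, 0, 2, brightness_similarity, 0.9)
```

The first case corresponds to step 205 (content added to the video to be analyzed) and the second to step 206 (content removed from it).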

In a specific embodiment, as shown in fig. 3, for any similar content segment, the process of acquiring the first key-terminated frame and the second key-terminated frame includes:

Step 301, acquiring a first termination frame and a second termination frame.

The playing time length between the first termination frame and the first key start frame is equal to the playing time length between the second termination frame and the second key start frame.

For example, the first key start frame is frame Amid+1, the second key start frame is frame transformB, the first termination frame is frame Aend, and the second termination frame is frame Bend; the playing duration between frame Aend and frame Amid+1 equals the playing duration between frame Bend and frame transformB.

Step 302, taking the first key start frame as a first target start frame, taking the first termination frame as a first target end frame, taking the second key start frame as a second target start frame, and taking the second termination frame as a second target end frame.

For example, the first target start frame is frame Amid+1, the first target end frame is frame Aend, the second target start frame is frame transformB, and the second target end frame is frame Bend.

Step 303, obtaining a first intermediate frame between the first target start frame and the first target end frame, and a second intermediate frame between the second target start frame and the second target end frame.

The first intermediate frame is obtained by calculating a first median between the playing time of the first target start frame and the playing time of the first target end frame, and taking the frame of the historical version video whose playing time corresponds to the first median. Likewise, the second intermediate frame is obtained by calculating a second median between the playing time of the second target start frame and the playing time of the second target end frame, and taking the frame of the video to be analyzed whose playing time corresponds to the second median. For example, if the first target start frame is frame Amid+1 and the first target end frame is frame Aend, the first intermediate frame is frame (Amid+1 + Aend)/2; if the second target start frame is frame transformB and the second target end frame is frame Bend, the second intermediate frame is frame (transformB + Bend)/2.

Step 304, a third similarity between the first intermediate frame and the second intermediate frame is obtained, and the third similarity is compared with a similarity threshold.

In a specific embodiment, before the third similarity between the first intermediate frame and the second intermediate frame is obtained, the video analysis method further includes: acquiring a first size of a first intermediate frame and a second size of a second intermediate frame; comparing the first size and the second size; and if the first size and the second size are inconsistent, adjusting the first size or the second size to enable the first size and the second size to be consistent.

Adjusting the first size or the second size so that the two sizes are consistent reduces misjudgments of similarity caused by a change in frame size. For example, the larger frame may be scaled down to the size of the smaller frame by resampling based on the pixel area relation.
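The size adjustment can be illustrated with a simple box-average downscale, which approximates area-based resampling by averaging the source pixels that fall under each output pixel. This is a sketch only: the nested-list frame format, the function names, and the choice to compare frames by pixel count are assumptions, and a production system would more likely use an image library's area interpolation.

```python
Frame = list[list[float]]  # assumed format: rows of grayscale pixel values


def downscale_area(frame: Frame, out_h: int, out_w: int) -> Frame:
    """Shrink `frame` to out_h x out_w by averaging the source block under each output pixel."""
    in_h, in_w = len(frame), len(frame[0])
    out: Frame = []
    for i in range(out_h):
        r0 = i * in_h // out_h
        r1 = max((i + 1) * in_h // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = j * in_w // out_w
            c1 = max((j + 1) * in_w // out_w, c0 + 1)
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def match_sizes(a: Frame, b: Frame) -> tuple[Frame, Frame]:
    """Return the two frames with the larger one shrunk to the smaller one's size."""
    size_a = (len(a), len(a[0]))
    size_b = (len(b), len(b[0]))
    if size_a == size_b:
        return a, b
    # Assumption: "larger" is decided by total pixel count.
    if size_a[0] * size_a[1] > size_b[0] * size_b[1]:
        return downscale_area(a, *size_b), b
    return a, downscale_area(b, *size_a)


big = [[0, 2, 4, 6], [2, 4, 6, 8], [10, 10, 20, 20], [10, 10, 20, 20]]
small = [[1, 1], [1, 1]]
resized_big, same_small = match_sizes(big, small)
```

After this step both frames have the same dimensions, so a pixel-wise metric such as PSNR can be applied to them directly.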

Step 305, if the third similarity is greater than or equal to the similarity threshold, taking the first intermediate frame as the first target start frame, taking the second intermediate frame as the second target start frame, and returning to step 303.

Step 306, if the third similarity is less than the similarity threshold, taking the first intermediate frame as the first target end frame, taking the second intermediate frame as the second target end frame, and returning to step 303.

Step 307, repeating the above until the first intermediate frame is the same as the first target start frame or the same as the first target end frame, and then taking the first intermediate frame as the first key end frame and the second intermediate frame as the second key end frame.

The first key termination frame and the second key termination frame are determined by acquiring the third similarity between the first intermediate frame and the second intermediate frame and comparing the third similarity with a similarity threshold value, so that the similarity between each frame of the video to be analyzed and each frame of the historical version video is not required to be compared, and the acquisition speed of the first key termination frame and the second key termination frame is greatly improved.
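Steps 301 to 307 describe a bisection over aligned frame positions, which can be sketched as follows. Several details are assumptions made for the example: frames are addressed by index rather than playing time, the termination frames are placed at the largest equal offset that fits in both videos, matching termination frames are accepted directly before bisecting, and the similarity function is a toy brightness comparison.

```python
from typing import Callable, Sequence, Tuple


def find_key_end_frames(frames_a: Sequence[float], frames_b: Sequence[float],
                        a_start: int, b_start: int,
                        similarity: Callable[[float, float], float],
                        threshold: float) -> Tuple[int, int]:
    """Bisect for the last aligned frame pair that is still similar."""
    # Assumption: termination frames sit at the largest equal offset that fits both videos.
    span = min(len(frames_a) - 1 - a_start, len(frames_b) - 1 - b_start)
    lo_a, hi_a = a_start, a_start + span
    lo_b, hi_b = b_start, b_start + span
    # Assumption beyond the text: if the termination frames already match, accept them.
    if similarity(frames_a[hi_a], frames_b[hi_b]) >= threshold:
        return hi_a, hi_b
    while True:
        mid_a = (lo_a + hi_a) // 2
        mid_b = (lo_b + hi_b) // 2
        # Step 307: stop once the midpoint coincides with a bound.
        if mid_a == lo_a or mid_a == hi_a:
            return mid_a, mid_b
        if similarity(frames_a[mid_a], frames_b[mid_b]) >= threshold:
            lo_a, lo_b = mid_a, mid_b  # step 305: still similar, search later frames
        else:
            hi_a, hi_b = mid_a, mid_b  # step 306: diverged, search earlier frames


def brightness_similarity(a: float, b: float) -> float:
    # Toy stand-in: each "frame" is just its mean brightness in [0, 255].
    return 1.0 - abs(a - b) / 255.0


history = [50, 51, 52, 53, 54, 120, 121]
changed = [50, 51, 52, 53, 54, 200, 201, 120]
key_end = find_key_end_frames(history, changed, 0, 0, brightness_similarity, 0.9)
```

The bisection needs on the order of log2(span) similarity comparisons instead of one per frame, which is the speed-up the text refers to. It relies on the segment being similar up to some point and dissimilar afterwards.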

In a specific embodiment, after obtaining the second key start frame and the second key end frame in the video to be analyzed, the video analysis method further includes: acquiring a target position information range corresponding to a newly added frame sequence except for similar content fragments in a video to be analyzed; judging whether the newly added frame sequence is an advertisement or not according to the target position information range and the preset advertisement position information range; if the newly added frame sequence is an advertisement, the video analysis is not carried out on the newly added frame sequence; if the newly added frame sequence is not the advertisement, video analysis is carried out on the newly added frame sequence, and a video analysis result of the newly added frame sequence is obtained.

For example, the target position information range corresponding to the newly added frame sequence is 10 minutes to 11 minutes, the preset advertisement position information range is 10 minutes to 12 minutes, and the advertisement duration is 1 minute. Only the approximate position of the advertisement, 10 minutes to 12 minutes, is known in advance; its exact position cannot be determined. The newly added frame sequence is therefore determined to be an advertisement by judging that the range of 10 minutes to 11 minutes approximately matches the range of 10 minutes to 12 minutes. No video analysis is needed for the advertisement, which saves the computing power and manual auditing time that analyzing the newly added frame sequence would require and further improves the efficiency of video analysis. If the newly added frame sequence is not an advertisement, it is a newly added shot, that is, newly added normal content in the video, and video analysis must be performed on it to obtain its video analysis result.
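The application does not define how the approximate match between the two ranges is judged. One possible reading, shown below purely as an assumption, treats a newly added frame sequence as an advertisement when it lies within the preset advertisement range and its length is close to the known advertisement duration; the function name, the seconds-based interface, and the tolerance value are all hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]  # (start, end) playing times in seconds


def is_advertisement(new_range: Range, ad_range: Range,
                     ad_duration: float, tolerance: float = 5.0) -> bool:
    """Assumed rule: the new range fits inside the ad slot and has roughly the ad's length."""
    new_start, new_end = new_range
    ad_start, ad_end = ad_range
    inside = ad_start - tolerance <= new_start and new_end <= ad_end + tolerance
    duration_ok = abs((new_end - new_start) - ad_duration) <= tolerance
    return inside and duration_ok


# Example from the text: new frames at 10-11 min, ad slot at 10-12 min, 1 min ad.
skip_analysis = is_advertisement((600.0, 660.0), (600.0, 720.0), 60.0)
# A new sequence far from the ad slot is treated as a newly added shot.
needs_analysis = not is_advertisement((1800.0, 1900.0), (600.0, 720.0), 60.0)
```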

Step 104, obtaining a historical video analysis result of the similar content segments in the historical version video according to the first key start frame and the first key stop frame.

For example, if the first key start frame is a frame corresponding to 3 minutes of playing time in the historical version video and the first key end frame is a frame corresponding to 13 minutes of playing time in the historical version video, a historical video analysis result of a frame sequence between 3 minutes and 13 minutes of playing time in the historical version video is obtained.

And step 105, taking the historical video analysis result as a target video analysis result of the similar content fragments in the video to be analyzed according to the second key start frame and the second key end frame.

For example, if the second key start frame is a frame corresponding to 5 minutes of playing time in the video to be analyzed and the second key end frame is a frame corresponding to 15 minutes of playing time in the video to be analyzed, the historical video analysis result of the frame sequence between 3 minutes and 13 minutes in the historical version video is used as the target video analysis result of the frame sequence between 5 minutes and 15 minutes in the video to be analyzed.
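If the reused results carry timestamps, applying them to the video to be analyzed amounts to shifting them by the difference between the two key start frames. The following sketch assumes results are `(start, end, tag)` tuples in seconds and that the segment has the same duration in both videos, as in the example (10 minutes each); the function `retime` and the sample data are hypothetical.

```python
def retime(results, first_key_start, second_key_start):
    """Shift historical results onto the timeline of the video to be analyzed."""
    offset = second_key_start - first_key_start
    return [(start + offset, end + offset, tag) for start, end, tag in results]


# Made-up historical results inside the 3-13 minute segment, in seconds.
history_segment = [(200.0, 300.0, "street scene"), (700.0, 760.0, "dialogue")]

# First key start frame at 3 minutes, second key start frame at 5 minutes.
target_results = retime(history_segment, 3 * 60, 5 * 60)
```

Here the offset is 2 minutes, so a result at 200 to 300 seconds in the historical version video lands at 320 to 420 seconds in the video to be analyzed.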

In summary, in the embodiment of the application, a video to be analyzed is obtained, and a historical version video corresponding to it is searched for, where video analysis of the historical version video has already been completed. A first key start frame and a first key end frame of a similar content segment in the historical version video are obtained, together with a second key start frame and a second key end frame of the similar content segment in the video to be analyzed, where a similar content segment is a frame sequence that is similar between the video to be analyzed and the historical version video. A historical video analysis result of the similar content segment in the historical version video is then obtained according to the first key start frame and the first key end frame, and, according to the second key start frame and the second key end frame, the historical video analysis result is used as the target video analysis result of the similar content segment in the video to be analyzed.

In the application, for the similar content segments between the video to be analyzed and the historical version video, the historical video analysis results are obtained directly according to the first key start frames and the first key end frames of those segments in the historical version video, and are used as the target video analysis results of the similar content segments in the video to be analyzed. The similar content segments in the video to be analyzed therefore do not need to be analyzed again; only the content outside the similar content segments requires video analysis. This saves a large amount of AI computing power, and since no manual review of the similar content segments is needed either, it also saves the cost and time of manual review. The problem that AI computing power is wasted because a changed video must undergo AI analysis again is thereby solved.
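The content that still requires analysis is the part of the timeline not covered by any similar content segment. A minimal sketch of that computation follows, assuming segments are given as `(start, end)` pairs in minutes on the timeline of the video to be analyzed; the function `remaining_ranges` and the 20-minute duration are hypothetical.

```python
def remaining_ranges(duration, similar_segments):
    """Return the parts of [0, duration] not covered by any similar segment."""
    gaps = []
    cursor = 0.0
    for start, end in sorted(similar_segments):
        if start > cursor:
            gaps.append((cursor, start))  # uncovered stretch before this segment
        cursor = max(cursor, end)         # also handles overlapping segments
    if cursor < duration:
        gaps.append((cursor, duration))   # uncovered tail of the video
    return gaps


# One similar content segment at 5-15 minutes in an assumed 20-minute video.
to_analyze = remaining_ranges(20, [(5, 15)])
```

With the 5 to 15 minute segment from the earlier example, only the ranges 0 to 5 minutes and 15 to 20 minutes would be passed to video analysis (or to the advertisement check first).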

Based on the same conception, the embodiment of the present application provides a video analysis device. For the specific implementation of the device, reference may be made to the description of the method embodiment; repeated details are omitted here. As shown in fig. 4, the device mainly includes:

a first obtaining module 401, configured to obtain a video to be analyzed;

a searching module 402, configured to search a historical version video corresponding to the video to be analyzed, where the historical version video has completed video analysis;

a second obtaining module 403, configured to obtain a first key start frame and a first key end frame of a similar content segment in the historical version video, and obtain a second key start frame and a second key end frame of the similar content segment in the video to be analyzed, where the similar content segment refers to a similar frame sequence between the video to be analyzed and the historical version video;

a third obtaining module 404, configured to obtain a historical video analysis result of the similar content segments in the historical version video according to the first key start frame and the first key end frame;

a processing module 405, configured to use the historical video analysis result as a target video analysis result of the similar content segment in the video to be analyzed according to the second key start frame and the second key end frame.

Based on the same conception, the embodiment of the application also provides an electronic device which, as shown in fig. 5, mainly comprises a processor 501, a memory 502 and a communication bus 503, wherein the processor 501 and the memory 502 communicate with each other through the communication bus 503. The memory 502 stores a program executable by the processor 501, and the processor 501 executes the program stored in the memory 502 to implement the following steps:

acquiring a video to be analyzed; searching for a historical version video corresponding to the video to be analyzed, wherein the historical version video has completed video analysis; acquiring a first key start frame and a first key end frame of a similar content segment in the historical version video, and acquiring a second key start frame and a second key end frame of the similar content segment in the video to be analyzed, wherein the similar content segment refers to a similar frame sequence between the video to be analyzed and the historical version video; acquiring a historical video analysis result of the similar content segment in the historical version video according to the first key start frame and the first key end frame; and taking the historical video analysis result as a target video analysis result of the similar content segment in the video to be analyzed according to the second key start frame and the second key end frame.

The communication bus 503 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 503 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 5, but this does not mean that there is only one bus or only one type of bus.

The memory 502 may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor 501.

The processor 501 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform the video analysis method described in the above embodiment.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.

It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of video analysis, comprising:

acquiring a video to be analyzed;

according to the second key start frame and the second key end frame, taking the historical video analysis result as a target video analysis result of the similar content segments in the video to be analyzed;

wherein, for any one of the similar content segments, the acquiring process of the first key start frame and the second key start frame includes:

acquiring the first key termination frame and the second key termination frame of the similar content segment preceding said any one similar content segment;

if the first similarity is greater than or equal to a similarity threshold, the first initial frame is used as the first key start frame, and the second scene transition frame is used as the second key start frame;

2. The video analysis method of claim 1, wherein a similarity between the first key start frame and the second key start frame is greater than or equal to a similarity threshold, a similarity between a previous frame of the first key start frame and a previous frame of the second key start frame is less than the similarity threshold, a similarity between the first key stop frame and the second key stop frame is greater than or equal to the similarity threshold, and a similarity between a next frame of the first key stop frame and a next frame of the second key stop frame is less than the similarity threshold.

3. The video analysis method according to claim 1, wherein a similarity between a previous frame of the first scene transition frame and the first scene transition frame is smaller than the similarity threshold, and a similarity between a previous frame of the second scene transition frame and the second scene transition frame is smaller than the similarity threshold.

4. The video analysis method of claim 2, wherein the acquiring of the first key-stop frame and the second key-stop frame for any one of the similar content segments comprises:

5. The video analysis method of claim 4, wherein prior to obtaining a third similarity between the first intermediate frame and the second intermediate frame, the method further comprises:

comparing the first size and the second size;

6. The video analysis method according to any one of claims 1 to 5, wherein after obtaining the second key start frame and the second key end frame of the similar content segments in the video to be analyzed, the method further comprises:

7. A video analysis device, comprising:

the first acquisition module is used for acquiring a video to be analyzed;

the processing module is used for taking the historical video analysis result as a target video analysis result of the similar content fragments in the video to be analyzed according to the second key start frame and the second key stop frame;

wherein, the second acquisition module is used for:

8. An electronic device, comprising: the device comprises a processor, a memory and a communication bus, wherein the processor and the memory are communicated with each other through the communication bus; the memory is used for storing a computer program; the processor is configured to execute a program stored in the memory to implement the video analysis method according to any one of claims 1 to 6.

9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the video analysis method of any one of claims 1 to 6.