CN116017047A - Video analysis method, device, computer equipment and storage medium


Info

Publication number: CN116017047A
Application number: CN202211690922.0A
Authority: CN (China)
Prior art keywords: video, determining, video frame, analyzed, video frames
Inventor: 尹天舒
Assignee: Beijing QIYI Century Science and Technology Co Ltd
Filing date: 2022-12-27
Publication date: 2023-04-25
Legal status: Pending
Other languages: Chinese (zh)
Classification: Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

The present disclosure provides a video analysis method, apparatus, computer device, and storage medium. The method includes: acquiring video data to be analyzed; determining candidate video frames related to preset content in the video data to be analyzed; and determining a target video frame among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, where the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.

Description

Video analysis method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the technical field of multimedia, and in particular to a video analysis method, a video analysis device, a computer device, and a storage medium.
Background
With the development of video networks, more and more people watch video programs through the internet. These video programs commonly include both essential content and non-essential content; for example, when the video program is a variety program, the non-essential content may be the opening, the ending, promotions, and the like, while the essential content may be the main feature.
When a video management platform manages the video program, or a user watches the video program through an application, there is a need to distinguish the essential content of the video program from the non-essential content. Doing so is difficult, however: when the video program is a variety program, for example, the non-essential content is often highly similar to the essential content, so errors when labeling the start and stop positions of the non-essential or essential content are high, and labeling efficiency is low.
Disclosure of Invention
Embodiments of the present disclosure provide at least a video analysis method, a video analysis device, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a video analysis method, including:
acquiring video data to be analyzed;
determining candidate video frames related to preset content in the video data to be analyzed;
determining a target video frame in the candidate video frames based on the position of the candidate video frames in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
In an alternative embodiment, the determining the candidate video frames related to the preset content in the video data to be analyzed includes:
determining screening key information based on content characteristics corresponding to the preset content;
and determining candidate video frames matched with the screening key information in the video data to be analyzed.
In an alternative embodiment, the determining the candidate video frames related to the preset content in the video data to be analyzed includes:
identifying successive video frames in the video data to be analyzed that include at least partially identical content;
determining a position difference of the at least partially identical content in the successive video frames;
and determining, among the continuous video frames, candidate video frames whose position difference satisfies a displacement condition.
In an alternative embodiment, the determining, among the continuous video frames, the candidate video frames whose position difference satisfies the displacement condition includes:
determining a displacement direction of the at least partially identical content between the successive video frames based on the position difference;
determining a displacement distance of the at least partially identical content between the successive video frames based on the position difference;
and determining, among the continuous video frames, a preset number of candidate video frames whose displacement direction and displacement distance satisfy the displacement condition.
In an alternative embodiment, the determining the candidate video frames related to the preset content in the video data to be analyzed includes:
determining pixel values corresponding to the video frames in the video data to be analyzed based on the pixel points of the video frames;
determining adjacent video frames whose pixel value difference exceeds a pixel threshold, and determining candidate video frames among the adjacent video frames.
In an alternative embodiment, the determining a target video frame in the candidate video frames based on the position of the candidate video frame in the video data to be analyzed includes:
determining, among the candidate video frames, a first video frame whose position meets a position condition;
determining a second video frame adjacent to the first video frame in the candidate video frames;
determining a time interval between the first video frame and the second video frame, and determining a target video frame among the first video frame and the second video frame based on the time interval.
In an alternative embodiment, the determining a target video frame among the first video frame and the second video frame based on the time interval includes:
determining whether the time interval exceeds a time threshold;
determining the second video frame as a target video frame if the time interval exceeds the time threshold;
and determining the first video frame as a target video frame if the time interval does not exceed the time threshold.
In a second aspect, embodiments of the present disclosure further provide a video analysis apparatus, including:
the acquisition unit is used for acquiring video data to be analyzed;
a first determining unit, configured to determine candidate video frames related to preset content in the video data to be analyzed;
a second determining unit configured to determine a target video frame among the candidate video frames based on a position of the candidate video frame in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
In a third aspect, embodiments of the present disclosure further provide a computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect, or any of the possible implementations of the first aspect.
In a fourth aspect, the presently disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the first aspect, or any of the possible implementations of the first aspect.
In the embodiments of the present disclosure, video data to be analyzed may first be obtained, and candidate video frames related to preset content may be determined in the video data to be analyzed, where the preset content may be the above-mentioned non-essential content. A target video frame may then be determined among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, so that the start and stop positions of the preset content are determined in the video data to be analyzed based on the target video frame; this reduces errors when labeling the start and stop positions of the non-essential or essential content and improves labeling efficiency.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. These drawings, which are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure and, together with the description, serve to illustrate the technical solutions of the present disclosure. It should be understood that the following drawings show only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may obtain other related drawings from these drawings without inventive effort.
FIG. 1 illustrates a flow chart of a video analysis method provided by an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a video frame in which end credits are displayed in a variety program according to an embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of another video analysis method provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a video analysis apparatus provided by an embodiment of the present disclosure;
fig. 5 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" is used herein to describe only one relationship, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
When a video management platform manages a video program, or a user watches the video program through an application, there is a need to distinguish the essential content of the video program from the non-essential content. This is difficult, however: when the video program is a variety program, for example, the non-essential content is often highly similar to the essential content, so errors when labeling the start and stop positions of the non-essential or essential content are high, and labeling efficiency is low.
Based on the above study, the present disclosure provides a video analysis method, apparatus, computer device, and storage medium. In the embodiments of the present disclosure, video data to be analyzed may first be obtained, and candidate video frames related to preset content may be determined in the video data to be analyzed, where the preset content may be the above-mentioned non-essential content. A target video frame may then be determined among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, so that the start and stop positions of the preset content are determined in the video data to be analyzed based on the target video frame; this reduces errors when labeling the start and stop positions of the non-essential or essential content and improves labeling efficiency.
To facilitate understanding of the present embodiment, a video analysis method disclosed in an embodiment of the present disclosure is first described in detail. The execution body of the video analysis method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability. In some possible implementations, the video analysis method may be implemented by a processor invoking computer-readable instructions stored in a memory.
In the video analysis method provided by the present disclosure, the computer device may be provided with the above video management platform, and the video management platform may perform personalized management of a video program based on the determined start and stop positions of the preset content in the video program. For example, when the preset content is the ending of a variety program, the personalized management may include: intelligently skipping the ending during playback; determining whether the user's viewing progress has reached the ending so as to judge whether viewing is finished, determining that the user has not finished watching when the viewing progress has not reached the ending and continuing to recommend the variety program to the user on the home page, and otherwise no longer recommending it on the home page; and delivering content associated with the video program based on the start and stop positions of the ending, such as recommendations of similar video programs, a jump to the next episode, or promotions.
Referring to fig. 1, a flowchart of a video analysis method according to an embodiment of the disclosure is shown, where the method includes steps S101 to S105, where:
S101: Acquiring video data to be analyzed.
In the embodiments of the present disclosure, a video program may be acquired through the video management platform, and the video data to be analyzed may be determined from the video program; here, the video program may be clipped according to a preset proportion, reducing the computation required when labeling the video program.
Specifically, when the video program is clipped, the region in which the content to be identified is expected to appear may be estimated, and the preset proportion may be determined based on the estimate. For example, if the content to be identified is the ending among the non-essential content, the ending is expected near the end of the video, the determined preset proportion may be 20%, and the last 20% of the video program may be clipped to obtain the video data to be analyzed.
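For illustration only (this sketch is not part of the original disclosure; the use of Python with the ffmpeg/ffprobe command-line tools, and the file names, are assumptions), the clipping step might look like:

```python
import subprocess

def clip_tail(input_path: str, output_path: str, proportion: float = 0.2) -> None:
    """Clip the last `proportion` of a video into a separate file for analysis."""
    # Query the total duration (in seconds) with ffprobe.
    out = subprocess.check_output([
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        input_path,
    ])
    duration = float(out.decode().strip())
    start = duration * (1.0 - proportion)
    # Copy the trailing segment without re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", input_path,
         "-c", "copy", output_path],
        check=True,
    )

# e.g. clip_tail("program.mp4", "tail.mp4", proportion=0.2)
```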
S103: candidate video frames related to preset content are determined in the video data to be analyzed.
In the embodiments of the present disclosure, the preset content may be the content to be identified in the video program, for example, the ending among the non-essential content. Before candidate video frames related to the preset content are determined in the video data to be analyzed, frame extraction processing is first performed on the video data to be analyzed to obtain a video frame set corresponding to the video data to be analyzed.
Here, the frame extraction processing may be per-second frame extraction, that is, extracting a fixed number of video frames from each second of the video data to obtain the video frame set, further reducing the computational load on the device. For example, if the video data to be analyzed has 24 frames per second, the first of the 24 frames corresponding to each second may be extracted during frame extraction to obtain the video frame set.
In particular, an existing tool may be selected to implement frame extraction on the video data, for example ffmpeg (a video processing tool). During extraction, a fixed number of video frames may be extracted at a fixed position within each second; for example, the fixed position may be the first frame of each second, the last frame of each second, and so on, and the fixed number may be one or more frames, which is not specifically limited in the present disclosure.
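A minimal sketch of the per-second frame extraction, assuming the ffmpeg tool named above (the fps filter samples roughly one frame per second rather than strictly the first frame of each second):

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, per_second: int = 1) -> None:
    """Sample `per_second` frames from each second of video into numbered JPEGs."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vf", f"fps={per_second}",  # keep N frames per second of video
         str(Path(out_dir) / "%06d.jpg")],
        check=True,
    )

# e.g. extract_frames("tail.mp4", "frames") -> frames/000001.jpg, frames/000002.jpg, ...
```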
After the video frame set is determined, candidate video frames related to the preset content may be screened from the video frame set. Here, the content characteristics of the preset content may be summarized in advance so that screening can be performed based on them, where a content characteristic may be content commonly shared by different instances of the preset content. For example, when the preset content is the ending, the content characteristic of the preset content may be keywords that commonly appear in endings.
S105: determining a target video frame in the candidate video frames based on the position of the candidate video frames in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
In the embodiments of the present disclosure, there may be multiple content features corresponding to the preset content, so multiple candidate video frames may be screened out based on the content features, and the positions of these candidate video frames in the video data to be analyzed may differ.
When determining the target video frame among the candidate video frames, the confidence of each candidate video frame may first be determined, where the confidence may be the probability that the candidate video frame indicates a start or stop position of the preset content in the video data to be analyzed. Next, candidate video frames whose confidence satisfies a confidence condition may be determined as target video frames. Here, the confidence condition may be that the candidate video frame with the highest confidence is determined as the target video frame, or that candidate video frames whose confidence meets a probability threshold are determined as target video frames, so as to improve the accuracy of the determined start and stop positions of the preset content in the video data to be analyzed, where a start-stop position may be a start position or an end position.
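A minimal sketch of this selection step, assuming each candidate has already been scored with a confidence (the pair representation is a hypothetical choice, not from the disclosure):

```python
from typing import List, Optional, Tuple

def select_targets(candidates: List[Tuple[int, float]],
                   probability_threshold: Optional[float] = None) -> List[Tuple[int, float]]:
    """Select target frames from (frame index, confidence) pairs.

    With no threshold, keep only the highest-confidence candidate;
    otherwise keep every candidate whose confidence meets the threshold.
    """
    if not candidates:
        return []
    if probability_threshold is None:
        return [max(candidates, key=lambda c: c[1])]
    return [c for c in candidates if c[1] >= probability_threshold]
```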
As can be seen from the above description, in the embodiments of the present disclosure, video data to be analyzed may first be obtained, and candidate video frames related to preset content may be determined in the video data to be analyzed, where the preset content may be the above-mentioned non-essential content. A target video frame may then be determined among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, so that the start and stop positions of the preset content are determined in the video data to be analyzed based on the target video frame; this reduces errors when labeling the start and stop positions of the non-essential or essential content and improves labeling efficiency.
In an optional embodiment, step S103, determining candidate video frames related to the preset content in the video data to be analyzed, specifically includes the following steps:
S11: Determining screening key information based on the content characteristics corresponding to the preset content.
In the embodiment of the disclosure, a large number of video programs can be acquired through the video management platform, preset contents of the acquired video programs are identified, so that identical or similar contents among the preset contents of the video programs are determined, and content characteristics corresponding to the preset contents are determined based on the identical or similar contents.
After the content features are determined, the screening key information may be determined based on them; for example, where the content feature is a show sponsor, the determined screening key information may include keywords such as "exclusive title sponsor" and "special sponsor", and where the content feature is a show producer, the determined screening key information may include the keyword "producer".
S12: and determining candidate video frames matched with the screening key information in the video data to be analyzed.
After the screening key information is determined, the video frame set corresponding to the video data to be analyzed may be obtained, text detection and recognition processing may be performed on the video frames in the set, and the recognized text may be matched against the keywords in the screening key information to obtain candidate video frames that include the keywords.
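For illustration, a sketch of the keyword screening under stated assumptions: frames are saved as image files, the OCR engine is pytesseract (the disclosure does not name one), and the keyword list is hypothetical:

```python
import pytesseract
from PIL import Image

# Hypothetical screening keywords derived from the content features above;
# the real list depends on the preset content being located.
SCREENING_KEYWORDS = ["exclusive title sponsor", "special sponsor", "producer"]

def keyword_candidates(frame_paths):
    """Return the frames whose recognized text contains any screening keyword."""
    candidates = []
    for path in frame_paths:
        text = pytesseract.image_to_string(Image.open(path))
        if any(keyword in text for keyword in SCREENING_KEYWORDS):
            candidates.append(path)
    return candidates
```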
In the embodiment of the disclosure, the content characteristics of the preset content can be identified, and the screening key information is determined based on the content characteristics, so that the candidate video frames are screened out based on the screening key information, the confidence of the determined candidate video frames is higher, and the confidence of the target video frames determined based on the candidate video frames is further improved.
In an optional embodiment, step S103, determining candidate video frames related to the preset content in the video data to be analyzed, specifically includes the following steps:
S21: Identifying successive video frames comprising at least partially identical content in the video data to be analyzed.
In the embodiments of the present disclosure, end credits are often displayed within the preset content; fig. 2 shows a video frame in which end credits are displayed in a variety program. Based on this, the video frames that include end credits can be identified in the video data to be analyzed.
It should be understood that end credits are usually displayed by scrolling continuously at a preset scrolling speed, so while the end credits are displayed, part of the content is identical between adjacent video frames. Based on this, successive video frames that include at least partially identical content can first be identified in the video data to be analyzed.
Specifically, the video frame set may first be obtained, and text boxes in its video frames may be identified, where a text box contains the recognized text content. After a text box is identified in a video frame, the video frame n frames later may be acquired to determine whether the two frames contain text boxes with at least partially identical content. If so, the n+1 consecutive video frames may be acquired. Here, n may be a positive integer, for example, 4.
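A sketch of this check under stated assumptions (the TextBox structure and the per-frame box lists are hypothetical; matching by exact text equality is one simple realization of "at least partially identical content"):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextBox:
    text: str  # recognized text content
    x: int     # top-left corner of the detected box
    y: int

def shares_text(boxes_a: List[TextBox], boxes_b: List[TextBox]) -> bool:
    """True when two frames share at least one recognized text string."""
    texts_a = {box.text for box in boxes_a}
    return any(box.text in texts_a for box in boxes_b)

def shared_content_runs(frame_boxes: List[List[TextBox]], n: int = 4):
    """Yield (i, i + n) index pairs whose frames share text content;
    each pair delimits a run of n + 1 consecutive sampled frames."""
    for i in range(len(frame_boxes) - n):
        if shares_text(frame_boxes[i], frame_boxes[i + n]):
            yield (i, i + n)
```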
S22: a position gap of the at least partially identical content in the successive video frames is determined.
S23: and determining candidate video frames of which the position difference meets a displacement condition in the continuous video frames.
In the embodiments of the present disclosure, the above continuous video frames may be stacked so that the contents of the multiple continuous video frames are superimposed on the same layer, in order to determine the position difference of the identical content between adjacent frames among the continuous video frames, where the position difference may include a displacement direction and a displacement distance.
Next, whether the position difference satisfies the displacement condition may be determined, and candidate video frames may be determined based on the continuous video frames that satisfy the displacement condition. Specifically, when the preset content is the ending, the first frame of the continuous video frames may be determined as a candidate video frame.
In the embodiments of the present disclosure, continuous video frames including at least partially identical content can be identified in the video data to be analyzed, so as to find continuous video frames that may contain end credits and determine candidate video frames based on them. Because end credits are part of the preset content, the confidence of the candidate video frames determined in this way is higher, which in turn improves the confidence of the target video frame determined based on them.
In an optional embodiment, step S23, determining, among the continuous video frames, candidate video frames whose position difference satisfies the displacement condition, specifically includes the following steps:
(1) Determining a displacement direction of the at least partially identical content between the successive video frames based on the position difference;
(2) Determining a displacement distance of the at least partially identical content between the successive video frames based on the position difference;
(3) Determining, among the continuous video frames, a preset number of candidate video frames whose displacement direction and displacement distance satisfy the displacement condition.
In the embodiments of the present disclosure, the at least partially identical content usually moves in a horizontal or vertical direction during scrolling display, so its displacement direction between the continuous video frames can be determined based on the position difference.
Similarly, as noted above, end credits are often displayed at a preset scrolling speed, so the displacement distance of the at least partially identical content is the same between consecutive frames; the displacement distance between the continuous video frames can therefore be determined based on the position difference.
In the embodiments of the present disclosure, the displacement condition constrains the displacement direction and displacement distance of the identical content in the continuous video frames, so continuous video frames determined to satisfy the displacement condition are frames that display scrolling end credits. For example, the displacement condition may constrain the displacement direction to be horizontal or vertical, and the constrained displacement distance may be determined based on the scrolling speed, for example, 20 px.
After continuous video frames satisfying the displacement direction and displacement distance are determined, a preset number of candidate video frames can be selected from them, reducing the number of candidate video frames and the subsequent computation the device needs to determine the target video frame based on them.
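A sketch of such a displacement check, under stated assumptions (the 20 px step comes from the example above; the tolerance value and the tracked-position representation are assumptions):

```python
from typing import List, Tuple

def satisfies_displacement(positions: List[Tuple[int, int]],
                           expected_step: int = 20,
                           tolerance: int = 3) -> bool:
    """Check the tracked (x, y) positions of one piece of shared content.

    The condition holds when the motion is (nearly) purely horizontal or
    purely vertical, and every per-frame distance stays close to the step
    implied by the preset scrolling speed.
    """
    if len(positions) < 2:
        return False
    steps = [(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(positions, positions[1:])]
    horizontal = all(abs(dy) <= tolerance for _, dy in steps)  # moves along x only
    vertical = all(abs(dx) <= tolerance for dx, _ in steps)    # moves along y only
    if not (horizontal or vertical):
        return False
    distances = [abs(dx) if horizontal else abs(dy) for dx, dy in steps]
    return all(abs(d - expected_step) <= tolerance for d in distances)
```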
In the embodiments of the present disclosure, the continuous video frames can be screened based on the displacement condition to determine continuous video frames that may include end credits, and candidate video frames are determined based on them, which improves the confidence of the determined candidate video frames and, in turn, the confidence of the target video frame determined based on them.
In an optional embodiment, step S103, determining candidate video frames related to the preset content in the video data to be analyzed, specifically includes the following steps:
S31: Determining pixel values corresponding to the video frames in the video data to be analyzed based on the pixel points of the video frames.
In the embodiments of the present disclosure, the above variety program may display the preset content by way of transitions (that is, video scene changes); for example, when the preset content is the ending, the video may switch from the main feature to the ending through a transition, or the end credits may be displayed across multiple screens separated by transition video frames, for example, frame n displays the sponsors, frame n+m is a transition, and the producers are displayed after it.
Based on this, the transition video frames in the video frame set can be identified, where the difference between the pixel values of the pixel points in a transition video frame and those in its adjacent video frames is large. For example, when a black-screen transition is used, the pixel values of the pixel points in the transition video frame are below 10, while the adjacent video frames render normal picture content, so the pixel value difference between the transition video frame and its adjacent frames is large. Therefore, when identifying transition video frames, the pixel value of each pixel point in the video frames of the video frame set can be obtained and compared with those of the adjacent video frames.
S32: adjacent video frames in which the pixel value difference exceeds the pixel threshold are determined, and candidate video frames are determined among the adjacent video frames.
After the pixel values of the pixel points in the video frames are determined, each video frame can be compared with its adjacent video frame. Specifically, the pixel value differences of corresponding pixel points in adjacent video frames can be determined to obtain a difference matrix, and it can be determined whether the difference indicated by the difference matrix exceeds a pixel threshold.
When the determined difference exceeds the pixel threshold, it can be determined that a transition video frame exists among the adjacent video frames; specifically, the transition video frame is typically the later of the two adjacent video frames, and that frame may be determined as a candidate video frame.
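A sketch of this transition detection, assuming grayscale comparison with OpenCV and reducing the difference matrix to its mean (the threshold value is an assumption; the disclosure does not fix one):

```python
import cv2
import numpy as np

def transition_candidates(frame_paths, pixel_threshold=40.0):
    """Return frames whose mean per-pixel difference from the previous
    sampled frame exceeds the threshold, suggesting a cut or transition."""
    candidates = []
    prev = None
    for path in frame_paths:
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY).astype(np.int16)
        if prev is not None:
            diff = np.abs(gray - prev)      # the per-pixel difference matrix
            if float(diff.mean()) > pixel_threshold:
                candidates.append(path)     # the later frame of the adjacent pair
        prev = gray
    return candidates
```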
In the embodiment of the present disclosure, considering that the content displayed in the video data to be analyzed may be switched to the preset content by a transition manner, the present disclosure determines a transition video frame by comparing difference values of pixel values of adjacent video frames, so as to determine the transition video frame as a candidate video frame, thereby improving the confidence of the determined candidate video frame, and further improving the confidence of the target video frame determined based on the candidate video frame.
In an optional embodiment, step S105, determining a target video frame among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, specifically includes the following process:
S1051: Determining, among the candidate video frames, a first video frame whose position meets the position condition.
In the embodiments of the present disclosure, it is considered that, in the above variety program, frames of the non-preset-content portion may also be detected as candidate video frames; for example, candidate video frames including the above keywords may be detected in the non-preset-content portion. Therefore, the confidence of the candidate video frames needs to be analyzed so that the target video frame is determined based on the confidence.
In implementation, a first video frame whose position satisfies a position condition may first be determined from the candidate video frames, where the position condition may describe a position more strongly associated with non-preset content; a first video frame determined based on the position condition is thus more likely to belong to the non-preset content, that is, to be a candidate video frame of lower confidence. For example, the position condition may be used to screen out the earliest-positioned video frame among the candidate video frames.
S1052: a second video frame of the candidate video frames that is adjacent to the first video frame is determined.
S1053: a time interval between the first video frame and the second video frame is determined, and a target video frame is determined among the first video frame and the second video frame based on the time interval.
In the embodiments of the present disclosure, after the first video frame is determined, a second video frame adjacent to the first video frame may be determined among the candidate video frames, and the time interval between the first video frame and the second video frame may be determined. Then, the time interval is compared with a time threshold, the video frame with the better confidence is selected from the first video frame and the second video frame based on the comparison result, and that video frame is determined as the target video frame.
Based on this, in the embodiments of the present disclosure, the confidence of the first video frame may be determined based on the time interval between the first video frame and the adjacent second video frame in the candidate video frames, so as to determine the target video frame in the first video frame and the second video frame based on the confidence.
In an optional embodiment, step S1053 above, determining, based on the time interval, a target video frame among the first video frame and the second video frame, specifically includes the following procedures:
(1) Determining whether the time interval exceeds a time threshold;
(2) Determining the second video frame as a target video frame if the time interval exceeds the time threshold;
(3) Determining the first video frame as a target video frame if the time interval does not exceed the time threshold.
In the embodiments of the present disclosure, when the time interval exceeds the time threshold, the confidence of the first video frame may be determined to be lower than that of the second video frame; when the time interval does not exceed the time threshold, the confidence of the first video frame may be determined to be higher than that of the second video frame.
Specifically, if the time interval exceeds the time threshold, it may be determined that the confidence of the first video frame is poor, the first video frame is determined to be an invalid video frame, and the second video frame is determined to be a target video frame. If the time interval does not exceed the time threshold, the confidence of the first video frame can be determined to be good, and the first video frame is determined to be the target video frame.
For example, if the first video frame is a video frame including the above keywords, the time interval between the first video frame and the second video frame is 5 minutes, and the time threshold is 1 minute, then the confidence of the first video frame may be determined to be poor and the first video frame may be determined to be an invalid video frame. After the first video frame is determined to be invalid, the second video frame may be determined as the target video frame. Conversely, if the time interval between the first video frame and the second video frame is smaller than the time threshold, the confidence of the first video frame may be determined to be good, and the first video frame may be determined as the target video frame.
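A minimal sketch of this selection rule, assuming frame timestamps in seconds and the 1-minute threshold from the example:

```python
def choose_target(first_ts: float, second_ts: float,
                  time_threshold: float = 60.0) -> float:
    """Pick between the earliest candidate frame and its adjacent neighbor.

    Timestamps are in seconds. A gap larger than the threshold suggests the
    first (earliest) candidate is a stray detection from the non-preset
    content, so the second frame is taken as the target instead.
    """
    if abs(second_ts - first_ts) > time_threshold:
        return second_ts  # the first frame is treated as invalid
    return first_ts

# e.g. choose_target(3000.0, 3300.0) -> 3300.0 (the 5-minute gap exceeds 1 minute)
```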
In embodiments of the present disclosure, a confidence level of a first video frame may be determined based on a time interval between the first video frame and an adjacent second video frame in the candidate video frames to determine a target video frame in the first video frame and the second video frame based on the confidence level.
Referring to fig. 3, a flowchart of another video analysis method according to an embodiment of the present disclosure includes steps S301 to S306, where:
S301: Acquiring video data to be analyzed.
S302: Performing frame extraction processing on the video data to be analyzed to obtain a video frame set.
In the embodiments of the present disclosure, the frame extraction process is described in the embodiment corresponding to fig. 1 and is not repeated here.
S303: and performing text detection processing on the video frames in the video frame set to obtain a text detection result.
In the embodiment of the disclosure, the text detection processing may identify text content in the video frame and obtain a text detection result including the text box.
S304: and determining candidate video frames matched with the screening key information based on the text detection result.
In the embodiment of the present disclosure, the method for determining the candidate video frames matching the filtering key information is described in the embodiment corresponding to step S103, and is not described herein again.
S305: based on the text detection result, continuous video frames including at least part of the same content are determined, and candidate video frames of which the position gap of the same content satisfies the movement condition are determined in the continuous video frames.
In the embodiment of the present disclosure, the manner of determining the continuous video frames including at least part of the same content and determining the candidate video frames with the position differences satisfying the movement condition in the continuous video frames is described in the embodiment corresponding to the step S103, and is not repeated here.
S306: based on the processing logic, a target video frame is determined among the candidate video frames.
In the embodiment of the present disclosure, based on the processing logic, the manner of determining the target video frame in the candidate video frames is described in the embodiment corresponding to the step S105, which is not described herein.
In summary, in the embodiments of the present disclosure, video data to be analyzed may first be obtained, and candidate video frames related to preset content may be determined in the video data to be analyzed, where the preset content may be the above-mentioned non-essential content. A target video frame may then be determined among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, so that the start and stop positions of the preset content are determined in the video data to be analyzed based on the target video frame; this reduces errors when labeling the start and stop positions of the non-essential or essential content and improves labeling efficiency.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a video analysis device corresponding to the video analysis method, and since the principle of solving the problem by the device in the embodiments of the present disclosure is similar to that of the video analysis method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 4, a schematic diagram of a video analysis apparatus according to an embodiment of the present disclosure is shown, where the apparatus includes: an acquisition unit 41, a first determination unit 42, and a second determination unit 43; wherein:
an acquisition unit 41 for acquiring video data to be analyzed;
a first determining unit 42, configured to determine candidate video frames related to preset content in the video data to be analyzed;
a second determining unit 43 for determining a target video frame among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
In the embodiments of the present disclosure, video data to be analyzed may first be obtained, and candidate video frames related to preset content may be determined in the video data to be analyzed, where the preset content may be the above-mentioned non-essential content. A target video frame may then be determined among the candidate video frames based on the positions of the candidate video frames in the video data to be analyzed, so that the start and stop positions of the preset content are determined in the video data to be analyzed based on the target video frame; this reduces errors when labeling the start and stop positions of the non-essential or essential content and improves labeling efficiency.
In a possible implementation manner, the first determining unit 42 is further configured to:
determining screening key information based on content characteristics corresponding to the preset content;
and determining candidate video frames matched with the screening key information in the video data to be analyzed.
In a possible implementation manner, the first determining unit 42 is further configured to:
identifying successive video frames in the video data to be analyzed that include at least partially identical content;
determining a position difference of the at least partially identical content in the successive video frames;
and determining, among the continuous video frames, candidate video frames whose position difference satisfies a displacement condition.
In a possible implementation manner, the first determining unit 42 is further configured to:
determining a displacement direction of the at least partially identical content between the successive video frames based on the position difference;
determining a displacement distance of the at least partially identical content between the successive video frames based on the position difference;
and determining a preset number of candidate video frames with the displacement direction and the displacement distance meeting the displacement condition in the continuous video frames.
In a possible implementation manner, the first determining unit 42 is further configured to:
determining pixel values corresponding to the video frames in the video data to be analyzed based on the pixel points of the video frames;
determining adjacent video frames whose pixel value difference exceeds a pixel threshold, and determining candidate video frames among the adjacent video frames.
In a possible embodiment, the second determining unit 43 is further configured to:
determining, among the candidate video frames, a first video frame whose position meets a position condition;
determining a second video frame adjacent to the first video frame in the candidate video frames;
determining a time interval between the first video frame and the second video frame, and determining a target video frame among the first video frame and the second video frame based on the time interval.
In a possible embodiment, the second determining unit 43 is further configured to:
determining whether the time interval exceeds a time threshold;
determining the second video frame as a target video frame if the time interval exceeds the time threshold;
and determining the first video frame as a target video frame if the time interval does not exceed the time threshold.
The process flow of each unit in the apparatus and the interaction flow between units may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Corresponding to the video analysis method in fig. 1, the embodiment of the present disclosure further provides a computer device 500, as shown in fig. 5, which is a schematic structural diagram of the computer device 500 provided in the embodiment of the present disclosure, including:
a processor 51, a memory 52, and a bus 53; memory 52 is used to store execution instructions, including memory 521 and external storage 522; the memory 521 is also referred to as an internal memory, and is used for temporarily storing operation data in the processor 51 and data exchanged with the external memory 522 such as a hard disk, and the processor 51 exchanges data with the external memory 522 through the memory 521, and when the computer device 500 is operated, the processor 51 and the memory 52 communicate with each other through the bus 53, so that the processor 51 executes the following instructions:
Acquiring video data to be analyzed;
determining candidate video frames related to preset content in the video data to be analyzed;
determining a target video frame in the candidate video frames based on the position of the candidate video frames in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the video analysis method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, where instructions included in the program code may be used to perform the steps of the video analysis method described in the foregoing method embodiments, and specifically reference the foregoing method embodiments will not be described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that: the foregoing examples are merely specific embodiments of the present disclosure, and are not intended to limit the scope of the disclosure, but the present disclosure is not limited thereto, and those skilled in the art will appreciate that while the foregoing examples are described in detail, it is not limited to the disclosure: any person skilled in the art, within the technical scope of the disclosure of the present disclosure, may modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method of video analysis, comprising:
acquiring video data to be analyzed;
determining candidate video frames related to preset content in the video data to be analyzed;
determining a target video frame in the candidate video frames based on the position of the candidate video frames in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
2. The method of claim 1, wherein said determining candidate video frames associated with the preset content in the video data to be analyzed comprises:
determining screening key information based on content characteristics corresponding to the preset content;
and determining candidate video frames matched with the screening key information in the video data to be analyzed.
3. The method of claim 1, wherein said determining candidate video frames associated with the preset content in the video data to be analyzed comprises:
identifying successive video frames in the video data to be analyzed that include at least partially identical content;
determining a position difference of the at least partially identical content in the successive video frames;
and determining, among the continuous video frames, candidate video frames whose position difference satisfies a displacement condition.
4. A method according to claim 3, wherein said determining, among the continuous video frames, candidate video frames whose position difference satisfies the displacement condition comprises:
determining a displacement direction of the at least partially identical content between the successive video frames based on the position difference;
determining a displacement distance of the at least partially identical content between the successive video frames based on the position difference;
and determining a preset number of candidate video frames with the displacement direction and the displacement distance meeting the displacement condition in the continuous video frames.
5. The method of claim 1, wherein said determining candidate video frames associated with the preset content in the video data to be analyzed comprises:
determining pixel values corresponding to the video frames in the video data to be analyzed based on the pixel points of the video frames;
determining adjacent video frames whose pixel value difference exceeds a pixel threshold, and determining candidate video frames among the adjacent video frames.
6. The method of claim 1, wherein the determining a target video frame among the candidate video frames based on the location of the candidate video frame in the video data to be analyzed comprises:
determining, among the candidate video frames, a first video frame whose position meets a position condition;
determining a second video frame adjacent to the first video frame in the candidate video frames;
determining a time interval between the first video frame and the second video frame, and determining a target video frame among the first video frame and the second video frame based on the time interval.
7. The method of claim 6, wherein determining a target video frame among the first video frame and the second video frame based on the time interval comprises:
determining whether the time interval exceeds a time threshold;
determining the second video frame as a target video frame if the time interval exceeds the time threshold;
and determining the first video frame as a target video frame if the time interval does not exceed the time threshold.
8. A video analysis device, comprising:
the acquisition unit is used for acquiring video data to be analyzed;
a first determining unit, configured to determine candidate video frames related to preset content in the video data to be analyzed;
A second determining unit configured to determine a target video frame among the candidate video frames based on a position of the candidate video frame in the video data to be analyzed; the target video frame is used for indicating the start and stop positions of the preset content in the video data to be analyzed.
9. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the video analysis method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the video analysis method according to any of claims 1 to 7.
CN202211690922.0A 2022-12-27 2022-12-27 Video analysis method, device, computer equipment and storage medium Pending CN116017047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211690922.0A CN116017047A (en) 2022-12-27 2022-12-27 Video analysis method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116017047A 2023-04-25

Family

ID=86022432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211690922.0A Pending CN116017047A (en) 2022-12-27 2022-12-27 Video analysis method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116017047A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015191333A1 (en) * 2014-06-11 2015-12-17 Arris Enterprises, Inc. Detection of demarcating segments in video
CA2971176A1 (en) * 2014-12-19 2016-06-23 Benedito J. Fonseca Jr. Detection of failures in advertisement replacement
CN108882057A (en) * 2017-05-09 2018-11-23 北京小度互娱科技有限公司 Video abstraction generating method and device
WO2019182834A1 (en) * 2018-03-20 2019-09-26 Hulu, LLC Content type detection in videos using multiple classifiers
CN112291589A (en) * 2020-10-29 2021-01-29 腾讯科技(深圳)有限公司 Video file structure detection method and device
CN113920465A (en) * 2021-10-29 2022-01-11 北京达佳互联信息技术有限公司 Method and device for identifying film trailer, electronic equipment and storage medium
CN114220057A (en) * 2021-12-16 2022-03-22 北京奇艺世纪科技有限公司 Video trailer identification method and device, electronic equipment and readable storage medium
CN114550070A (en) * 2022-03-08 2022-05-27 腾讯科技(深圳)有限公司 Video clip identification method, device, equipment and storage medium
CN115278300A (en) * 2022-07-28 2022-11-01 腾讯科技(深圳)有限公司 Video processing method, video processing apparatus, electronic device, storage medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
X.-C. Yin et al.: "Text Detection, Tracking and Recognition in Video: A Comprehensive Survey", IEEE Transactions on Image Processing, vol. 25, no. 6, 14 April 2016 (2016-04-14), pages 2752-2773, XP011607865, DOI: 10.1109/TIP.2016.2554321 *
Cheng Pei: "Design Ideas for the Integrated Smart Monitoring and Supervision Platform of Guangxi Radio and Television", Radio & Television Information (广播电视信息), vol. 28, no. 01, 15 January 2021 (2021-01-15), pages 93-97 *

Similar Documents

Publication Publication Date Title
US20220070405A1 (en) Detection of Transitions Between Text and Non-Text Frames in a Video Stream
Smeaton et al. Video shot boundary detection: Seven years of TRECVid activity
US8340498B1 (en) Extraction of text elements from video content
US9043860B2 (en) Method and apparatus for extracting advertisement keywords in association with situations of video scenes
EP3189469B1 (en) A method for selecting frames from video sequences based on incremental improvement
CN113613065B (en) Video editing method and device, electronic equipment and storage medium
US10679069B2 (en) Automatic video summary generation
EP2034426A1 (en) Moving image analyzing, method and system
US9418297B2 (en) Detecting video copies
US20140344853A1 (en) Comment information generation device, and comment display device
US8947600B2 (en) Methods, systems, and computer-readable media for detecting scene changes in a video
CN105718861A (en) Method and device for identifying video streaming data category
US20130088645A1 (en) Method of Processing Moving Picture and Apparatus Thereof
CN111836118B (en) Video processing method, device, server and storage medium
US10779036B1 (en) Automated identification of product or brand-related metadata candidates for a commercial using consistency between audio and image elements of products or brands detected in commercials
CN105657514A (en) Method and apparatus for playing video key information on mobile device browser
JP2011203790A (en) Image verification device
US9934449B2 (en) Methods and systems for detecting topic transitions in a multimedia content
US10237610B1 (en) Automated identification of product or brand-related metadata candidates for a commercial using persistence of product or brand-related text or objects in video frames of the commercial
CN111709762B (en) Information matching degree evaluation method, device, equipment and storage medium
CN110769291B (en) Video processing method and device, electronic equipment and storage medium
CN116017047A (en) Video analysis method, device, computer equipment and storage medium
US11483617B1 (en) Automoted identification of product or brand-related metadata candidates for a commercial using temporal position of product or brand-related text or objects, or the temporal position and audio, in video frames of the commercial
US10306304B1 (en) Automated identification of product or brand-related metadata candidates for a commercial using dominance and prominence of product or brand-related text or objects in video frames of the commercial
CN116017036A (en) Audio and video analysis method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination