CN111556338A - Method for detecting region in video, method and device for fusing information and storage medium - Google Patents

Method for detecting region in video, method and device for fusing information and storage medium

Info

Publication number
CN111556338A
Authority
CN
China
Prior art keywords
video, area, information, detected, geometric
Prior art date
Legal status
Granted
Application number
CN202010447859.2A
Other languages
Chinese (zh)
Other versions
CN111556338B (en)
Inventor
张润泽
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010447859.2A
Publication of CN111556338A
Application granted
Publication of CN111556338B
Status: Active

Classifications

    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics (server)
    • H04N 21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement (server)
    • H04N 21/2668 Creating a channel for a dedicated end-user group, e.g. insertion of targeted commercials based on end-user profiles
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream (client)
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip (client)
    • H04N 21/812 Monomedia components involving advertisement data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method for detecting regions in a video, an information fusion method and apparatus, and a storage medium. The method comprises the following steps: acquiring a video to be detected; selecting a target video frame in the video to be detected; determining a geometric region of a target object in the target video frame based on the key points of the target video frame; and, when it is determined that key points of other video frames in the video to be detected each fall on the geometric surface where the geometric region lies, determining a geometric region that meets a preset size condition as an information promotion area. With this method, information promotion areas can be quickly selected from a video.

Description

Method for detecting region in video, method and device for fusing information and storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for detecting a region in a video, a method and an apparatus for information fusion, and a storage medium.
Background
With the continuous development of video and internet technologies, a great number of users can conveniently watch all kinds of videos through intelligent terminals. For an information promoter, how to let users view information beyond the video itself while they watch is an important question to study.
In the traditional scheme, an information promoter generally plays a video and manually selects, from the relevant video frames during playback, a geometric area that can be used for promotion information, then inserts the corresponding promotion information into that area so that it is fused into the corresponding video frames. However, when there are many videos, each one must be played in full before a usable geometric area can be selected, so selecting regions in videos is inefficient.
Disclosure of Invention
In view of the above, there is a need to provide a method, an apparatus, and a storage medium for detecting regions in a video that can quickly select an information promotion area from a video.
A method of detecting regions in a video, the method comprising:
acquiring a video to be detected;
selecting a target video frame in the video to be detected;
determining a geometric region of a target object in the target video frame based on the key points of the target video frame;
and when it is determined that key points of other video frames in the video to be detected each fall on the geometric surface where the geometric region lies, determining a geometric region that meets a preset size condition as an information promotion area.
In one embodiment, before the performing the homography detection on the new keypoint in the previous frame, the method further includes:
carrying out optical flow tracking on the new key point of the previous frame to obtain a third optical flow value of the new key point;
when the third optical flow value of the new keypoint reaches the optical flow threshold, then performing the step of homography detection on the new keypoint in the last frame.
In one embodiment, the homography detecting the new keypoints in the previous frame comprises:
and judging whether the new key point falls into the geometric surface of the geometric area or not by adopting a homography judgment model constructed by the homography matrix of the geometric area.
An apparatus for detection of regions in a video, the apparatus comprising:
the acquisition module is used for acquiring a video to be detected;
the selection module is used for selecting a target video frame in the video to be detected;
the first determination module is used for determining a geometric area of a target object in the target video frame based on key points of the target video frame;
and the second determination module is used for determining a geometric region that meets a preset size condition as the information promotion area when it is determined that key points of other video frames in the video to be detected each fall on the geometric surface where the geometric region lies.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a video to be detected;
selecting a target video frame in the video to be detected;
determining a geometric region of a target object in the target video frame based on the key points of the target video frame;
and when it is determined that key points of other video frames in the video to be detected each fall on the geometric surface where the geometric region lies, determining a geometric region that meets a preset size condition as an information promotion area.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a video to be detected;
selecting a target video frame in the video to be detected;
determining a geometric region of a target object in the target video frame based on the key points of the target video frame;
and when it is determined that key points of other video frames in the video to be detected each fall on the geometric surface where the geometric region lies, determining a geometric region that meets a preset size condition as an information promotion area.
According to the above method, apparatus, computer device, and storage medium for detecting regions in a video, a target video frame is selected from the video to be detected, and the geometric region of a target object in that frame can be determined from its key points. Whether key points of other video frames fall on the geometric surface where the region lies is then judged; when they do, a geometric region that meets the preset size condition is determined as an information promotion area. Video frames containing an information promotion area of sufficient duration are thus obtained without manual work, and no one needs to watch the video just to pick a geometric area usable for promotion, which shortens the selection time for information promotion areas and improves selection efficiency. In addition, determining only regions that meet the preset size condition as information promotion areas effectively ensures that the resulting areas have sufficient application value.
A method of information fusion, the method comprising:
acquiring a video to be detected containing an information popularization area;
selecting, from the information promotion areas, a target information promotion area whose value meets a preset value condition;
acquiring popularization information for augmented reality;
inserting the popularization information into each video frame containing the target information popularization area;
and outputting the video to be detected with the inserted promotion information.
In one embodiment, the value is the value of the information promotion area, determined according to the size of the area in the corresponding video frames and the length of time the area appears in the video to be detected;
the selecting of the target information promotion area corresponding to the condition that the value meets the preset value condition from the information promotion area comprises:
playing the video to be detected;
when the video frame containing the information promotion area is played, the playing is paused;
and selecting, from the information promotion areas and according to an input selection instruction, a target information promotion area whose value meets the preset value condition.
An information fusion apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a video to be detected containing an information promotion area;
the selection module is used for selecting, from the information promotion areas, a target information promotion area whose value meets a preset value condition;
the second acquisition module is used for acquiring popularization information for augmented reality;
the inserting module is used for inserting the popularization information into each video frame containing the target information popularization area;
and the output module is used for outputting the video to be detected inserted with the promotion information.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a video to be detected containing an information popularization area;
selecting, from the information promotion areas, a target information promotion area whose value meets a preset value condition;
acquiring popularization information for augmented reality;
inserting the popularization information into each video frame containing the target information popularization area;
and outputting the video to be detected with the inserted promotion information.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a video to be detected containing an information popularization area;
selecting, from the information promotion areas, a target information promotion area whose value meets a preset value condition;
acquiring popularization information for augmented reality;
inserting the popularization information into each video frame containing the target information popularization area;
and outputting the video to be detected with the inserted promotion information.
According to the above information fusion method, apparatus, computer device, and storage medium, the acquired video to be detected already contains information promotion areas, so a target information promotion area whose value meets the preset value condition can be selected directly, and the promotion information for augmented reality can then be inserted into each video frame containing that area. When fusing promotion information, there is therefore no need to watch the complete video to pick a geometric area usable for promotion; the time spent watching the full video and manually selecting an area is saved, which effectively improves both the selection efficiency of information promotion areas and the efficiency of fusing promotion information.
Drawings
FIG. 1 is a diagram of an application environment for the method for detecting regions in a video and the information fusion method in one embodiment;
FIG. 2 is a flow diagram illustrating a method for detecting regions in a video according to one embodiment;
FIG. 3a is a schematic flow chart diagram illustrating a method for information fusion in one embodiment;
FIG. 3b is a timing diagram for detecting regions in a video and fusing information in one embodiment;
FIG. 4 is a flow diagram illustrating an initialization phase and a multi-plane tracking phase, according to one embodiment;
FIG. 5 is a schematic diagram of tracking planes on a building for video in one embodiment;
FIG. 6 is a diagram illustrating placement of advertisements in a video, in accordance with one embodiment;
FIG. 7 is a flow diagram that illustrates the placement of advertisements in a video, in accordance with one embodiment;
FIG. 8 is a block diagram of an apparatus for detecting regions in a video according to an embodiment;
FIG. 9 is a block diagram showing an arrangement for detecting regions in a video according to another embodiment;
FIG. 10 is a block diagram showing the construction of an information fusion apparatus according to another embodiment;
FIG. 11 is a diagram of the internal structure of a computer device in one embodiment;
FIG. 12 is a diagram of the internal structure of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for detecting regions in a video and the information fusion method provided by the application can be applied in the application environment shown in FIG. 1, where the terminal 102 communicates with the server 104 via a network. The server 104 acquires a video to be detected; selects a target video frame in the video to be detected; determines a geometric region of a target object in the target video frame based on the key points of the target video frame; and, when it determines that key points of other video frames in the video each fall on the geometric surface where the geometric region lies, determines a geometric region meeting the preset size condition as an information promotion area, thereby obtaining a video to be detected whose information promotion areas have been determined.
The terminal 102 acquires a video to be detected containing the information popularization area from the server 104; selecting a target information promotion area corresponding to the condition that the value meets a preset value condition from the information promotion area; acquiring popularization information for augmented reality; inserting the promotion information into each video frame containing the target information promotion area; and outputting the video to be detected with the inserted promotion information to the front end of the terminal 102 for playing.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a method for detecting an area in a video is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
s202, acquiring the video to be detected.
The video to be detected is a video for which it needs to be detected whether a geometric area usable for information promotion (i.e., an information promotion area) exists in it. When an information promotion area exists in the video to be detected, promotion information can be inserted into that area.
In one embodiment, the server receives an input acquisition instruction, and acquires one or more videos to be detected specified by the instruction from a video library.
In another embodiment, the video to be detected carries the tag to be detected, and the server acquires the video to be detected carrying the tag to be detected from the video library, or acquires the video to be detected carrying the tag to be detected and being stored recently from the video library.
In another embodiment, the server obtains the video to be detected from the terminal that produced it. Specifically, when the terminal finishes making a video, it connects to the server and transmits the finished video as the video to be detected, and the server receives it.
And S204, selecting a target video frame in the video to be detected.
A video frame is one frame image of the video to be detected. The video to be detected contains a number of video frames; for example, each one-second segment of a video may contain 24, 30, or 36 frames, and a video to be detected may consist of m such one-second segments.
The target video frame is a video frame for which the optical flow value relative to the immediately preceding frame, or relative to a frame at least one frame earlier, reaches the optical flow threshold. For example, if the optical flow value a between the i-th and (i-1)-th video frames of the video to be detected is large enough to reach the optical flow threshold b, the i-th video frame is the target video frame. Similarly, if the optical flow value c between the i-th and (i-2)-th video frames reaches the optical flow threshold b, the i-th video frame is the target video frame.
Optical flow is the apparent motion of an object or scene between two consecutive video frames, caused by movement of the object or of the camera. Correspondingly, the optical flow value is the magnitude of that flow, and it represents how much the target object is displaced between the frames. The target object may include a person, an animal, a building, or other objects.
In one embodiment, S204 may specifically include: the server calculates first optical flow values between video frames in the video to be detected, and when a first optical flow value reaches the optical flow threshold, takes the corresponding video frame as the target video frame.
In one embodiment, the server may calculate a first optical flow value between adjacent video frames in the video to be detected; among all adjacent pairs, if the first optical flow value of a pair reaches the optical flow threshold, the later frame of that pair is taken as the target video frame. Adjacent video frames are two consecutive frames: for example, the (i-1)-th and i-th video frames of the video to be detected are adjacent, as are the i-th and (i+1)-th.
In another embodiment, the server may calculate a first optical flow value between video frames separated by an interval; if the first optical flow value of such a pair reaches the optical flow threshold, the later frame of the pair is taken as the target video frame. Frames with an interval are two frames at least one frame apart, with the interval value set according to the actual situation; for example, the (i-2)-th and i-th video frames are spaced frames. In this way, optical flow tracking need not be run on every frame of the video to be detected, which reduces the computation of optical flow values.
In an embodiment, the step of calculating the first optical flow value between video frames in the video to be detected may specifically include: the server detects key points of a designated video frame in a video to be detected; and carrying out optical flow tracking on the detected key points in the video to be detected to obtain a first optical flow value between video frames in the video to be detected.
The video to be detected may include video segments of at least one shot, and the video segments of different shots may be video segments obtained by shooting an object (the object includes a person, an animal, other objects, and the like) at different angles or in different lens moving manners. For two video clips of different shots, the object may be different, or the object may be the same but the presented viewing angle is different, for example, the video clip of the first shot captures the object in a static manner, and the video clip of the second shot captures the object in a rotating manner.
The designated video frame may be the first frame of the video clip of each shot. For example, when performing region detection on a video to be detected, the key points of the first frame (i.e., the first video frame) of each shot's clip are detected first, yielding all key points of that first frame. Note that the video to be detected may refer to a video with its opening title sequence removed.
The key points may be corner points of the object, such as four corner points of a computer or a mobile phone.
In one embodiment, the server detects feature points of a specified video frame in a video to be detected to obtain key points of the specified video frame. Then, the server sequentially carries out optical flow tracking in subsequent video frames of the specified video frames so as to track the positions of the detected key points appearing in the subsequent video frames, and calculates first optical flow values of the key points between different video frames according to the positions of the key points in different video frames, namely the first optical flow values of the key points in the period of time between different video frames.
For example, the server may perform Feature point detection on the specified video frame by using a Scale-invariant Feature Transform (SIFT) algorithm to obtain all the key points of the specified video frame. And then Tracking the key points detected by the appointed video frames by adopting a KLT (Kanade-Lucas-Tomasi Tracking) corner point Tracking algorithm.
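As a minimal sketch of this step, assuming OpenCV (the opencv-python package, version 4.4 or later for SIFT) and NumPy, the following illustrates SIFT detection in the designated frame followed by KLT optical flow tracking to find the target video frame; the function name, the frames list interface, and the threshold value are illustrative, not taken from the patent:

    import cv2
    import numpy as np

    FLOW_THRESHOLD = 5.0  # assumed mean-displacement threshold in pixels; the patent fixes no value

    def find_target_frame(frames):
        # Detect SIFT key points in the designated (first) frame, track them
        # frame by frame with KLT optical flow, and return the index of the
        # first frame whose mean displacement (the "first optical flow value")
        # reaches the threshold: that frame is the target video frame.
        sift = cv2.SIFT_create()
        prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
        keypoints = sift.detect(prev_gray, None)
        pts = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)
        for i in range(1, len(frames)):
            gray = cv2.cvtColor(frames[i], cv2.COLOR_BGR2GRAY)
            new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            ok = status.ravel() == 1
            if not ok.any():
                break
            flow = np.linalg.norm((new_pts - pts).reshape(-1, 2)[ok], axis=1).mean()
            if flow >= FLOW_THRESHOLD:
                return i, pts[ok], new_pts[ok]  # target frame index and tracked positions
            prev_gray, pts = gray, new_pts[ok].reshape(-1, 1, 2)
        return None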
S206, determining the geometric area of the target object in the target video frame based on the key points of the target video frame.
The object includes, but is not limited to, a person, an animal, other objects, and the like, and the other objects may be indoor or outdoor walls, buildings, tables, chairs, and stone walls, and the like. The geometric area may be a plane or curved area having a geometric shape, and in the following embodiments, the geometric area is exemplified as a plane area.
In one embodiment, after determining the target video frame, the server extracts the geometric region of the target object from the target video frame according to the key points tracked by the optical flow tracking.
In one embodiment, when the first optical flow value is sufficiently large (i.e., reaches the optical flow threshold), the corresponding video frame is taken as the target video frame, and extraction of the geometric region begins. Specifically, S206 may include: the server performs homography detection on the key points of the target video frame, and when those key points include target key points that meet the homography condition and the number of such key points reaches a number threshold, extracts the geometric region where the target key points lie.
Homography is the projective mapping of an object or feature point from one video frame to another, and it describes how the object or feature point transforms across different video frames (which can be understood as different viewing angles). For example, if video frames A and B are shot from two different viewpoints, the homography describes how an object or feature point transforms between those two viewpoints.
For example, the server detects homographies of key points in the target video frame by using a RANSAC (RANdom SAmple Consensus) algorithm.
In an embodiment, the step of performing homography detection on the key points of the target video frame may specifically include: the server extracts an initial geometric region using one part of the key points, then runs RANSAC homography detection on the remaining key points. If a detected key point meets the homography condition, it lies on the geometric surface of the initial region and is added to that region. Detection stops when the number of key points in the region reaches a number threshold, or when fewer undetected key points than the threshold remain, yielding the final geometric region. Note that adding key points to the initial region enlarges it, so the size of the final geometric region is larger than that of the initial region. Size here may denote the length and width of the geometric region, or its area.
In an embodiment, the step of performing homography detection on the key points of the target video frame may specifically include: the server determines a homography matrix; constructing a homography judgment model according to the homography matrix; and judging whether the key points of the target video frame are on the same plane or not through the homography judgment model.
In one embodiment, the server calculates a homography matrix from a target video frame and a previous video frame of the target video frame. Specifically, the server calculates the position of the key point in the target video frame and the position (i.e. pixel position) of the previous video frame during optical flow tracking of the key point, and a homography matrix can be calculated according to the two positions.
For example, the server computes the positions X of a subset of the key points (e.g., a part of all detected key points) in the previous video frame, computes their positions Y in the target video frame with the optical flow tracking algorithm, and solves Y = HX over all tracked key points to obtain the homography matrix H of the geometric surface. Once H is computed, a homography judgment model y = Hx can be constructed, where y is the position of a key point in the target video frame, x is its position in the previous video frame, and H is the homography matrix of a given geometric surface. If a key point in the target video frame and the corresponding key point in the previous frame lie on the same geometric surface (the surface with homography matrix H), their positions satisfy y = Hx (i.e., the homography condition).
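A minimal sketch of this computation, assuming OpenCV and NumPy: fit_plane_homography solves Y = HX with RANSAC, and on_same_surface applies the homography judgment model y = Hx with an assumed pixel tolerance. Both function names and the tolerance values are illustrative:

    import cv2
    import numpy as np

    def fit_plane_homography(prev_pts, cur_pts, reproj_thresh=3.0):
        # Solve Y = HX with RANSAC from key-point positions X in the previous
        # frame and Y in the target frame; reproj_thresh is an assumed tolerance.
        H, mask = cv2.findHomography(np.float32(prev_pts), np.float32(cur_pts),
                                     cv2.RANSAC, reproj_thresh)
        return H, mask.ravel().astype(bool)

    def on_same_surface(H, x, y, tol=3.0):
        # Homography judgment model: key-point pairs (x, y) lie on the surface
        # described by H when y is approximately Hx, within a pixel tolerance.
        x = np.float32(x).reshape(-1, 1, 2)
        projected = cv2.perspectiveTransform(x, H).reshape(-1, 2)
        return np.linalg.norm(projected - np.float32(y).reshape(-1, 2), axis=1) < tol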
And S208, when the key points of other video frames in the video to be detected are determined to respectively fall into the geometric surfaces where the geometric areas are located, determining the geometric areas meeting the preset size condition as the information popularization areas.
Wherein, the information promotion region may refer to a region for fusing promotion information so as to perform information promotion.
When at least one geometric area of the target video frame is obtained, the non-key frame before the target video frame in the video to be detected can be traced back, so that the key points in the non-key frame before the target video frame are added into the corresponding geometric area. In addition, when at least one geometric area of the target video frame is obtained, optical flow tracking can be performed on key points in the video frame behind the target video frame according to the geometric area, and new key points are detected, so that whether the new key points are located on a geometric surface where the geometric area is located can be judged.
Therefore, S208 can be described in the following two scenarios:
Scene 1: backtrack the non-key frames before the target video frame (static frames whose optical flow value is zero or small enough) to obtain the geometric region in those static frames.
In one embodiment, a non-key frame in the video to be detected before the target video frame is a still frame; the first optical-flow value between the stationary frames is less than the optical-flow threshold. The method further comprises the following steps: the server judges whether the key points of the static frame are on the same plane or not through the homography judgment model; when the key points of the static frame are in the same plane, judging that the key points of the static frame fall into the geometric plane where the geometric area is located, and obtaining the geometric area in the static frame. The geometric area may be a planar area or a curved area.
For example, suppose the video to be detected has n frames in total. SIFT detection is performed on the first video frame (i.e., the first frame) to obtain key points, which are then tracked through the video with the KLT corner tracking algorithm, and when the optical flow value between the (i-1)-th and i-th video frames is large enough (i.e., reaches the optical flow threshold), the plane region of the target object is extracted from the i-th video frame. If the optical flow value between the first and (i-1)-th video frames is zero or small enough, the first through (i-1)-th frames are static frames; once the plane region in the i-th frame is obtained, the static frames are traced back to judge whether their key points lie in the same plane. When the key points of a static frame lie in the same plane, they are judged to fall in the plane where the plane region lies, yielding the plane region in that static frame and hence in the first through i-th video frames. Note that a stationary object generally does not change, so its key points still belong to the same plane across different static frames. Here i is a positive integer greater than 1 and less than or equal to n.
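A hedged sketch of this backtracking step, reusing the illustrative on_same_surface helper (and imports) from the previous sketch; the per-frame key-point arrays are an assumed interface:

    import numpy as np

    def backtrack_static_frames(static_frame_pts, target_frame_pts, H, tol=3.0):
        # For each static frame before the target frame, keep the key points
        # whose positions map onto the plane under y = Hx; static frames barely
        # move, so the plane's homography judgment model still applies to them.
        regions = []
        for pts in static_frame_pts:  # one (N, 2) array per static frame
            mask = on_same_surface(H, pts, target_frame_pts, tol)
            regions.append(np.asarray(pts)[mask])
        return regions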
Scene 2: track the geometric surface (the geometric surface may be a plane) in the video frames after the target video frame.
In one embodiment, the server performs optical flow tracking on key points falling into a geometric area in a target video frame to obtain a second optical flow value of the geometric area; when the second optical flow value reaches the optical flow threshold value, the video frame corresponding to the optical flow threshold value is taken as a key frame of the geometric area; detecting a new key point in a previous frame of the key frames; carrying out homography detection on the new key points in the previous frame; and when the new key point in the previous frame meets the homography condition, determining the geometric surface where the new key point falls into the geometric area.
Wherein, the previous frame refers to the previous video frame of the key frame.
After backtracking the still frame, enter the plane tracking stage. In one embodiment, the server performs optical flow tracking on the key points falling into the geometric area in the video frames after the target video frame, calculates optical flow values of the key points in the geometric area, and then averages the optical flow values to obtain a second optical flow value of the geometric area.
For example, suppose the video to be detected has n frames and the i-th frame is the target video frame. After the plane regions of the first through i-th video frames are obtained (i.e., after initialization is completed), optical flow tracking is performed on the key points that fall into the plane region in the (i+1)-th through n-th video frames, and their optical flow values are computed to obtain the second optical flow value of the plane region in each of those frames. If the second optical flow value of the plane region in the (i+1)-th video frame reaches the optical flow threshold, the (i+1)-th frame is a key frame of the plane region; likewise, if the second optical flow value in the (i+j)-th frame reaches the threshold, the (i+j)-th frame is a key frame of the plane region.
When the (i+j)-th video frame is a key frame of the plane region, the server detects new key points in the previous frame (i.e., the (i+j-1)-th video frame) and then performs homography detection on those new key points to judge whether they fall into the plane where the plane region lies.
Wherein i is a positive integer greater than 1, j is a positive integer greater than or equal to 1, and i + j is less than or equal to n.
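A minimal sketch of the second optical flow value under the same assumptions (NumPy arrays of key-point positions; the function name is illustrative):

    import numpy as np

    def plane_flow_value(prev_positions, cur_positions):
        # Second optical flow value of a geometric area: the mean displacement
        # of the key points that fall inside it, per the description above.
        d = (np.float32(cur_positions).reshape(-1, 2)
             - np.float32(prev_positions).reshape(-1, 2))
        return float(np.linalg.norm(d, axis=1).mean())

    # A frame i+j becomes a key frame of the plane once this value reaches the
    # optical flow threshold; new key points are then detected in frame i+j-1
    # and checked against the plane's homography judgment model.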
In one embodiment, before the step of performing homography detection on the new keypoint in the previous frame, the method further comprises: the server carries out optical flow tracking on the new key point of the previous frame to obtain a third optical flow value of the new key point; and when the third optical flow value of the new key point reaches the optical flow threshold value, executing the step of performing homography detection on the new key point in the last frame.
Specifically, the server performs optical flow tracking on the new key point of the previous frame in the key frame, so as to obtain a third optical flow value of the new key point. For example, when the i + j video frame is a key frame of the plane area, the server performs optical flow tracking on a new key point detected in the i + j-1 video frame in the i + j video frame to obtain a third optical flow value of the new key point, and when the third optical flow value of the new key point reaches an optical flow threshold, the server starts to perform homography detection on the new key point in the previous frame.
In another embodiment, the server may further perform optical flow tracking on the new keypoint of the previous frame in the key frame and the video frames subsequent to the key frame, so as to obtain a third optical flow value of the new keypoint.
In an embodiment, the step of performing homography detection on the new keypoint in the previous frame may specifically include: and judging whether the new key point falls into the geometric surface of the geometric area or not by adopting a homography judgment model constructed by the homography matrix of the geometric area.
When the geometric regions of the target video frame are obtained, the homography judgment models of all geometric regions have already been constructed, so when homography detection is performed on the new key points in the previous frame, the judgment model built from the homography matrix of the corresponding region can be used directly to judge whether a new key point falls on the geometric surface where the region lies.
In one embodiment, S208 may specifically include: when determining that key points of other video frames in the video to be detected respectively fall into the geometric surfaces where the geometric areas are located, the server calculates the sizes of the geometric areas in the corresponding video frames respectively; or, calculating the total size of the geometric area in the corresponding video frame; and when the size or the total size meets the preset size condition, determining the geometric area meeting the preset size condition as the information popularization area.
In one embodiment, after S208, the method further comprises: the server determines the number of areas of different information popularization areas in the video to be detected; determining the time length of each information popularization area appearing in the video to be detected; and outputting the number of the areas and the corresponding duration.
The number of regions is the count of information promotion areas on distinct geometric surfaces. Outputting the number of regions tells the information promoter how many areas are available for fusing promotion information; outputting the corresponding durations tells the promoter how long each area remains available, which avoids the poor promotion results of fusing promotion information into an area that appears only briefly.
In one embodiment, the method further comprises: the server determines the value of the information promotion area according to the size of the area in the corresponding video frames and its duration. The larger the size and the longer the duration, the more valuable the information promotion area.
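As an illustrative sketch only: the patent states that value grows with size and duration but fixes no formula, so the product below is an assumed scoring choice (shoelace polygon area in pixels times visible duration in seconds):

    import numpy as np

    def region_value(corners, n_frames, fps):
        # Illustrative value score: polygon area of the region (shoelace
        # formula) weighted by how long the region stays visible.
        x, y = np.float32(corners).T
        area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        return area * (n_frames / fps)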
In one embodiment, when the information promotion areas of the target object in each video frame of the video to be detected are obtained, each area is marked and the marked video is stored; the marked video can also be sent to the terminal, so that the information promoter can choose a suitable area from the marked ones to fuse promotion information and achieve the promotion effect.
In the above embodiment, a target video frame is selected from the video to be detected, the geometric region of the target object in that frame is determined from its key points, and whether key points of other video frames fall on the geometric surface where the region lies is judged; when they do, a geometric region meeting the preset size condition is determined as an information promotion area. Video frames containing an information promotion area of sufficient duration are thus obtained without manual work and without anyone watching the video just to pick a usable geometric area, which reduces the selection time of information promotion areas and improves selection efficiency. In addition, determining only regions meeting the preset size condition as information promotion areas effectively ensures that the resulting areas have sufficient application value.
In an embodiment, as shown in fig. 3a, an information fusion method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
s302, acquiring the video to be detected containing the information popularization area.
Wherein, the information promotion region may refer to a region for fusing promotion information so as to perform information promotion.
In one embodiment, after the server performs area detection on the video to be detected, the server marks each information promotion area, and then directly sends the marked video to be detected to the terminal, so that the video to be detected acquired by the terminal includes the marked information promotion area. Or the server stores the marked video to be detected, and when the terminal needs the video to be detected, the video to be detected is downloaded from the server. When the information popularization area is marked, rectangular frames with different colors or frames with other shapes can be used for framing.
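A minimal sketch of this marking step, assuming OpenCV; the color, thickness, and function name are illustrative:

    import cv2
    import numpy as np

    def mark_region(frame, corners, color=(0, 255, 0)):
        # Frame the marked information promotion area with a colored
        # quadrilateral (a rectangular frame in the simplest case).
        pts = np.int32(corners).reshape(-1, 1, 2)
        return cv2.polylines(frame.copy(), [pts], True, color, 2)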
For the detection process of the information popularization area, reference may be made to S202 to S208 in the above-described embodiment.
S304, selecting, from the information promotion areas, a target information promotion area whose value meets a preset value condition.
In one embodiment, the value is a value of the information popularization area, and is determined according to a size of the information popularization area in the corresponding video frame and a time length of the information popularization area appearing in the video to be detected.
For S304, the terminal can directly and automatically select the target information promotion area by value, for use in fusing promotion information. Alternatively, the terminal can select the target information promotion area according to a user's selection instruction.
In one embodiment, S304 may specifically include: the terminal plays the video to be detected; when a video frame containing an information promotion area is played, playback is paused; and, from the information promotion areas and according to the input selection instruction, a target information promotion area whose value meets the preset value condition is selected.
In one embodiment, the terminal starts a client, the client can play the video to be detected and can display the marked information promotion area, and the information promotion party can select the target information promotion area in a playing interface of the client in a touch mode.
And S306, acquiring popularization information for augmented reality.
Augmented Reality (AR) is a technology that applies virtual information to the real world, superimposing real and virtual objects in the same picture or space in real time so that both exist simultaneously.
The promotion information is virtual information used to augment the video to be detected and promote a target product; it may be stored as text, images, or a combination of the two.
In one embodiment, the terminal acquires the promotion information for augmented reality from an information base according to an input information acquisition instruction. Alternatively, the terminal determines the size and shape of the target information promotion area and selects matching promotion information accordingly, so that it can be fused into the video frames without adjustment.
And S308, inserting the popularization information into each video frame containing the target information popularization area.
In one embodiment, the terminal detects the corner of the target information popularization area, acquires the position of the corner, and then inserts the popularization information into each video frame containing the target information popularization area according to the position of the corner.
In one embodiment, the terminal may also perform typesetting or adjustment on the popularization information according to the shape of the target information popularization area, so that the popularization information may fit the target information popularization area.
For example, if the promotion information is a graphic (i.e., a graphic promotion identifier), the terminal computes the position and shape of the target information promotion area in the video to be detected, adjusts the shape of the promotion information to match, and then fuses the virtual promotion information at the area's position in the video. Viewers thus absorb the promotion information naturally while watching, without it degrading the user experience.
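A minimal sketch of this insertion step, assuming OpenCV and a planar four-corner target area; the corner ordering and function name are illustrative. Running it on every frame that contains the target area implements the insertion described above:

    import cv2
    import numpy as np

    def insert_promotion(frame, ad_image, region_corners):
        # Warp the promotion image onto the target information promotion area
        # of one frame; region_corners are its four corners in this frame,
        # ordered top-left, top-right, bottom-right, bottom-left.
        h, w = ad_image.shape[:2]
        src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
        dst = np.float32(region_corners)
        M = cv2.getPerspectiveTransform(src, dst)
        size = (frame.shape[1], frame.shape[0])
        warped = cv2.warpPerspective(ad_image, M, size)
        mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), M, size)
        out = frame.copy()
        out[mask > 0] = warped[mask > 0]
        return out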
And S310, outputting the video to be detected with the popularization information inserted.
In one embodiment, the terminal transmits the video to be detected with the inserted promotion information to the front end for playing. Or the terminal outputs the video to be detected with the popularization information inserted to other equipment.
In the above embodiment, the acquired video to be detected already contains information promotion areas, so the target information promotion area whose value meets the preset value condition can be selected directly, and the promotion information for augmented reality is then inserted into each video frame containing that area. When fusing promotion information there is no need to watch the complete video to pick a usable geometric area; the time spent watching the full video and manually selecting an area is saved, effectively improving both the selection efficiency of information promotion areas and the efficiency of fusing promotion information.
In one embodiment, as shown in fig. 3b, fig. 3b is a timing diagram of detecting regions in a video and fusing information in one embodiment, and the corresponding steps include:
s402, the server obtains the video to be detected.
S404, the server detects key points of the appointed video frame in the video to be detected.
S406, the server performs optical flow tracking on the detected key points in the video to be detected to obtain a first optical flow value between video frames in the video to be detected.
And S408, when the first optical flow value in each video frame reaches the optical flow threshold value, taking the video frame corresponding to the optical flow threshold value as the target video frame.
S410, the server detects the homography of the key points of the target video frame.
In one embodiment, the server determines a homography matrix; constructing a homography judgment model according to the homography matrix; and judging whether the key points of the target video frame are on the same plane or not through the homography judgment model.
S412, when the key points of the target video frame include target key points that meet the homography condition and the number of such key points reaches a number threshold, the server extracts the geometric region where the target key points lie.
Wherein, the non-key frame in front of the target video frame in the video to be detected is a static frame; the first optical-flow value between stationary frames is less than the optical-flow threshold.
And S414, the server judges whether the key points of the static frame are on the same plane or not through the homography judgment model.
S416, when the key points of the static frame are on the same plane, the server judges that the key points of the static frame fall into the geometric plane where the geometric area is located.
S418, the server performs optical flow tracking on the key points falling into the geometric area in the target video frame to obtain a second optical flow value of the geometric area.
S420, when the second optical flow value reaches the optical flow threshold value, the server takes the corresponding video frame reaching the optical flow threshold value as a key frame of the geometric area.
S422, the server detects a new key point in the previous frame of the key frame.
S424, the server performs homography detection on the new key points in the previous frame.
In one embodiment, prior to S424, the method further comprises: the server carries out optical flow tracking on the new key point of the previous frame to obtain a third optical flow value of the new key point; and when the third optical flow value of the new key point reaches the optical flow threshold value, executing the step of performing homography detection on the new key point in the last frame.
In one embodiment, S424 may specifically include: and the server judges whether the new key point falls into the geometric surface of the geometric area by adopting a homography judgment model constructed by the homography matrix of the geometric area.
S426, when the new keypoint in the previous frame satisfies the homography condition, the server determines the geometric surface where the new keypoint falls into the geometric region.
S428, the server determines the geometric area meeting the preset size condition as the information popularization area.
In one embodiment, S428 may specifically include: when determining that key points of other video frames in the video to be detected respectively fall into the geometric surfaces where the geometric areas are located, the server calculates the sizes of the geometric areas in the corresponding video frames respectively; or, calculating the total size of the geometric area in the corresponding video frame; and when the size or the total size meets the preset size condition, determining the geometric area meeting the preset size condition as the information popularization area.
In one embodiment, the server determines the number of distinct information promotion areas in the video to be detected, determines the duration for which each information promotion area appears in the video, and outputs the number of areas and the corresponding durations.
In one embodiment, the value of the information promotion area is determined according to the size and duration of the information promotion area in the corresponding video frame.
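As a purely hypothetical illustration of how such a value might be computed from the two factors named above, normalized size and duration could be combined linearly; the weights and the normalization below are assumptions for illustration only, not part of this embodiment.

```python
def promotion_area_value(mean_area_ratio, duration_s,
                         w_size=0.5, w_time=0.5, max_time_s=60.0):
    """Hypothetical value score for an information promotion area.

    mean_area_ratio: average fraction of the frame covered by the area.
    duration_s: time the area is visible in the video, in seconds.
    """
    return w_size * mean_area_ratio + w_time * min(duration_s / max_time_s, 1.0)
```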
S430, the terminal receives the video to be detected, sent by the server, that contains the information promotion area.
S432, the terminal plays the video to be detected.
S434, when a video frame containing the information promotion area is played, the terminal pauses playback.
S436, according to an input selection instruction, the terminal selects, from the information promotion areas, the target information promotion area whose value meets the preset value condition.
S438, the terminal acquires promotion information for augmented reality.
S440, the terminal inserts the promotion information into each video frame containing the target information promotion area.
S442, the terminal outputs the video to be detected with the promotion information inserted.
As an example, the method for detecting regions in a video and the method for fusing information are applied to scenarios of advertisement area detection and advertisement insertion. As shown in fig. 4, the steps of the method for detecting regions in a video can be divided into two stages, initialization and multi-plane tracking. The initialization stage finds the first frame in which the video starts moving, after which multi-plane tracking begins. The two stages are explained as follows:
(1) Initialization phase
A video to be detected is selected, and after the first frame is input, all key points in the first frame are detected using the SIFT algorithm. Then, when the second frame is input, the key points detected in the first frame are tracked using the KLT corner tracking algorithm. If the optical flow value computed between the first and second frames is large enough, plane extraction starts in the second frame; otherwise, the algorithm waits for the third frame, performs optical flow tracking in it, and so on, until the optical flow value between two consecutive frames is large enough. Here, the first frame refers to the first video frame in the video.
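A minimal sketch of this initialization loop, assuming OpenCV's SIFT and KLT implementations; frames is assumed to be an iterator of decoded BGR frames, and FLOW_THRESHOLD is an assumed value.

```python
import cv2
import numpy as np

FLOW_THRESHOLD = 2.0  # assumed mean-displacement threshold
sift = cv2.SIFT_create()

def initialize(frames):
    """Scan frames until two consecutive frames show enough motion,
    then return the frame in which plane extraction should start."""
    prev = cv2.cvtColor(next(frames), cv2.COLOR_BGR2GRAY)
    pts = np.float32([kp.pt for kp in sift.detect(prev, None)]
                     ).reshape(-1, 1, 2)
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, st, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        good = st.ravel() == 1
        if good.any():
            flow = np.linalg.norm(nxt[good] - pts[good], axis=2).mean()
            if flow >= FLOW_THRESHOLD:
                return gray, pts[good], nxt[good]  # start plane extraction
            pts = nxt[good]
        else:
            # all tracks lost; re-detect keypoints in the current frame
            pts = np.float32([kp.pt for kp in sift.detect(gray, None)]
                             ).reshape(-1, 1, 2)
        prev = gray
    return None  # the whole video was static
```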
Assuming the video has n frames in total, when the optical flow value between the (i-1)-th and i-th frames is large enough, the RANSAC algorithm is used to iteratively detect the planes in the i-th frame and their homographies. In each iteration, RANSAC detects a homography from the key points that have not yet been assigned to a plane, the key points meeting that homography condition are removed from the remaining key points, and the next iteration begins, until the number of remaining key points is less than a threshold. Here, i is a positive integer greater than 1 and less than n, and n is a positive integer greater than 1.
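A minimal sketch of this iterative plane extraction, assuming OpenCV; MIN_REMAINING and the RANSAC reprojection bound are assumed values.

```python
import cv2
import numpy as np

REPROJ_THRESHOLD = 3.0  # assumed RANSAC reprojection bound, in pixels
MIN_REMAINING = 20      # assumed: stop once too few keypoints remain

def extract_planes(pts_prev, pts_cur):
    """Repeatedly fit a homography and peel off its inliers as one plane."""
    planes = []
    prev, cur = np.float32(pts_prev), np.float32(pts_cur)
    while len(prev) >= MIN_REMAINING:
        H, mask = cv2.findHomography(prev, cur, cv2.RANSAC, REPROJ_THRESHOLD)
        if H is None:
            break
        inliers = mask.ravel().astype(bool)
        if inliers.sum() < MIN_REMAINING:
            break
        planes.append((H, cur[inliers]))            # one detected plane
        prev, cur = prev[~inliers], cur[~inliers]   # iterate on the rest
    return planes
```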
When at least one plane is detected, the static frames before the i-th frame in the video are traced back, and optical flow tracking is performed again for each plane, so that more frames and key points are added to it.
(2) Multi-plane tracking phase
After initialization is completed, the multi-plane tracking stage begins. When a new video frame is input, optical flow tracking is first performed on the key points already added to each plane, that is, the key points in the frame preceding the new video frame, and an optical flow value is calculated. If the optical flow value is greater than or equal to the optical flow threshold, the new video frame is a key frame for that plane, indicating that the plane undergoes large motion; if the optical flow value is less than the optical flow threshold, the new video frame is not a key frame for that plane.
Next, new key points are detected in the frame preceding the new video frame using the SIFT algorithm and tracked into the new video frame with the KLT corner tracking algorithm. If the new video frame is a key frame for a certain plane, the position error of each new key point is computed using that plane's homography; if the error is smaller than a threshold, the new key point lies on the plane and is added to it. In this way, multiple planes are tracked. As shown in fig. 5, for a video containing buildings, five planes A-E can be tracked on the building in each video frame.
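A minimal sketch of this position-error test for expanding a plane with new keypoints, assuming OpenCV; ERROR_THRESHOLD is an assumed bound.

```python
import cv2
import numpy as np

ERROR_THRESHOLD = 3.0  # assumed position-error bound, in pixels

def expand_plane(H, new_prev_pts, new_cur_pts):
    """Keep the new keypoints whose tracked position matches the plane's
    homography prediction, i.e. the keypoints lying on that plane."""
    prev = np.float32(new_prev_pts).reshape(-1, 1, 2)
    cur = np.float32(new_cur_pts).reshape(-1, 2)
    pred = cv2.perspectiveTransform(prev, H).reshape(-1, 2)
    on_plane = np.linalg.norm(pred - cur, axis=1) < ERROR_THRESHOLD
    return cur[on_plane]
```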
After all planes for which the new video frame is a key frame have been expanded, planes are extracted from the remaining key points that do not belong to any plane. The RANSAC algorithm iteratively detects planes and their homographies: in each iteration, RANSAC detects a homography from the key points not yet assigned, removes the key points meeting the homography condition from the remaining key points, and starts the next iteration, until the number of remaining key points is less than a threshold.
After the above process is finished, as long as at least one plane is still being tracked, the tracking is successful; the algorithm waits for the input of a new video frame and enters the cycle of the next frame. If no plane is tracked, the tracking has failed, which may mean the video has cut directly to another shot; in that case the initialization process needs to be executed again before tracking can resume.
After the tracking of the planes in the video is completed, statistics are collected on the trackable planes, as described in (3):
(3) Counting trackable planes
After the whole video has been analyzed by the above algorithm, the area of each plane in each frame is counted. When the total area of a plane over all frames exceeds a threshold, the plane can be used as an advertisement area for placing advertisements. Finally, the number of advertisement areas in the video that are available for advertising, and the length of time each is available, may be output.
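A minimal sketch of these area statistics, assuming OpenCV; the convex hull of a plane's keypoints stands in for the plane's region in a frame, and AREA_THRESHOLD is an assumed value.

```python
import cv2
import numpy as np

AREA_THRESHOLD = 1e6  # assumed total-area threshold, in squared pixels

def plane_total_area(per_frame_keypoints):
    """Sum a plane's convex-hull area over every frame it appears in."""
    total = 0.0
    for pts in per_frame_keypoints:      # one keypoint array per frame
        if len(pts) >= 3:
            hull = cv2.convexHull(np.float32(pts).reshape(-1, 1, 2))
            total += cv2.contourArea(hull)
    return total

# the plane can serve as an advertisement area when:
# plane_total_area(frames_of_plane) > AREA_THRESHOLD
```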
In addition, the advertisement areas in the video may be labeled, for example with a dashed box, such as the dashed box shown in fig. 6(a), or with a box of a corresponding color. The labeled video is then stored or output to a front end. The advertisement area labeled in fig. 6(a) is a computer display screen.
(4) Tracking posted advertisements
After the trackable advertisement areas have been analyzed, one of the labeled advertisement areas can be selected for placing an advertisement. In the first frame where the advertisement is to be pasted, the four corner points of the advertisement are designated; the four vertices of the advertisement are then tracked using the previously analyzed homography of the plane, so that the advertisement is stably pasted into the video background. As shown in fig. 6(b), the advertisement is pasted onto the computer display screen. Because this avoids displaying the advertisement over the video picture as a popup window or in full screen, users accept this form of advertisement more readily, and the user experience is improved.
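A minimal sketch of pasting an advertisement image into one frame, assuming OpenCV; corners4 holds the four tracked advertisement vertices for that frame, ordered to match the corners of the advertisement image. In subsequent frames, corners4 can be updated by applying the plane's per-frame homography to the designated corner points with cv2.perspectiveTransform.

```python
import cv2
import numpy as np

def paste_ad(frame, ad, corners4):
    """Warp the ad into the quadrilateral spanned by corners4 and blend."""
    h, w = ad.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(corners4)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(ad, H, (frame.shape[1], frame.shape[0]))
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    out = frame.copy()
    out[mask == 255] = warped[mask == 255]
    return out
```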
(5) Implementing logic and processing flows
As shown in fig. 7, the user inputs a video into which advertisements are to be inserted. After the algorithm analyzes the video, the frames with pasteable advertisement areas are played back to the user frame by frame. When a frame contains an advertisement area that can carry an advertisement, playback is paused, the advertisement area to be tracked is selected in that frame, and four points are clicked to designate the position of the advertisement. The algorithm then automatically pastes the advertisement into the remaining frames, and finally the video with the advertisement pasted in is output.
Through this embodiment, the following beneficial effects can be achieved:
(1) the user can be guided to select a reliable tracking plane (namely an advertisement area), and the effect of attaching advertisements to the video is improved;
(2) a plane with little texture but slow movement can be automatically analyzed, so that advertisements can be pasted on the plane;
(3) the advertising value of a video can be automatically analyzed, and videos of high value can be automatically selected for advertisement insertion.
It should be understood that although the steps in the flowcharts of figs. 2, 3a and 3b are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in figs. 2, 3a and 3b may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided an apparatus for detecting a region in a video, where the apparatus may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and the apparatus specifically includes: an obtaining module 802, a selecting module 804, a first determining module 806, and a second determining module 808, wherein:
an obtaining module 802, configured to obtain a video to be detected;
a selecting module 804, configured to select a target video frame in a video to be detected;
a first determining module 806, configured to determine a geometric area of a target object in a target video frame based on a key point of the target video frame;
and a second determining module 808, configured to determine, when it is determined that the key points of other video frames in the video to be detected respectively fall into the geometric plane where the geometric area is located, the geometric area meeting the preset size condition as the information popularization area.
In an embodiment, the selecting module 804 is further configured to calculate a first optical flow value between video frames in the video to be detected, and, when the first optical flow value of a video frame reaches the optical flow threshold, to take that video frame as the target video frame.
In one embodiment, the selecting module 804 is further configured to detect a key point of a specified video frame in the video to be detected; and carrying out optical flow tracking on the detected key points in the video to be detected to obtain a first optical flow value between video frames in the video to be detected.
In one embodiment, the first determining module 806 is further configured to perform homography detection on the key points of the target video frame; and when target key points which meet the homography condition and the number of the key points which meet the homography condition reaches a number threshold exist in the key points of the target video frame, extracting the geometric area where the target key points are located.
In one embodiment, the first determining module 806 is further configured to determine a homography matrix; constructing a homography judgment model according to the homography matrix; and judging whether the key points of the target video frame are on the same plane or not through the homography judgment model.
In one embodiment, the non-key frames before the target video frame in the video to be detected are static frames; the first optical flow value between static frames is less than the optical flow threshold.
In one embodiment, as shown in fig. 9, the apparatus further comprises a judging module 810, wherein:
the judging module 810 is configured to judge whether the key points of the static frame are on the same plane through the homography judging model; when the key points of the static frame are in the same plane, the key points of the static frame are judged to fall into the geometric plane where the geometric area is located.
In one embodiment, as shown in fig. 9, the apparatus further comprises a tracking module 812 and a detection module 814, wherein:
a tracking module 812, configured to perform optical flow tracking on a key point falling into a geometric area in a target video frame, to obtain a second optical flow value of the geometric area;
the first determining module 806 is further configured to, when the second optical flow value reaches the optical flow threshold, take the video frame at which the threshold is reached as a key frame of the geometric area;
a detecting module 814, configured to detect a new keypoint in a previous frame of the keyframes; carrying out homography detection on the new key points in the previous frame;
the first determining module 806 is further configured to determine that the new keypoint falls into the geometric plane in which the geometric region is located when the new keypoint in the previous frame satisfies the homography condition.
In one embodiment, the tracking module 812 is further configured to perform optical flow tracking on the new keypoint of the previous frame to obtain a third optical flow value of the new keypoint; when the third optical flow value of the new keypoint reaches the optical flow threshold, a step of homography detection of the new keypoint in the previous frame is performed by the detection module 814.
In an embodiment, the detecting module 814 is further configured to determine whether the new keypoint falls into a geometric plane in which the geometric region is located, by using a homography determination model constructed by the homography matrix of the geometric region.
In an embodiment, the second determining module 808 is further configured to, when it is determined that key points of other video frames in the video to be detected respectively fall into the geometric plane where the geometric area is located, calculate the size of the geometric area in each corresponding video frame, or calculate the total size of the geometric area across the corresponding video frames, and, when the size or the total size meets the preset size condition, determine the geometric area meeting the preset size condition as the information promotion area.
In one embodiment, as shown in fig. 9, the apparatus further comprises: an output module 816; wherein:
the first determining module 806 is further configured to determine the number of distinct information promotion areas in the video to be detected, and to determine the duration for which each information promotion area appears in the video to be detected;
and an output module 816, configured to output the number of the regions and the corresponding duration.
In one embodiment, the first determining module 806 is further configured to determine a value of the information popularization area according to a size and a duration of the information popularization area in the corresponding video frame.
In the above embodiment, a target video frame in the video to be detected is selected, and the geometric area of the target object in the target video frame can be determined from the key points of the target video frame. It is then judged whether the key points of other video frames fall into the geometric plane where the geometric area is located, and when they do, the geometric area meeting the preset size condition is determined as the information promotion area. In this way, video frames containing an information promotion area of sufficient duration can be obtained without manual operation, avoiding the need to watch the whole video in order to pick out geometric areas usable for promoting information; this reduces the selection time and improves the selection efficiency of the information promotion area. In addition, determining only geometric areas meeting the preset size condition as information promotion areas effectively ensures that the obtained information promotion areas have sufficient application value.
For specific definition of the detection device for the region in the video, reference may be made to the above definition of the detection method for the region in the video, and details are not described here. The modules in the device for detecting the area in the video can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, as shown in fig. 10, there is provided an information fusion apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes: a first obtaining module 1002, a selecting module 1004, a second obtaining module 1006, an inserting module 1008, and an outputting module 1010, wherein:
a first obtaining module 1002, configured to obtain a to-be-detected video including an information popularization area;
a selecting module 1004, configured to select, from the information promotion areas, a corresponding target information promotion area when the value satisfies a preset value condition;
a second obtaining module 1006, configured to obtain popularization information for augmented reality;
an inserting module 1008, configured to insert the popularization information into each video frame including the target information popularization area;
and the output module 1010 is used for outputting the video to be detected, into which the promotion information is inserted.
In one embodiment, the value is a value of the information promotion area, and is determined according to a size of the information promotion area in a corresponding video frame and a time length of the information promotion area appearing in the video to be detected;
the selecting module 1004 is further configured to play the video to be detected, to pause playback when a video frame containing the information promotion area is played, and to select, from the information promotion areas according to an input selection instruction, the target information promotion area whose value meets the preset value condition.
In the above embodiment, since the acquired video to be detected already contains the information promotion areas, the target information promotion area whose value satisfies the preset value condition can be selected directly, and the promotion information for augmented reality is then inserted into each video frame containing the target information promotion area. When fusing the promotion information, there is therefore no need to watch the complete video in order to select a geometric area for promoting information; this saves the time of watching the complete video and of manually selecting geometric areas, and effectively improves both the selection efficiency of the information promotion area and the efficiency of fusing the promotion information.
For specific limitations of the information fusion device, reference may be made to the above limitations of the information fusion method, which are not described herein again. The various modules in the information fusion device can be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing video data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of detecting regions in a video.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 12. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an information fusion method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of part of the structure related to the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is further provided a computer device including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the above embodiment of the method for detecting a region in a video when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps in the above-described method embodiment of detecting a region in a video.
In an embodiment, there is further provided a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the above-mentioned information fusion method embodiment when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps in the above-described information fusion method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for detecting regions in a video, the method comprising:
acquiring a video to be detected;
selecting a target video frame in the video to be detected;
determining a geometric region of a target object in the target video frame based on the key points of the target video frame;
and when determining that the key points of other video frames in the video to be detected respectively fall into the geometric surface where the geometric area is located, determining the geometric area meeting the preset size condition as an information promotion area.
2. The method according to claim 1, wherein the selecting the target video frame in the video to be detected comprises:
calculating a first optical flow value between video frames in the video to be detected;
and when the first optical flow value in each video frame reaches the optical flow threshold value, taking the video frame corresponding to the optical flow threshold value as a target video frame.
3. The method according to claim 2, wherein said calculating a first optical flow value between video frames in said video to be detected comprises:
detecting key points of a designated video frame in the video to be detected;
and carrying out optical flow tracking on the detected key points in the video to be detected to obtain a first optical flow value between video frames in the video to be detected.
4. The method of claim 1, wherein the determining a geometric region of a target object in the target video frame based on the keypoints of the target video frame comprises:
performing homography detection on key points of the target video frame;
and when target key points which meet the homography condition and the number of the key points which meet the homography condition reaches a number threshold exist in the key points of the target video frame, extracting the geometric area where the target key points are located.
5. The method of claim 4, wherein the homography detection of the key points of the target video frame comprises:
determining a homography matrix;
constructing a homography judgment model according to the homography matrix;
and judging whether the key points of the target video frame are on the same plane or not through the homography judgment model.
6. The method according to claim 5, wherein the non-key frames in the video to be detected before the target video frame are still frames; a first optical flow value between the stationary frames is less than an optical flow threshold; the method further comprises the following steps:
judging whether the key points of the static frame are on the same plane or not through the homography judgment model;
and when the key points of the static frame are in the same plane, judging that the key points of the static frame fall into the geometric plane where the geometric area is located.
7. The method of any of claims 1 to 6, further comprising:
performing optical flow tracking on key points falling into the geometric area in the target video frame to obtain a second optical flow value of the geometric area;
when the second optical flow value reaches an optical flow threshold value, taking a video frame corresponding to the optical flow threshold value as a key frame of the geometric area;
detecting a new keypoint in a frame previous to the keyframe;
carrying out homography detection on the new key points in the previous frame;
and when the new key point in the previous frame meets the homography condition, determining the geometric surface where the new key point falls into the geometric area.
8. The method according to any one of claims 1 to 6, wherein when it is determined that the key points of other video frames in the video to be detected respectively fall into the geometric surface where the geometric area is located, determining the geometric area satisfying a preset size condition as an information promotion area comprises:
when the key points of other video frames in the video to be detected are determined to fall into the geometric surfaces where the geometric areas are located respectively, calculating the sizes of the geometric areas in the corresponding video frames respectively; or,
calculating the total size of the geometric area in the corresponding video frame;
and when the size or the total size meets a preset size condition, determining the geometric area meeting the preset size condition as an information promotion area.
9. The method according to any one of claims 1 to 6, wherein after determining the geometric area satisfying the preset size condition as the information promotion area, the method further comprises:
determining the number of areas of different information promotion areas in the video to be detected;
determining the time length of each information popularization area appearing in the video to be detected;
and outputting the number of the areas and the corresponding duration.
10. The method of claim 9, further comprising:
and determining the value of the information popularization area according to the size of the information popularization area in the corresponding video frame and the duration.
11. An information fusion method, characterized in that the method comprises:
acquiring a video to be detected containing an information popularization area;
selecting a corresponding target information promotion area when the value meets a preset value condition from the information promotion area;
acquiring popularization information for augmented reality;
inserting the popularization information into each video frame containing the target information popularization area;
and outputting the video to be detected with the inserted promotion information.
12. An apparatus for detecting regions in a video, the apparatus comprising:
the acquisition module is used for acquiring a video to be detected;
the selection module is used for selecting a target video frame in the video to be detected;
the first determination module is used for determining a geometric area of a target object in the target video frame based on key points of the target video frame;
and the second determining module is used for determining the geometric area meeting the preset size condition as the information promotion area when determining that the key points of other video frames in the video to be detected respectively fall into the geometric surface where the geometric area is located.
13. An information fusion apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a to-be-detected video containing an information popularization area;
the selection module is used for selecting a corresponding target information promotion area when the value meets a preset value condition from the information promotion area;
the second acquisition module is used for acquiring popularization information for augmented reality;
the inserting module is used for inserting the popularization information into each video frame containing the target information popularization area;
and the output module is used for outputting the video to be detected inserted with the promotion information.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 11 when executing the computer program.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202010447859.2A 2020-05-25 2020-05-25 Method for detecting region in video, method for information fusion, device and storage medium Active CN111556338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010447859.2A CN111556338B (en) 2020-05-25 2020-05-25 Method for detecting region in video, method for information fusion, device and storage medium

Publications (2)

Publication Number Publication Date
CN111556338A true CN111556338A (en) 2020-08-18
CN111556338B CN111556338B (en) 2023-10-31

Family

ID=72002107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010447859.2A Active CN111556338B (en) 2020-05-25 2020-05-25 Method for detecting region in video, method for information fusion, device and storage medium

Country Status (1)

Country Link
CN (1) CN111556338B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113259713A (en) * 2021-04-23 2021-08-13 深圳信息职业技术学院 Video processing method and device, terminal equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010025404A (en) * 2000-12-22 2001-04-06 유명현 System and Method for Virtual Advertisement Insertion Using Camera Motion Analysis
JP2010015292A (en) * 2008-07-02 2010-01-21 Yahoo Japan Corp Emphasis display addition method, display control program and server
CN101641873A (en) * 2007-03-22 2010-02-03 美国索尼电脑娱乐公司 Be used for determining advertisement and the position of other inserts and the scheme of sequential of medium
US20130340000A1 (en) * 2012-06-19 2013-12-19 Wistron Corporation Method, apparatus and system for bitstream editing and storage
CN105472434A (en) * 2014-11-17 2016-04-06 Tcl集团股份有限公司 Method and system for embedding content in video demonstration
CN109461174A (en) * 2018-10-25 2019-03-12 北京陌上花科技有限公司 Video object area tracking method and video plane advertisement method for implantation and system
CN109741245A (en) * 2018-12-28 2019-05-10 杭州睿琪软件有限公司 The insertion method and device of plane information
CN110121034A (en) * 2019-05-09 2019-08-13 腾讯科技(深圳)有限公司 A kind of method, apparatus and storage medium being implanted into information in video
CN110225366A (en) * 2019-06-26 2019-09-10 腾讯科技(深圳)有限公司 Video data processing and advertisement position determine method, apparatus, medium and electronic equipment
CN111314626A (en) * 2020-02-24 2020-06-19 北京字节跳动网络技术有限公司 Method and apparatus for processing video


Also Published As

Publication number Publication date
CN111556338B (en) 2023-10-31


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40027314
Country of ref document: HK
SE01 Entry into force of request for substantive examination
GR01 Patent grant