CN116308530A - Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium


Info

Publication number
CN116308530A
CN116308530A
Authority
CN
China
Prior art keywords
target
advertisement
video
key frame
implanted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310548091.1A
Other languages
Chinese (zh)
Inventor
刘聪
杨松
杨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd filed Critical Feihu Information Technology Tianjin Co Ltd
Priority to CN202310548091.1A priority Critical patent/CN116308530A/en
Publication of CN116308530A publication Critical patent/CN116308530A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0276 Advertisement creation
    • G06Q30/0277 Online advertisement
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/48 Matching video sequences
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses an advertisement implantation method, an advertisement implantation device, advertisement implantation equipment and a readable storage medium. Depth estimation processing is performed on the key frames of an original video to be implanted to obtain a depth map corresponding to each key frame; a target key frame, and a target insertion position corresponding to the target key frame, are determined among the key frames based on the depth maps; scene matching is performed between the target key frame and the advertisements in an advertisement database to obtain the target implanted advertisement corresponding to the original video to be implanted; and the target implanted advertisement is inserted at the target insertion position of the original video to be implanted to obtain a target video. Implanting advertisements into the original video in this way automates advertisement implantation, improves its efficiency, and makes the implantation more natural and accurate.

Description

Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium
Technical Field
The present application relates to the field of video image processing technology, and more particularly, to an advertisement implantation method, apparatus, device, and readable storage medium.
Background
Transmitting information over the internet in the form of video has become commonplace. Advertisements are an important way for users to acquire information and for advertisers to build awareness and promote products, so inserting advertisements into videos has become the norm.
In the prior art, a video editor or behind-the-scenes technician usually has to inspect the original video frame by frame with tools such as video editing software, determine the video points at which advertisements are to be inserted, and implant the advertisements so that they blend naturally into the original video. In short, implanting an advertisement requires manually browsing the video frame by frame, determining the specific implantation position, and performing editing operations, which makes the process slow and error-prone.
Disclosure of Invention
In view of the foregoing, the present application provides an advertisement implantation method, apparatus, device, and readable storage medium to address the low efficiency and accuracy of existing advertisement implantation approaches.
In order to achieve the above object, the following solutions are proposed:
An advertisement implantation method, comprising:
acquiring an original video to be implanted and an advertisement database;
performing depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame, wherein the key frames are video frames meeting preset conditions in the original video to be implanted;
determining a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map;
performing scene matching on the target key frame and the advertisement in the advertisement database to obtain a target implantation advertisement corresponding to the original video to be implanted;
and inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
Optionally, the determining, based on the depth map, a target key frame and a target insertion position corresponding to the target key frame in the key frames may include:
calculating the spatial mapping relation between all video frames between every two adjacent key frames and the first key frame in the two adjacent key frames to obtain a first spatial transformation matrix corresponding to each key frame;
performing object detection processing on each key frame to obtain a detection result corresponding to each key frame, wherein the detection result at least comprises an object and object feature information;
determining the key frames of which the detection results meet object detection conditions as candidate key frames;
carrying out plane area estimation on the candidate key frames based on the object feature information and the depth map corresponding to each candidate key frame to obtain an object plane area ratio corresponding to each candidate key frame;
determining the candidate key frames corresponding to the object plane area ratio reaching a preset threshold as target key frames;
determining the object position in the object feature information corresponding to each target key frame as a target detection point of each target key frame;
determining a target detection point of each video frame between each two adjacent target key frames based on the target detection point of a first one of the two adjacent target key frames and the first spatial transformation matrix;
and combining the target detection points of all the target key frames with the target detection points of the video frames to obtain target insertion positions corresponding to the original video to be implanted.
Optionally, the scene matching the target key frame with the advertisement in the advertisement database to obtain the target implant advertisement corresponding to the original video to be implanted may include:
performing scene recognition on each target key frame to obtain a scene corresponding to each target key frame and a scene prediction probability corresponding to each scene;
determining the scene corresponding to the scene prediction probability reaching a preset scene prediction threshold as an adaptive scene of the target key frame;
and matching the adaptation scene with the application scene of each advertisement in the advertisement database to obtain the target implanted advertisement corresponding to the original video to be implanted.
Optionally, the target implantation advertisement includes a single frame image, and the inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video may include:
performing perspective transformation on the single-frame image and the target insertion position to obtain a second space transformation matrix;
covering the single-frame image to each video frame in the target insertion position through the second space transformation matrix to obtain a fusion image corresponding to each video frame;
carrying out gradient calculation on the fusion images to obtain gradient fields corresponding to each fusion image;
determining the divergence of each fusion image based on the gradient field corresponding to each fusion image;
generating a new pixel value corresponding to each of the fused images based on the divergence of each of the fused images;
and adjusting the pixel value of each fusion image to the new pixel value to obtain a target video.
Optionally, the target implantation advertisement includes a continuous multi-frame image, and the inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video may include:
extracting an image key frame from the continuous multi-frame image;
performing perspective transformation on the image key frame and the target insertion position to obtain a third space transformation matrix;
based on the third space transformation matrix, covering the continuous multi-frame images to the target insertion position to obtain a fusion image corresponding to each video frame in the target insertion position;
carrying out gradient calculation on the fusion images to obtain gradient fields corresponding to each fusion image;
determining the divergence of each fusion image based on the gradient field corresponding to each fusion image;
generating a new pixel value for each of the fused images based on the divergence of each of the fused images;
and adjusting the pixel value of each fusion image to the new pixel value to obtain a target video.
Optionally, after the scene matching of the target key frame with the advertisements in the advertisement database to obtain the target implanted advertisement corresponding to the original video to be implanted, the method further includes:
carrying out image matting processing on the video clip where the target insertion position is located to obtain background matting and image matting;
determining whether the background matting is behind the portrait matting based on the background matting and the portrait matting;
if not, placing the image layer where the portrait matting is located in front of the image layer where the background matting is located, and inserting a target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video;
if yes, inserting a target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
Optionally, the performing depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame may include:
calculating an interframe difference value between every two video frames in the original video to be implanted;
carrying out average calculation on all the interframe difference values to obtain average interframe difference intensity;
determining a target threshold based on the average inter-frame differential intensity;
determining the video frame corresponding to the inter-frame difference value equal to the target threshold as a key frame;
and carrying out depth estimation processing on each key frame to obtain a depth map corresponding to each key frame.
An advertisement implantation device, comprising:
the data acquisition unit is used for acquiring the original video to be implanted and the advertisement database;
the key frame depth processing unit is used for carrying out depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame, wherein the key frames are video frames meeting preset conditions in the original video to be implanted;
a target insertion position determining unit, configured to determine a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map;
the scene matching unit is used for performing scene matching on the target key frame and the advertisement in the advertisement database to obtain a target implanted advertisement corresponding to the original video to be implanted;
and the advertisement implantation unit is used for inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
An advertisement implantation apparatus, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of any one of the advertisement implantation methods described above.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the advertisement insertion methods described above.
As can be seen from the above technical solution, the advertisement implantation method, apparatus, device, and readable storage medium provided in the embodiments of the present application perform target detection according to the depth maps of the key frames in the original video to be implanted. During this target detection, the key frames are screened to obtain the target key frames best suited to carrying an implanted advertisement, and the target insertion positions at which an advertisement can be implanted in the original video are determined based on those target key frames. Meanwhile, scene recognition on the target key frames makes it possible to find, in the advertisement database, advertisements more relevant to the content of the original video, improving the fit between the advertisement and the video. Implanting advertisements into the original video on this basis automates advertisement implantation, improves its efficiency, and makes the implantation more natural and accurate.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present application, and that a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of an alternative implementation of the advertisement implantation method according to the embodiment of the present application;
FIG. 2 is an exemplary diagram of a keyframe provided in an embodiment of the present application;
fig. 3 is an exemplary diagram of an original video to be implanted according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an alternative embodiment of an advertisement insertion apparatus according to the present application;
fig. 5 is a schematic structural diagram of an alternative advertisement implanting apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of the present disclosure.
With this scheme, when a user needs to insert advertisements into any original video, scene-adapted advertisements can be conveniently and quickly inserted at suitable insertion positions in that video, which both guarantees the fit of the implanted advertisements and improves the efficiency of advertisement implantation.
In the embodiment of the present application, the above advertisement implantation method may run on a server, a terminal, a mobile terminal, or the like, to implant advertisements into an original video supplied by a user. Referring to fig. 1, which shows an optional flowchart of the advertisement implantation method according to an embodiment of the present application, the flow may include:
Step S110, acquiring an original video to be implanted and an advertisement database.
The original video to be implanted can be obtained from a user terminal, such as a mobile phone or the data interaction port of a computer client. The user uploads the original video into which advertisements are to be implanted from the user terminal, and the port of the service device running this advertisement implantation method receives the video sent from the user terminal.
The advertisement database can store a large number of advertisement materials, which may be advertisement pictures presented as single-frame images or advertisement videos consisting of multi-frame images. Advertisements are selected from the advertisement database and implanted into the original video to be implanted. For some marketing reasons, however, an original video may only permit certain specific advertisements to be implanted into it; in that case a dedicated advertisement database can be created separately from those specific advertisements, and the advertisement to implant is selected from that database.
Step S120, performing depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame.
The original video to be implanted is essentially a sequence of successive images; each image is a video frame, and the frames are separated by very short intervals. To reduce information redundancy, most video frames in the sequence only reflect local changes of the picture, while complete images are placed at key positions in the sequence; the video frames holding these complete images are the key frames. In the embodiment of the application, the key frames reflecting the complete picture can be extracted first, and the video frames between every two adjacent key frames, which reflect local changes, can then be processed according to the subsequent processing of their key frames. In addition, since the key frames are extracted from the video, all extracted key frames remain arranged in playback order.
To extract the key frames of the original video to be implanted, the judgment can be made from the amount of information carried by each video frame: when the information content of a video frame is the maximum within some region of the video, that frame can be determined to be the key frame of that region. Alternatively, the difference value between every two adjacent video frames can be calculated, and the video frames whose difference values satisfy a set threshold condition are determined to be key frames. The method for determining the key frames of the original video to be implanted is thus not unique.
Further, the depth estimation processing performed on a key frame can estimate depth from the key frame's RGB image (an image displayed in the RGB color mode) and output the corresponding depth map, in which each pixel value represents the distance from an object in the key frame to the camera. When the advertisement is inserted and fused, its pixel values can then be adjusted according to the depth map so that, visually, the advertisement appears at the same distance from the camera as the surrounding objects, making the fusion more natural; the depth map also makes it possible to determine positions in the key frame suitable for embedding an advertisement.
In the embodiment of the application, the depth maps of the key frames may be predicted sequentially using the NeW CRFs monocular depth estimation algorithm. Alternatively, following a traditional machine-learning approach, the key frames can be fed into a depth estimation model trained on a large set of training images and their corresponding depth maps, yielding a depth map for each key frame. The depth of the key frames may also be solved from monocular depth cues to obtain each key frame's depth map.
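For illustration only, the following Python sketch shows how a per-key-frame depth map could be produced with an off-the-shelf monocular depth model. The embodiment names the NeW CRFs algorithm, which is not packaged as a library call; the publicly available MiDaS model, loaded through torch.hub, is used here purely as an assumed stand-in, not as the embodiment's own estimator.

```python
import cv2
import torch

# Assumption: MiDaS via torch.hub as a stand-in for the NeW CRFs
# monocular depth estimator described in the embodiment.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
midas.eval()

def estimate_depth(key_frame_bgr):
    """Return a relative depth map for one RGB key frame."""
    rgb = cv2.cvtColor(key_frame_bgr, cv2.COLOR_BGR2RGB)
    batch = transform(rgb)
    with torch.no_grad():
        depth = midas(batch)
        # Resize the prediction back to the key frame's resolution.
        depth = torch.nn.functional.interpolate(
            depth.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False).squeeze()
    return depth.numpy()
```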
Step S130, determining a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map.
Specifically, the video frames between every two adjacent key frames are approximately the same in content as the adjacent key frames and only show local content changes, so the key frames of the original video to be implanted can be detected first, and the detection results of the intermediate video frames can then be derived from the key frames' detection results.
In this embodiment of the present application, after the key frames are obtained, it is first necessary to detect which key frames can carry advertisement implantation at all, and to detect the advertisement implantation positions within those suitable key frames. Optionally, objects in the image can be detected from the key frame image, and object feature information of each detected object, such as object type, object position, and object size, can be acquired; part of the key frames, or the delivery positions within the key frames, are screened out by object type. The depth map of each key frame is then used to calculate whether an object offers enough planar space in the key frame for advertisement delivery, and the target key frames are screened from all the key frames.
Further, the position of the target object is detected in each target key frame to obtain its target detection point; the target detection point of each corresponding video frame is determined through the mapping relation between the target key frame and its video frames; and the target detection points of all target key frames and video frames are collected to obtain the advertisement target insertion positions of the original video to be implanted.
For example, fig. 2 provides an exemplary diagram of a key frame in which two people stand in front of a wall bearing a large screen, with three trees on the flat ground nearby. The objects in the image, namely person 1, person 2, the screen, the trees, and the billboard reading "furniture wholesale", can be detected, together with each object's category, positional relation, size, and so on. The objects usable for advertisement delivery can be determined from the object categories; in fig. 2, the usable object is evidently the "screen". Further, the depth map of the key frame is used to calculate whether the "screen" offers enough planar space in the key frame for advertisement delivery; for instance, the condition may be that the object area occupies at least one third of the total image area. If the "screen" satisfies the condition, its specific position in the key frame is further calculated and expressed in coordinates, thereby yielding the target insertion position.
Step S140, performing scene matching between the target key frame and the advertisements in the advertisement database to obtain the target implanted advertisement corresponding to the original video to be implanted.
In order to ensure that the implanted advertisement accords with the scene of the original video to be implanted, the scene of the target key frame is required to be matched with the scene of each advertisement in the advertisement database, and the target implanted advertisement with the matched scene is obtained.
For example, since the key frame shown in fig. 2 contains the words "furniture wholesale", the scene may be the entrance of a furniture wholesale market, so the advertisement database can be searched for advertisements related to "furniture".
Step S150, inserting the target implanted advertisement at the target insertion position of the original video to be implanted to obtain a target video.
In this embodiment of the present application, the target insertion position covers at least a target key frame and the video frames corresponding to that target key frame. The approximate position within the original video is first determined from the target key frame; the target detection points of all target key frames and video frames then give the insertion coordinates of the advertisement in each frame, and aligning the advertisement with those coordinates implements the advertisement implantation.
Evidently, the advertisement implantation method provided by the embodiment of the application performs target detection according to the depth maps of the key frames in the original video to be implanted; during this detection the key frames are screened to obtain the target key frames best suited to carrying an implanted advertisement, and the target insertion positions at which an advertisement can be implanted are determined based on those target key frames. Meanwhile, scene recognition on the target key frames makes it possible to find, in the advertisement database, advertisements more relevant to the content of the original video, improving the fit between the advertisement and the video. Implanting advertisements on this basis automates advertisement implantation, improves its efficiency, and makes the implantation more natural and accurate.
Next, the embodiment of the present application will further describe the above advertisement implantation method.
The method is implemented based on key frames extracted from the original video to be implanted, wherein in the embodiment of the application, the optional step of extracting the key frames may include: calculating an interframe difference value between every two video frames in the original video to be implanted; carrying out average calculation on all the interframe difference values to obtain average interframe difference intensity; determining a target threshold based on the average inter-frame differential intensity; and determining the video frame corresponding to the inter-frame difference value equal to the target threshold as a key frame.
That is, the inter-frame difference between every two adjacent video frames of the original video to be implanted is calculated, and the average of all these inter-frame differences gives the average inter-frame difference intensity. The video frames at local maxima of the inter-frame difference that reach the average inter-frame difference intensity are then selected as the key frames of the original video to be implanted. Further, depth estimation processing is performed on each key frame to obtain the depth map corresponding to each key frame.
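As a hedged illustration of this selection rule, the sketch below computes grayscale inter-frame differences with OpenCV and keeps the frames at local maxima that reach the average inter-frame difference intensity; the local-maximum window size is an assumption, since the embodiment does not fix one.

```python
import cv2
import numpy as np

def extract_key_frames(video_path, window=5):
    """Select key frames where the inter-frame difference is a local
    maximum that reaches the average difference intensity
    (the window size is an illustrative assumption)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    diffs, idx = [], []
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        i += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diffs.append(float(np.mean(cv2.absdiff(gray, prev_gray))))
        idx.append(i)
        prev_gray = gray
    cap.release()
    avg = np.mean(diffs)  # average inter-frame difference intensity
    keys = []
    for j, d in enumerate(diffs):
        lo, hi = max(0, j - window), min(len(diffs), j + window + 1)
        if d >= avg and d == max(diffs[lo:hi]):  # local maximum above average
            keys.append(idx[j])
    return keys
```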
The process of determining, based on the depth maps of the key frames, a target key frame and the target insertion position corresponding to it among the key frames has been described above in general terms; in detail, the process may specifically include:
calculating the spatial mapping relation between all video frames between every two adjacent key frames and the first key frame in the two adjacent key frames to obtain a first spatial transformation matrix corresponding to each key frame;
performing target detection on each key frame to obtain a detection result corresponding to each key frame, wherein the detection result at least comprises an object and object feature information;
determining the key frames of which the detection results meet object detection conditions as candidate key frames;
carrying out plane area estimation on the candidate key frames based on the object feature information and the depth map corresponding to each candidate key frame to obtain an object plane area ratio corresponding to each candidate key frame;
determining the candidate key frames corresponding to the object plane area ratio reaching a preset threshold as target key frames;
determining the object position in the object feature information corresponding to each target key frame as a target detection point of each target key frame;
determining a target detection point of each video frame between each two adjacent target key frames based on the target detection point of a first one of the two adjacent target key frames and the first spatial transformation matrix;
and combining the target detection points of all the target key frames with the target detection points of the video frames to obtain target insertion positions corresponding to the original video to be implanted.
The spatial transformation matrix reflects how a key frame maps into the video frames corresponding to it. Take the key frame shown in fig. 2: the video frames that follow it may carry only partial image content. For example, if the surrounding video shows person 1 and person 2 in conversation, the following video frames are likely to show only changes in the two persons' positions, without explicitly showing the target insertion position "screen". To insert an advertisement onto the "screen", the "screen" in the key frame must be mapped into all those video frames, and the expression representing this mapping relation is the first spatial transformation matrix.
Typically, the shots and scenes recorded in a video change, so several key frames are extracted, each with a corresponding video segment containing several video frames. In this embodiment, when calculating the first spatial transformation matrix, the video segment between two adjacent key frames can be taken as the segment corresponding to the first of those two key frames, and all video frames contained in that segment are matched against the first key frame with an image matching algorithm, yielding the first spatial transformation matrix that represents the mapping relation between each video frame and the first key frame.
Assume the original video to be implanted is V, with video frames v1, v2, v3, v4, v5, v6, v7, v8, v9, and that the key frames extracted from them are v1, v4, and v8, renamed A, B, and C in sequence for clarity, so the frames of V are A, v2, v3, B, v5, v6, v7, C, v9. A first spatial matrix must be computed between every two adjacent key frames with the image matching algorithm; this is done for the adjacent pairs AB and BC and the video frames contained in the segments between them. This gives the spatial mapping relation between key frame A and v2, v3, which form the video segment corresponding to A. The first spatial transformation matrix reflects the mapping between frame images of essentially the same picture: if the key frame shown in fig. 2 is key frame A in this example, and the center coordinate point of person 1 is (1, 1) in key frame A but (3, 1) in video frame v2, then the first spatial transformation matrix of key frame A reflects this spatial mapping. On the same basis, the first spatial transformation matrix corresponding to key frame B and video frames v5, v6, v7 can be determined, and the first spatial matrix of key frame C can be calculated from video frame v9.
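A minimal sketch of how such a first spatial transformation matrix could be estimated between a key frame and one of its following video frames. The embodiment only says "an image matching algorithm"; ORB features with RANSAC homography fitting are assumed here for illustration.

```python
import cv2
import numpy as np

def first_spatial_transform(key_frame, video_frame):
    """Estimate the homography mapping key-frame coordinates into a
    following video frame (ORB + RANSAC are illustrative choices)."""
    g1 = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(video_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # e.g. maps person 1's centre (1, 1) in A to (3, 1) in v2
```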
On this basis, in order to improve the efficiency of the image matching algorithm, the embodiment of the application compiles OpenCV with GPU support and uses the GPU for acceleration, which can increase the computation speed by a factor of 5-10 and makes the advertisement implantation process effectively real-time.
In addition, not every key frame has a place where an advertisement can be inserted, so all key frames need to be filtered. First, object detection is performed on each key frame to determine whether it contains an object suitable for advertisement insertion.
In this embodiment of the present application, a YOLO-series method performs target detection on each key frame in turn, with the detection threshold of the YOLO target detection model set to 0.5. Each key frame's detection result may include a prediction score (or confidence score) as well as the object and object feature information. The prediction score reflects whether an object target exists in the current image: the closer the score is to 1, the more likely an object target exists, and the object target can then be extracted to obtain the object and its feature information; conversely, an object target whose prediction score is below 0.5 is automatically discarded, as is a key frame containing no qualifying target.
If no object is detected in a key frame, the returned detection result is None; if an object is detected, the returned detection result is the detected object and its object feature information. This completes the first screening of all key frames, and the key frames in which objects can be detected are determined to be candidate key frames.
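For illustration, the sketch below performs this screening with the open-source ultralytics YOLO package; the specific package and weights file are assumptions, as the embodiment only specifies a "YOLO series" detector with a 0.5 threshold.

```python
from ultralytics import YOLO  # assumption: ultralytics implementation

model = YOLO("yolov8n.pt")  # assumed weights file

def detect_objects(key_frame, conf_threshold=0.5):
    """Return (class names, boxes) for detections clearing the 0.5
    threshold, or None, mirroring the embodiment's None result."""
    result = model(key_frame, conf=conf_threshold, verbose=False)[0]
    if len(result.boxes) == 0:
        return None
    names = [result.names[int(c)] for c in result.boxes.cls]
    boxes = result.boxes.xyxy.tolist()  # object positions / sizes
    return names, boxes
```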
Because not all detected objects can be used to place advertisements, the objects detected in the candidate key frames must also be screened on the basis of the candidate key frames. Still taking the key frame in fig. 2 as an example: the objects detectable in it include the "billboard" and the "screen", so it can be determined to be a candidate key frame. Further, based on the candidate key frame's depth map and the object feature information of the "billboard" and the "screen", the planar area of each of the two objects in the candidate key frame, or the ratio of that planar area to the whole picture, is calculated. In the embodiment of the application it can be stipulated that an object may be used for advertisement display only when its planar area exceeds 80% of the plan-view area. On this basis, the areas of the "screen" and "billboard" are compared with the plan-view area, the planar area ratio of each object is determined, and it is judged whether that ratio meets the stipulated condition. If only the planar area ratio of the "screen" meets the condition, the four vertex coordinates of the "screen" on the plan view can be determined directly as the target detection points. The target detection points of the series of video frames corresponding to the candidate key frame must then also be determined.
To compute an object's planar area, the depth map of the candidate key frame can be filtered with a Gaussian descriptor, a sliding window swept over the depth map, and the adjacency matrix traversed with a BFS algorithm, yielding the candidate key frame's plane map. The planar area in the plane map is then calculated from the object feature information of the "screen" and the "billboard", which may be their coordinate positions in the current key frame.
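A simplified sketch of this plane-area estimation, under the assumption that a plane shows up as a connected region of near-constant depth; the Gaussian smoothing kernel and the gradient tolerance are illustrative values, while the BFS traversal follows the adjacency walk described above.

```python
from collections import deque
import cv2
import numpy as np

def plane_area_ratio(depth_map, box, grad_tol=0.01):
    """BFS over the smoothed depth map inside an object's bounding box,
    growing the largest region whose depth gradient stays below grad_tol;
    returns that region's fraction of the box (tolerance is an assumption)."""
    x1, y1, x2, y2 = [int(v) for v in box]
    patch = cv2.GaussianBlur(depth_map[y1:y2, x1:x2].astype(np.float32),
                             (5, 5), 0)
    gy, gx = np.gradient(patch)
    flat = (np.abs(gx) < grad_tol) & (np.abs(gy) < grad_tol)
    h, w = flat.shape
    seen = np.zeros_like(flat, dtype=bool)
    best = 0
    for sy in range(h):
        for sx in range(w):
            if not flat[sy, sx] or seen[sy, sx]:
                continue
            size, q = 0, deque([(sy, sx)])
            seen[sy, sx] = True
            while q:  # BFS over 4-connected neighbours
                cy, cx = q.popleft()
                size += 1
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and flat[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        q.append((ny, nx))
            best = max(best, size)
    return best / float(h * w)  # compare against the 80% threshold
```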
Following the earlier assumption that the key frame in fig. 2 is key frame A of the original video V to be implanted: taking the target detection points in key frame A, i.e. the four vertex coordinates of the "screen" on key frame A, as the reference, the specific coordinate points of the "screen" in video frames v2 and v3 can be determined from the first spatial transformation matrix of key frame A, and those coordinate points are determined to be target detection points. Combining the target detection points of key frame A (video frame v1) and of video frames v2 and v3 yields the target insertion position corresponding to the video segment they represent.
After the target insertion position of the advertisement is obtained, determining an advertisement matched with the scene of the target insertion position, specifically performing scene matching on the target key frame and the advertisement in the advertisement database, and obtaining the target implantation advertisement corresponding to the original video to be implanted may include: performing scene recognition on each target key frame to obtain a scene corresponding to each target key frame and a scene prediction probability corresponding to each scene; determining the scene corresponding to the scene prediction probability reaching a preset scene prediction threshold as an adaptive scene of the target key frame; and matching the adaptation scene with the application scene of each advertisement in the advertisement database to obtain the target implanted advertisement corresponding to the original video to be implanted.
In the embodiment of the application, scene recognition on the key frames can adopt the Places-CNNs method to recognize the scenes in each key frame in turn. The Places dataset contains a large number of scene classification pictures, and more than 400 different types of scene environments can be effectively recognized in the key frames. For each target key frame, the prediction result contains at least scene categories and a scene prediction probability for each category. If, for a key frame, the scene prediction probabilities of all predicted scene categories are below 0.5, the probabilities are ranked and only the scene categories with the top three scene prediction probabilities are kept and determined to be the adaptive scenes of the key frame; when the scene prediction probabilities are not all below 0.5, the scene categories whose probability exceeds 0.5 are determined to be the adaptive scenes of the key frame. On this basis, scene prediction can be performed on each target key frame to obtain its corresponding adaptive scenes.
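Independently of the Places-CNN backbone, this threshold-or-top-3 selection rule can be sketched as a small helper; the probabilities passed in are assumed to come from whatever scene classifier is used.

```python
def adaptive_scenes(scene_probs, threshold=0.5, fallback_k=3):
    """scene_probs: dict mapping scene category -> predicted probability.
    Returns the scenes above the threshold, or, when none reaches it,
    the top-k scenes as the fallback described above."""
    ranked = sorted(scene_probs.items(), key=lambda kv: kv[1], reverse=True)
    above = [name for name, p in ranked if p > threshold]
    return above if above else [name for name, _ in ranked[:fallback_k]]

# Hypothetical example: no probability clears 0.5, so the top 3 are kept.
# adaptive_scenes({"furniture_store": 0.35, "market": 0.30,
#                  "street": 0.20, "office": 0.10})
# -> ["furniture_store", "market", "street"]
```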
In addition, each advertisement in the advertisement database has one or more corresponding advertisement scenes. The adaptive scenes of the target key frame are matched against the advertisement scenes in the advertisement database to obtain the adaptive advertisements corresponding to the target key frame, which are determined to be the target implanted advertisement. If several adaptive advertisements are obtained in this process, the adaptation probability of each adaptive advertisement with respect to the video segment of the target key frame can be further calculated, and the adaptive advertisement with the highest adaptation probability is selected as the target implanted advertisement.
Further, the target implanted advertisement is inserted at the target insertion position in the original video to be implanted to obtain the target video. An original video may contain several target insertion positions: for example, in the original video V, the target insertion positions may cover video frames v2-v3 and video frames v7-v8. These two ranges are not connected and constitute two mutually independent target insertion positions, so scene recognition and matching can be performed separately for each, giving each position its own target implanted advertisement, which is then inserted at the corresponding target insertion position.
The target implanted advertisement matched from the advertisement database may be a picture advertisement consisting of a single-frame image or a video advertisement consisting of continuous multi-frame images, and a video advertisement must additionally be aligned with the video frames of the original video, so insertion is split into two cases. When the target implanted advertisement is a single-frame picture advertisement, the insertion process may include: performing perspective transformation between the single-frame image and the target insertion position to obtain a second spatial transformation matrix; covering each video frame at the target insertion position with the single-frame image through the second spatial transformation matrix to obtain a fused image corresponding to each video frame; performing gradient calculation on the fused images to obtain the gradient field corresponding to each fused image; determining the divergence of each fused image based on its gradient field; generating new pixel values for each fused image based on its divergence; and adjusting each fused image's pixel values to the new values to obtain the target video.
When the target implanted advertisement is a video advertisement of continuous multi-frame images, the insertion process may include: extracting an image key frame from the continuous multi-frame images; performing perspective transformation between the image key frame and the target insertion position to obtain a third spatial transformation matrix; covering the target insertion position with the continuous multi-frame images based on the third spatial transformation matrix to obtain a fused image corresponding to each video frame at the target insertion position; performing gradient calculation on the fused images to obtain the gradient field corresponding to each fused image; determining the divergence of each fused image based on its gradient field; generating new pixel values for each fused image based on its divergence; and adjusting each fused image's pixel values to the new values to obtain the target video.
The second spatial transformation matrix locks the mapping relation between the current advertisement and the target insertion position. For example, if the single-frame image is a rectangular picture and the target insertion position is the "screen" shown in fig. 2, mapping the single-frame image into the key frame yields the four vertex coordinates of the rectangular image in the key frame, and mapping it into video frames v2 and v3 likewise yields the second spatial transformation matrix between the single-frame image and the target insertion position. Covering the target insertion position with the single-frame image based on the second spatial transformation matrix gives the fused images of key frame A and video frames v2 and v3 after the single-frame image has been inserted into each.
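A minimal OpenCV sketch of this overlay step, assuming the four target detection points are the "screen" vertex coordinates in a given frame; getPerspectiveTransform plays the role of the second spatial transformation matrix here.

```python
import cv2
import numpy as np

def overlay_ad(frame, ad_image, target_quad):
    """Warp a single-frame advertisement onto the quadrilateral given by
    the four target detection points and composite it into the frame."""
    h, w = ad_image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32(target_quad)              # four "screen" vertices
    M = cv2.getPerspectiveTransform(src, dst)  # second spatial matrix
    warped = cv2.warpPerspective(ad_image, M,
                                 (frame.shape[1], frame.shape[0]))
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    out = frame.copy()
    out[mask == 255] = warped[mask == 255]
    return out  # fused image, before gradient-domain harmonisation
```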
When the advertisement is a video advertisement of continuous multi-frame images, the problem of video alignment must be considered. Since an advertisement video is usually presented in a rectangular window, an image key frame representing the advertisement video can be extracted from the continuous multi-frame images. Perspective transformation between this image key frame and the target insertion position yields the third spatial transformation matrix, which reflects the positional correspondence between the rectangular window and each frame at the target insertion position; this correspondence covers not only the coordinates but also, for example, the distance between the same object and the lens across frames.
Based on the third spatial transformation matrix, the frames of the continuous multi-frame image are aligned in sequence, taking the target detection points of the first video frame (the key frame) at the target insertion position as the reference for the first frame of the advertisement, so as to obtain the fused image corresponding to each video frame at the target insertion position.
After a picture advertisement or video advertisement is inserted at the target insertion position, its brightness, shadows, tone, and so on differ from the original video to be implanted, because their shooting conditions differ. Consequently, after insertion the advertisement image and the original video image may not blend well. In the embodiment of the application, new pixel values can be determined for each fused image by calculating the fused image's gradient and divergence; adjusting the fused image's pixel values to these new values then yields the target video and fuses the advertisement more naturally into the original video to be implanted.
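The gradient-field/divergence pipeline described here is the classic Poisson image-editing formulation. As a hedged shortcut, OpenCV's seamlessClone solves the same equation internally, so the sketch below uses it in place of a hand-rolled gradient/divergence solver; this is a stand-in, not the embodiment's exact computation.

```python
import cv2
import numpy as np

def harmonise(fused_frame, original_frame, target_quad):
    """Blend the inserted advertisement region into the original frame by
    solving the Poisson equation (cv2.seamlessClone stands in for the
    explicit gradient-field and divergence computation)."""
    mask = np.zeros(original_frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(target_quad), 255)
    x, y, w, h = cv2.boundingRect(np.int32(target_quad))
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(fused_frame, original_frame, mask,
                             center, cv2.NORMAL_CLONE)
```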
In addition, in the embodiment of the present application, when the target implanted advertisement is a continuous multi-frame video advertisement, the playing time of the video segment corresponding to the target insertion position can be measured and compared with the duration of the video advertisement played at 1x speed. If the video advertisement is longer than the playing time, its playback speed can be increased; conversely, if it is shorter, its playback speed can be reduced. If the video advertisement has more frames than the target insertion position, the excess frames in the video advertisement can be deleted.
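A sketch of this duration-matching rule, under the assumption that uniform frame resampling stands in for speeding up or slowing down playback:

```python
import numpy as np

def fit_ad_to_slot(ad_frames, slot_frame_count):
    """Resample an advertisement's frame list so it spans exactly the
    target insertion segment: indices are dropped (speed-up) or repeated
    (slow-down) uniformly. The resampling scheme is an assumption."""
    idx = np.linspace(0, len(ad_frames) - 1, slot_frame_count)
    return [ad_frames[int(round(i))] for i in idx]
```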
Before the target implanted advertisement is inserted at the target insertion position, it can further be verified that the target insertion position will not occlude important content of the original video to be implanted, so that the inserted advertisement does not cover it; such important content is usually the portraits appearing in the original video. This case can be judged and resolved by the following steps, which may include: performing matting processing on the video segment containing the target insertion position to obtain a background matte and a portrait matte; determining, based on the background matte and the portrait matte, whether the background matte lies behind the portrait matte; if not, placing the layer holding the portrait matte in front of the layer holding the background matte and then inserting the target implanted advertisement at the target insertion position of the original video to obtain the target video; if so, inserting the target implanted advertisement directly at the target insertion position of the original video to obtain the target video.
In this embodiment of the present application, the RVM algorithm (Robust Video Matting) may be used to perform video portrait matting on the video segment corresponding to the target insertion position, obtaining a matting mask image, and a recurrent network (RNN) is used to optimize the mask so that frame-to-frame transitions are relatively stable and edge flicker is reduced. The mask image is the portrait matting result of each frame of the video segment, with pixel values in {0, 1}: a pixel value of 0 represents background and a value of 1 represents portrait. On this basis, the matting result can be divided into the background matte and the portrait matte.
Based on the background matte and the portrait matte, it is judged whether the background matte lies in front of (above) the portrait matte. If so, the background matte would cover the portrait matte; the layer holding the background matte is therefore placed below (behind) the layer holding the portrait matte before the target implanted advertisement is inserted at the target insertion position. Since the target insertion position serves as background and now lies below (behind) the portrait layer, the portrait is guaranteed not to be covered. If the background matte is not in front of (above) the portrait matte, the target implanted advertisement is inserted directly at the target insertion position, without further processing of the video segment.
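A hedged sketch of this re-layering: given the per-frame portrait alpha produced by the matting step, compositing the portrait taken from the original frame over the frame that already contains the advertisement guarantees the portrait layer stays in front.

```python
import numpy as np

def composite_portrait_on_top(ad_frame, original_frame, portrait_alpha):
    """ad_frame: frame with the advertisement already inserted.
    portrait_alpha: matting mask in [0, 1], 1 = portrait, 0 = background.
    Returns the frame with the portrait layer placed in front of the
    advertisement background."""
    alpha = portrait_alpha[..., None].astype(np.float32)
    out = alpha * original_frame.astype(np.float32) \
        + (1.0 - alpha) * ad_frame.astype(np.float32)
    return out.astype(np.uint8)
```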
On this basis, in the embodiment of the present application, TensorRT may also be used to obtain high throughput, low latency, and low device memory occupation, thereby running the RVM algorithm with GPU acceleration and speeding up the matting step.
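The patent does not say how the TensorRT engine is built; a common path, sketched here under that assumption, is to export the model to ONNX and build the engine offline (the RVM repository ships its own export tooling, which also wires up the recurrent states):

```python
import torch

# Load the matting model as in the sketch above.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval()

# Simplified single-input ONNX export; the recurrent states fall back
# to their defaults here, which the official export script handles
# more carefully.
dummy = torch.randn(1, 3, 720, 1280)
torch.onnx.export(model, dummy, "matting.onnx", opset_version=12)

# A TensorRT engine is then typically built offline, e.g. with
# NVIDIA's trtexec CLI (FP16 lowers memory use and raises throughput):
#   trtexec --onnx=matting.onnx --saveEngine=matting.engine --fp16
```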
In summary, with this scheme, advertisement implantation can be carried out on the original video to be implanted from start to finish: the implanted advertisement is guaranteed to fit the scene of the original video, and the picture of each frame image is adjusted accurately, so the implantation is natural and highly accurate. Advertisement implantation is thereby automated, and working efficiency is improved.
The following provides a practical application example of the advertisement implantation method, with reference to the exemplary diagram of an original video to be implanted shown in fig. 3. As illustrated, the example is a short video of three video frames: video frame v1 is identical to the key frame shown in fig. 2, whereas video frames v2 and v3 contain only partial image information. First, after the video is obtained, the key frame among the three video frames is extracted; visually the key frame is clearly video frame v1, and in processing, the inter-frame difference values between v1 and v2 and between v2 and v3 are calculated in turn, with v1 determined to be the key frame of the original video based on these difference values (a sketch of this step follows). Then, depth estimation is performed on the key frame to obtain its corresponding depth map.
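A minimal sketch of such key-frame selection with OpenCV is shown below; the grayscale conversion and the mean-absolute-difference measure are assumptions of this sketch:

```python
import cv2
import numpy as np

def select_key_frames(video_path):
    """Key-frame selection by inter-frame difference: compute the
    difference between consecutive frames, average all differences to
    get a target threshold, and keep the frames whose difference
    reaches it.
    """
    cap = cv2.VideoCapture(video_path)
    frames, diffs = [], []
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(cv2.absdiff(gray, prev).mean())
        frames.append(frame)
        prev = gray
    cap.release()

    threshold = np.mean(diffs)  # target threshold from the average strength
    # Keep the frame that follows each qualifying difference.
    return [frames[i + 1] for i, d in enumerate(diffs) if d >= threshold]
```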
Next, processing is performed on the depth map to determine, among the key frames, the target key frame and its corresponding target insertion position. Scene recognition is then performed on the target key frame: because the billboard text "furniture wholesale" appears in video frame v1, the adaptation-scene keywords recognized for v1 may be "furniture", "wholesale market", and so on. On this basis, the application scene of each advertisement in the advertisement database is matched against these keywords to obtain the target implantation advertisement, for example as sketched below.
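A toy sketch of such keyword-based matching follows; the database layout (a list of records carrying a "scenes" field) is a hypothetical stand-in for the patent's advertisement database:

```python
def match_advertisement(scene_keywords, advertisement_database):
    """Pick the advertisement whose declared application scenes overlap
    most with the keywords recognized on the target key frame."""
    def overlap(ad):
        return len(set(ad["scenes"]) & set(scene_keywords))
    best = max(advertisement_database, key=overlap)
    return best if overlap(best) > 0 else None

# e.g. match_advertisement(
#          ["furniture", "wholesale market"],
#          [{"name": "sofa ad", "scenes": ["furniture", "home"]},
#           {"name": "car ad", "scenes": ["road", "city"]}])
# -> the "sofa ad" record
```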
In this example, the target key frame is video frame v1, and the target insertion position is the position of the "screen" shown in the image. Visually, however, inserting the target implantation advertisement directly on the "screen" might block "character 1" and "character 2" acting in front of it. Therefore, the three video frames are first subjected to matting processing to obtain the background matting ("screen") and the portrait mattings ("character 1" and "character 2"); it is then judged whether the portrait matting is in front of the background matting, and the layers are set so that the portrait matting stays in front. Once this is done, the target implantation advertisement can be inserted into the target insertion position, completing the advertisement implantation.
The following describes an advertisement implanting device provided in the embodiments of the present application; the advertisement implanting device described below and the advertisement implanting method described above may be referred to in correspondence with each other.
First, referring to fig. 4, an advertisement implanting apparatus is described; it is applicable to hardware devices such as servers, terminals, and mobile terminals. As shown in fig. 4, the advertisement implanting apparatus may include:
a data acquisition unit 100 for acquiring an original video to be implanted and an advertisement database;
a key frame depth processing unit 200, configured to perform depth estimation processing on key frames in the original video to be implanted to obtain a depth map corresponding to each key frame, where the key frames are video frames in the original video to be implanted that satisfy a preset condition;
a target insertion position determining unit 300, configured to determine a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map;
the scene matching unit 400 is configured to perform scene matching on the target key frame and the advertisements in the advertisement database, so as to obtain a target implantation advertisement corresponding to the original video to be implanted;
and the advertisement implantation unit 500 is used for inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
With the above advertisement implanting device, target detection can be performed based on the depth maps of the key frames in the original video to be implanted; the key frames are screened during target detection to obtain target key frames in which an implanted advertisement would be used effectively, and, based on these target key frames, target insertion positions where advertisements can be implanted in the original video are determined. Meanwhile, through scene recognition on the target key frames, advertisements more relevant to the content of the original video to be implanted can be found in the advertisement database, improving the fit between the advertisement and the original video. Implanting advertisements on this basis automates advertisement implantation, improves its efficiency, and makes the result more natural and accurate.
Optionally, the target insertion position determining unit 300 may include:
a transformation matrix first calculating subunit, configured to calculate a spatial mapping relationship between all video frames between each two adjacent key frames and a first key frame of the two adjacent key frames, so as to obtain a first spatial transformation matrix corresponding to each key frame;
The target detection subunit is used for carrying out target detection on each key frame to obtain a detection result corresponding to each key frame, wherein the detection result at least comprises an object and object characteristic information;
a key frame screening subunit, configured to determine, as candidate key frames, the key frames whose detection results satisfy an object detection condition;
a plane estimation subunit, configured to perform plane area estimation on the candidate key frames based on the object feature information and the depth map corresponding to the candidate key frames, so as to obtain an object plane area ratio corresponding to each candidate key frame;
a target key frame determining subunit, configured to determine, as a target key frame, the candidate key frame corresponding to the object plane area ratio reaching a preset threshold;
a key frame detection subunit, configured to determine an object position in the object feature information corresponding to each target key frame as a target detection point of each target key frame;
a video frame detection subunit, configured to determine, based on the target detection point of a first one of the two adjacent target key frames and the first spatial transformation matrix, a target detection point of each video frame between the two adjacent target key frames;
And the target insertion position determining subunit is used for combining the target detection points of all the target key frames and the target detection points of the video frames to obtain target insertion positions corresponding to the original video to be implanted.
Optionally, the scene matching unit 400 includes:
the scene recognition subunit is used for carrying out scene recognition on each target key frame to obtain a scene corresponding to each target key frame and a scene prediction probability corresponding to each scene;
an adaptive scene screening subunit, configured to determine, as an adaptive scene of the target key frame, the scene corresponding to the scene prediction probability that reaches a preset scene prediction threshold;
and the scene matching subunit is used for matching the adaptive scene with the application scene of each advertisement in the advertisement database to obtain the target implantation advertisement corresponding to the original video to be implanted.
Optionally, the target implantation advertisement includes a single frame image, and the advertisement implanting unit 500 may include the following subunits (a blending sketch follows the list):
a spatial matrix second calculating subunit, configured to perform perspective transformation on the single-frame image and the target insertion position to obtain a second spatial transformation matrix;
An advertisement image covering subunit, configured to cover the single-frame image to each video frame in the target insertion position through the second spatial transformation matrix, so as to obtain a fusion image corresponding to each video frame;
the gradient calculation first subunit is used for carrying out gradient calculation on the fusion images to obtain gradient fields corresponding to each fusion image;
a divergence calculation first subunit configured to determine a divergence of each of the fused images based on the gradient fields corresponding to each of the fused images;
a pixel value determining first subunit configured to generate a new pixel value corresponding to each of the fused images based on the divergence of each of the fused images;
and the pixel value adjusting first subunit is used for adjusting the pixel value of each fusion image to the new pixel value to obtain a target video.
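As noted above, a sketch of the warp-and-blend step follows. OpenCV's seamlessClone solves the same Poisson equation that the gradient/divergence pipeline above describes, so it stands in for that pipeline here; the corner-point convention and all names are assumptions of this sketch:

```python
import cv2
import numpy as np

def blend_ad_into_frame(frame, ad_image, quad):
    """Warp a single-frame advertisement onto the target insertion
    position and fuse it in the gradient domain.

    quad: the four corner points (x, y) of the insertion position in
    the frame, clockwise from the top-left corner.
    """
    h, w = frame.shape[:2]
    ah, aw = ad_image.shape[:2]
    src = np.float32([[0, 0], [aw, 0], [aw, ah], [0, ah]])
    dst = np.float32(quad)

    # Spatial transformation matrix: ad plane -> insertion plane.
    M = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(ad_image, M, (w, h))

    mask = np.zeros((h, w), np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    cx, cy = dst.mean(axis=0)

    # Poisson blending: matches gradients inside the mask so the ad
    # inherits the frame's lighting rather than being a hard paste.
    return cv2.seamlessClone(warped, frame, mask, (int(cx), int(cy)),
                             cv2.NORMAL_CLONE)
```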
Optionally, the target implantation advertisement includes continuous multi-frame images, and the advertisement implanting unit 500 may include:
a key frame extraction subunit, configured to extract an image key frame from the continuous multi-frame image;
a spatial matrix third calculating subunit, configured to perform perspective transformation on the image key frame and the target insertion position to obtain a third spatial transformation matrix;
An advertisement video overlaying subunit, configured to overlay the continuous multi-frame image to the target insertion position based on the third spatial transformation matrix, to obtain a fused image corresponding to each video frame in the target insertion position;
the gradient calculation second subunit is used for carrying out gradient calculation on the fusion images to obtain gradient fields corresponding to each fusion image;
a divergence calculation second subunit configured to determine a divergence of each of the fused images based on the gradient fields corresponding to each of the fused images;
a pixel value determining second subunit for generating a new pixel value for each of the fused images based on the divergence of each of the fused images;
and the pixel value adjusting second subunit is used for adjusting the pixel value of each fusion image to the new pixel value to obtain a target video.
Optionally, the advertisement implanting device further includes:
a matting processing unit, configured to, after the scene matching unit 400 performs scene matching on the target key frame and the advertisements in the advertisement database to obtain the target implantation advertisement corresponding to the original video to be implanted, perform portrait matting processing on the video clip where the target insertion position is located, so as to obtain background matting and portrait matting;
The position judging unit is used for determining whether the background matting is behind the portrait matting or not based on the background matting and the portrait matting;
the image layer setting unit is used for placing the image layer where the portrait matting is located in front of the image layer where the background matting is located when the judging result of the position judging unit is negative, and inserting a target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video;
and the advertisement second implantation unit is used for inserting a target implantation advertisement into the target insertion position of the original video to be implanted when the judgment result of the position judgment unit is yes, so as to obtain a target video.
Optionally, the key frame depth processing unit 200 includes the following subunits (a depth-estimation sketch follows the list):
an inter-frame difference calculating subunit, configured to calculate an inter-frame difference value between every two video frames in the original video to be implanted;
the average value calculating subunit is used for carrying out average calculation on all the interframe difference values to obtain average interframe difference intensity;
a threshold setting subunit configured to determine a target threshold based on the average inter-frame differential strength;
a video frame screening subunit, configured to determine, as a key frame, the video frame corresponding to the inter-frame difference value equal to the target threshold;
And the depth processing subunit is used for carrying out depth estimation processing on each key frame to obtain a depth map corresponding to each key frame.
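As referenced above, a sketch of the depth-estimation step follows. The patent does not name a depth estimator in this section; MiDaS, a public monocular depth model, stands in here as one possible choice:

```python
import cv2
import torch

# MiDaS small model and its matching preprocessing, via torch.hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

def depth_map(key_frame_bgr):
    """Return a relative depth map for one key frame (numpy H x W)."""
    img = cv2.cvtColor(key_frame_bgr, cv2.COLOR_BGR2RGB)
    batch = transforms.small_transform(img)  # resize + normalize
    with torch.no_grad():
        pred = midas(batch)
    # Upsample the prediction back to the frame's resolution.
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False).squeeze()
    return pred.numpy()
```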
The advertisement implanting device provided by the embodiment of the present application can be deployed on advertisement implanting equipment, which may be a server, a mobile terminal, or the like.
Fig. 5 illustrates a schematic structural diagram of an advertisement implanting apparatus provided in an embodiment of the present application, and referring to fig. 5, the structure of the advertisement implanting apparatus may include: at least one processor 10, at least one memory 20, and at least one communication bus 30, at least one communication interface 40;
in the embodiment of the present application, there is at least one of each of the processor 10, the memory 20, the communication bus 30, and the communication interface 40, and the processor 10, the memory 20, and the communication interface 40 communicate with one another through the communication bus 30;
the processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
the memory 20 may include a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory;
the memory 20 stores a program, and the processor 10 can call the program stored in the memory; the program is used for implementing each processing flow in the foregoing advertisement implantation scheme.
The embodiment of the application also provides a storage medium, which can store a program suitable for being executed by a processor, and the program is used for realizing each processing flow in the advertisement implantation scheme.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An advertisement implanting method, comprising:
acquiring an original video to be implanted and an advertisement database;
performing depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame, wherein the key frames are video frames meeting preset conditions in the original video to be implanted;
determining a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map;
Performing scene matching on the target key frame and the advertisement in the advertisement database to obtain a target implantation advertisement corresponding to the original video to be implanted;
and inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
2. The method of claim 1, wherein determining a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map comprises:
calculating the spatial mapping relation between all video frames between every two adjacent key frames and the first key frame in the two adjacent key frames to obtain a first spatial transformation matrix corresponding to each key frame;
performing target detection on each key frame to obtain a detection result corresponding to each key frame, wherein the detection result at least comprises an object and object characteristic information;
determining the key frames of which the detection results meet object detection conditions as candidate key frames;
carrying out plane area estimation on the candidate key frames based on the object feature information and the depth map corresponding to the candidate key frames to obtain object plane area ratios corresponding to each candidate key frame;
Determining the candidate key frame corresponding to the object plane area ratio reaching a preset threshold as a target key frame;
determining the object position in the object feature information corresponding to each target key frame as a target detection point of each target key frame;
determining a target detection point of a video frame between each two adjacent target key frames based on the target detection point of a first one of the two adjacent target key frames and the first spatial transformation matrix;
and combining the target detection points of all the target key frames with the target detection points of the video frames to obtain target insertion positions corresponding to the original video to be implanted.
3. The method of claim 1, wherein the scene matching the target key frame with the advertisement in the advertisement database to obtain the target implantation advertisement corresponding to the original video to be implanted comprises:
performing scene recognition on each target key frame to obtain a scene corresponding to each target key frame and a scene prediction probability corresponding to each scene;
Determining the scene corresponding to the scene prediction probability reaching a preset scene prediction threshold as an adaptive scene of the target key frame;
and matching the adaptive scene with the application scene of each advertisement in the advertisement database to obtain the target implantation advertisement corresponding to the original video to be implanted.
4. The method of claim 1, wherein the target implantation advertisement comprises a single frame image, and the inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video comprises:
performing perspective transformation on the single-frame image and the target insertion position to obtain a second space transformation matrix;
covering the single-frame image to each video frame in the target insertion position through the second space transformation matrix to obtain a fusion image corresponding to each video frame;
carrying out gradient calculation on the fusion images to obtain gradient fields corresponding to each fusion image;
determining a divergence of each of the fused images based on the gradient fields corresponding to each of the fused images;
generating a new pixel value corresponding to each of the fused images based on the divergence of each of the fused images;
And adjusting the pixel value of each fusion image to the new pixel value to obtain a target video.
5. The method of claim 1, wherein the target implantation advertisement comprises continuous multi-frame images, and the inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video comprises:
extracting an image key frame from the continuous multi-frame image;
performing perspective transformation on the image key frame and the target insertion position to obtain a third space transformation matrix;
based on the third space transformation matrix, covering the continuous multi-frame images to the target insertion position to obtain a fusion image corresponding to each video frame in the target insertion position;
carrying out gradient calculation on the fusion images to obtain gradient fields corresponding to each fusion image;
determining a divergence of each of the fused images based on the gradient fields corresponding to each of the fused images;
generating a new pixel value for each of the fused images based on the divergence of each of the fused images;
and adjusting the pixel value of each fusion image to the new pixel value to obtain a target video.
6. The method of claim 1, further comprising, after performing scene matching on the target key frame and the advertisement in the advertisement database to obtain a target implantation advertisement corresponding to the original video to be implanted:
carrying out image matting processing on the video clip where the target insertion position is located to obtain background matting and image matting;
determining whether the background matting is behind the portrait matting based on the background matting and the portrait matting;
if not, placing the image layer where the portrait matting is located in front of the image layer where the background matting is located, and inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video;
if yes, inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
7. The method according to claim 1, wherein performing depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame comprises:
calculating an interframe difference value between every two video frames in the original video to be implanted;
Carrying out average calculation on all the interframe difference values to obtain average interframe difference intensity;
determining a target threshold based on the average inter-frame differential intensity;
determining the video frame corresponding to the inter-frame difference value equal to the target threshold as a key frame;
and carrying out depth estimation processing on each key frame to obtain a depth map corresponding to each key frame.
8. An advertising implanting device, comprising:
the data acquisition unit is used for acquiring the original video to be implanted and the advertisement database;
the key frame depth processing unit is used for carrying out depth estimation processing on the key frames in the original video to be implanted to obtain a depth map corresponding to each key frame, wherein the key frames are video frames meeting preset conditions in the original video to be implanted;
a target insertion position determining unit, configured to determine a target key frame and a target insertion position corresponding to the target key frame in the key frames based on the depth map;
the scene matching unit is used for performing scene matching on the target key frame and the advertisement in the advertisement database to obtain a target implantation advertisement corresponding to the original video to be implanted;
And the advertisement implantation unit is used for inserting the target implantation advertisement into the target insertion position of the original video to be implanted to obtain a target video.
9. An advertising implanting device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the advertisement implantation method according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the advertisement implantation method according to any of claims 1-7.
CN202310548091.1A 2023-05-16 2023-05-16 Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium Pending CN116308530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310548091.1A CN116308530A (en) 2023-05-16 2023-05-16 Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116308530A 2023-06-23

Family

ID=86798077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310548091.1A Pending CN116308530A (en) 2023-05-16 2023-05-16 Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116308530A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107509045A (en) * 2017-09-11 2017-12-22 广东欧珀移动通信有限公司 Image processing method and device, electronic installation and computer-readable recording medium
CN110830852A (en) * 2018-08-07 2020-02-21 北京优酷科技有限公司 Video content processing method and device
CN109842811A (en) * 2019-04-03 2019-06-04 腾讯科技(深圳)有限公司 A kind of method, apparatus and electronic equipment being implanted into pushed information in video
CN111402369A (en) * 2020-03-10 2020-07-10 京东数字科技控股有限公司 Interactive advertisement processing method and device, terminal equipment and storage medium
CN114051166A (en) * 2020-07-24 2022-02-15 北京达佳互联信息技术有限公司 Method, device, electronic equipment and storage medium for implanting advertisement in video
CN111988657A (en) * 2020-08-05 2020-11-24 网宿科技股份有限公司 Advertisement insertion method and device
CN112613473A (en) * 2020-12-31 2021-04-06 湖南快乐阳光互动娱乐传媒有限公司 Advertisement implanting method and system
CN114943549A (en) * 2021-02-10 2022-08-26 中国科学技术大学 Advertisement delivery method and device
CN113038268A (en) * 2021-03-11 2021-06-25 湖南快乐阳光互动娱乐传媒有限公司 Plane advertisement implanting method and device
CN113516696A (en) * 2021-06-02 2021-10-19 广州虎牙科技有限公司 Video advertisement implanting method and device, electronic equipment and storage medium
CN113923516A (en) * 2021-09-29 2022-01-11 平安科技(深圳)有限公司 Video processing method, device and equipment based on deep learning model and storage medium
CN114881707A (en) * 2022-05-30 2022-08-09 咪咕文化科技有限公司 Advertisement implanting method and device
CN115187301A (en) * 2022-07-14 2022-10-14 郭杰 Advertisement instant implanting method, system and device based on user portrait

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘传才 (Liu Chuancai): "Image Understanding and Computational Vision" (《图像理解与计算视觉》), vol. 1, Xiamen University Press (厦门大学出版社), pages 62-63 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116939293A (en) * 2023-09-17 2023-10-24 世优(北京)科技有限公司 Implantation position detection method and device, storage medium and electronic equipment
CN116939293B (en) * 2023-09-17 2023-11-17 世优(北京)科技有限公司 Implantation position detection method and device, storage medium and electronic equipment
CN116993405A (en) * 2023-09-25 2023-11-03 深圳市火星人互动娱乐有限公司 Method, device and system for implanting advertisements into VR game
CN116993405B (en) * 2023-09-25 2023-12-05 深圳市火星人互动娱乐有限公司 Method, device and system for implanting advertisements into VR game

Similar Documents

Publication Publication Date Title
CN110166827B (en) Video clip determination method and device, storage medium and electronic device
CN108322788B (en) Advertisement display method and device in live video
CN109173263B (en) Image data processing method and device
CN110381369B (en) Method, device and equipment for determining recommended information implantation position and storage medium
CN116308530A (en) Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium
KR101759453B1 (en) Automated image cropping and sharing
CN101425133B (en) Human image retrieval system
WO2017190639A1 (en) Media information display method, client and server
CN111861572B (en) Advertisement putting method and device, electronic equipment and computer readable storage medium
CN108121957A (en) The method for pushing and device of U.S. face material
US20210406549A1 (en) Method and apparatus for detecting information insertion region, electronic device, and storage medium
JP2002232839A (en) Device and method for generating label object video of video sequence
CN111222450B (en) Model training and live broadcast processing method, device, equipment and storage medium
CN105684046B (en) Generate image composition
CN106203286A (en) The content acquisition method of a kind of augmented reality, device and mobile terminal
CN111429341B (en) Video processing method, device and computer readable storage medium
EP4086786A1 (en) Video processing method, video searching method, terminal device, and computer-readable storage medium
CN109408672A (en) A kind of article generation method, device, server and storage medium
CN106649629A (en) System connecting books with electronic resources
CN111401238A (en) Method and device for detecting character close-up segments in video
CN110766645B (en) Target person recurrence map generation method based on person identification and segmentation
WO2020192187A1 (en) Media processing method and media server
CN116761037B (en) Method, device, equipment and medium for video implantation of multimedia information
CN109788311B (en) Character replacement method, electronic device, and storage medium
CN113596354A (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination