WO2020259510A1 - Detection method and apparatus for an information implantation area, electronic device, and storage medium - Google Patents


Info

Publication number: WO2020259510A1
Application number: PCT/CN2020/097782
Authority: WIPO (PCT)
Prior art keywords: target, implanted, video, area, information
Other languages: English (en), French (fr)
Inventors: 生辉, 黄东波
Original Assignee: 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2020259510A1
Priority to US17/370,764 (published as US20210406549A1)

Classifications

    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G06F18/23: Clustering techniques
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • H04N21/26241: Content or additional data distribution scheduling under constraints involving the time of distribution, e.g. the best time of the day for inserting an advertisement or airing a children program
    • H04N21/4316: Generation of visual interfaces involving specific graphical features for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N21/812: Monomedia components involving advertisement data
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • This application relates to the field of computer technology, and in particular to a method for detecting an information implantation area, a device for detecting an information implantation area, an electronic device, and a storage medium.
  • Video-In refers to embedded advertising, in which an advertisement is implanted directly into a scene of the video.
  • Video-Out refers to scene-based pop-up advertising: objects such as cars and faces in the video are recognized to understand the targets and the scene, and pop-up advertisements related to the video content are displayed.
  • For video advertisements in the Video-In form, professional designers usually need to retrieve video placement advertising spots manually, which consumes a great deal of manpower and time.
  • Accordingly, the embodiments of the present application provide a method for detecting an information implantation area, a device for detecting an information implantation area, an electronic device, and a storage medium, which can improve the detection efficiency of video-implanted advertising spots.
  • An embodiment of the application provides a method for detecting an information implantation area, including: obtaining a video to be implanted, and segmenting it to obtain video segments; obtaining a target frame in the video segments, and identifying and segmenting the objects in the target frame to obtain label information corresponding to the objects; determining a target object according to the label information, and clustering the target object to obtain multiple candidate to-be-implanted regions; and determining a target candidate to-be-implanted region from the candidate to-be-implanted regions, and performing a largest-rectangle search on it to obtain the target to-be-implanted region.
  • An embodiment of the application also provides a device for detecting an information implantation area, including:
  • a shot segmentation module, configured to obtain a video to be implanted and segment it to obtain video segments;
  • an object labeling module, configured to obtain a target frame in the video segments, and to identify and segment the objects in the target frame to obtain label information corresponding to the objects;
  • a clustering module, configured to determine a target object according to the label information, and to cluster the target object to obtain multiple candidate to-be-implanted regions;
  • an area search module, configured to determine a target candidate to-be-implanted region from the candidate to-be-implanted regions, and to perform a largest-rectangle search on it to obtain the target to-be-implanted region.
  • An embodiment of the present application also provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, implement the method for detecting an information implantation area provided in the embodiments of the present application.
  • An embodiment of the present application also provides a computer storage medium storing a computer program, and the computer program is used to execute the method for detecting an information implantation area provided in the embodiments of the present application.
  • In the embodiments of the present application, the acquired video to be implanted is segmented to obtain video segments; a target frame is then determined from each video segment, and the objects in the target frame are identified and segmented to obtain the label information corresponding to each object; a target object is then determined according to the label information, and the target object is clustered to obtain multiple candidate to-be-implanted regions; finally, a target candidate to-be-implanted region is determined from the candidate regions, and a largest-rectangle search is performed on it to obtain the target to-be-implanted region.
  • FIG. 1 is a schematic diagram of the system architecture of a method for detecting an information implantation area provided by an embodiment of the present application;
  • FIGS. 2A-2C are diagrams showing the effect of desktop scene implantation in the related art;
  • FIG. 3 is a flowchart of a method for detecting an information implantation area provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the process of segmenting a video to be implanted according to an embodiment of the present application;
  • FIG. 5 shows the object labeling information output by the instance segmentation model provided by an embodiment of the present application;
  • FIG. 6 is a schematic flowchart of the mean shift processing of an object provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of the process of performing the largest-rectangle search on the target candidate to-be-implanted area provided by an embodiment of the present application;
  • FIG. 8A is a schematic diagram of a desktop structure before island noise areas are eliminated according to an embodiment of the present application;
  • FIG. 8B is a schematic diagram of the desktop structure after island noise areas are eliminated according to an embodiment of the present application;
  • FIG. 9 is a schematic diagram of the advertisement placement flow provided by an embodiment of the present application;
  • FIG. 10 is a block diagram of a device for detecting an information implantation area provided by an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of a computer system of an electronic device provided by an embodiment of the present application.
  • Maximum rectangle search: by finding adjacent pixels with the same pixel value within a specific area, the rectangle with the largest area can be determined in that area. For example, the maximum rectangle search can be implemented in the following way:
  • take any pixel in the target candidate to-be-implanted area as a reference point, and search for adjacent pixels with the same pixel value as the reference point; if such a pixel exists, use it as the new reference point and repeat the above step until all adjacent pixels with the same pixel value are obtained. Then take any such pixel as a vertex, form rectangles from the vertex and the adjacent pixels, calculate the area of each rectangle, select the target rectangle with the largest area, and take the region corresponding to the target rectangle as the target to-be-implanted area.
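In practice, this kind of search is commonly implemented as the classic largest-rectangle-in-a-binary-matrix problem: the area is rasterized into a binary mask (1 for pixels belonging to the area, 0 otherwise), and the largest all-ones rectangle is found with a per-row histogram and a monotonic stack. A minimal sketch under those assumptions (the function name and return format are illustrative, not taken from the application):

```python
def largest_rectangle(mask):
    """Largest all-ones axis-aligned rectangle in a binary mask.
    Returns (area, top_row, left_col, height, width)."""
    if not mask or not mask[0]:
        return (0, 0, 0, 0, 0)
    cols = len(mask[0])
    heights = [0] * cols          # consecutive ones ending at the current row
    best = (0, 0, 0, 0, 0)
    for row_idx, row in enumerate(mask):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        stack = []                # (start_col, height), heights increasing
        for c, h in enumerate(heights + [0]):   # trailing 0 flushes the stack
            start = c
            while stack and stack[-1][1] >= h:
                start, stack_h = stack.pop()
                area = stack_h * (c - start)
                if area > best[0]:
                    best = (area, row_idx - stack_h + 1, start, stack_h, c - start)
            stack.append((start, h))
    return best
```

Each row updates a histogram of consecutive ones per column, so the search runs in O(rows x cols) rather than enumerating every candidate rectangle.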
  • FIG. 1 shows a schematic diagram of an exemplary system architecture of a method for detecting an information implantation area provided by an embodiment of the present application.
  • The system architecture 100 may include terminal devices (one or more of the smartphone 101, the tablet computer 102, and the portable computer 103 shown in FIG. 1; it may of course also be a desktop computer, etc.), a network 104, and a server 105.
  • the network 104 is used as a medium for providing a communication link between the terminal device and the server 105.
  • the network 104 may include various connection types, such as wired communication links, wireless communication links, and so on.
  • the numbers of terminals, networks, and servers in FIG. 1 are merely illustrative. According to actual needs, there can be any number of terminals, networks and servers.
  • the server 105 may be a server cluster composed of multiple servers.
  • the terminal device 101 which may also be the terminal devices 102 and 103, sends a request for obtaining a video to be implanted to the server 105 via the network 104.
  • the request for obtaining a video to be implanted includes the video number of the video to be implanted.
  • The server 105 can send the matched video to be implanted to the terminal device 101 (or terminal devices 102, 103) through the network 104.
  • After receiving the video to be implanted, the terminal device performs shot segmentation on it to obtain video segments; it then obtains a target frame from each video segment, and identifies and segments the objects in the target frame to obtain the label information corresponding to each object.
  • The label information includes the classification information, confidence, mask, calibration frame, and so on. The terminal device 101 then determines the target object according to the label information, and clusters the target object to obtain multiple candidate to-be-implanted regions. Finally, a target candidate to-be-implanted region is determined from the multiple candidate regions, and by performing a largest-rectangle search on it, the target to-be-implanted region can be obtained; this target to-be-implanted region is the information implantation area, which can be used for video advertisement implantation.
  • The method for detecting the information implantation area provided in the embodiments of this application can be executed by the terminal device; accordingly, the detection device for the information implantation area can be set in the terminal device. In some embodiments, the method may also be executed by the server.
  • FIGS. 2A-2C show the effect of desktop scene implantation. Figure 2A shows an image in the original video.
  • A physical object is implanted into the advertising spot, as shown in Figure 2B.
  • In view of this, the embodiments of this application first propose a method for detecting the information implantation area.
  • The method for detecting the information implantation area in the embodiments of this application can be used for video advertisement implantation and similar applications. Below, video advertisement implantation is taken as an example to elaborate on the implementation details of the technical solution of the embodiments of the present application:
  • FIG. 3 schematically shows a flowchart of a method for detecting an information implantation area provided by an embodiment of the present application.
  • The method for detecting an information implantation area may be executed by a terminal device, which may be one of the terminal devices shown in FIG. 1.
  • the method for detecting the information implantation area at least includes steps S310 to S340, which are described in detail as follows:
  • In step S310, a video to be implanted is obtained, and the video to be implanted is segmented to obtain video segments.
  • video advertisement implantation is a new technology system that uses computer vision technology to intelligently implant advertisements in videos that have already been produced (that is, videos to be implanted). Users can search for videos to be implanted online.
  • the video to be implanted can also be obtained from the video folder or video database of the terminal device 101.
  • The video to be implanted can be a video file of any format, such as .avi, .mp4, .rmvb, and so on; the embodiments of this application do not limit this.
  • a request for acquiring a video to be implanted may be sent to the server 105 through the terminal device 101, and the request for acquiring a video to be implanted includes a video number of the video to be implanted.
  • The video number can be in any format, for example purely numeric, or a combination of English letters and numbers. After the server 105 parses the acquisition request to obtain the video number, it can match the video number against the video numbers of all videos stored on the server to obtain the video to be implanted corresponding to the number in the request; finally, the server 105 can return the matched video to the terminal device 101, so that the terminal device 101 can search for advertising spots in the video to be implanted.
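The request/response exchange just described amounts to looking the video number up in the server's video store. A minimal server-side sketch (the request layout, function name, and store type are hypothetical; the application does not prescribe any API):

```python
def handle_acquisition_request(request, video_store):
    """Parse the video number from an acquisition request and match it
    against the numbers of the videos stored on the server.
    `request` and `video_store` layouts are illustrative assumptions."""
    video_number = request["video_number"]
    # Return the matched video to be implanted, or None when no number matches.
    return video_store.get(video_number)
```

In a real deployment the store would be backed by the server's video database rather than an in-memory mapping.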
  • After the video to be implanted is received, it can be segmented to obtain the video segments that constitute it.
  • the basic structure of a video is a hierarchical structure composed of frames, shots, scenes, and video programs.
  • A frame is a static image and the smallest logical unit that composes a video; a sequence of consecutive frames played continuously at equal time intervals forms dynamic video.
  • A shot is a sequence of frames captured continuously by a camera from power-on to power-off; it depicts an event or part of a scene, carries no or only weak semantic information, and emphasizes the visual similarity of its frames.
  • A scene is a group of semantically related consecutive shots of the same subject, which can be captured from different angles and with different techniques, or a combination of shots sharing the same theme and event; it emphasizes semantic relevance.
  • A video program, as the highest level of video content structure, contains a complete event or story; it includes the compositional relationships of the video as well as its summary, semantics, and general description.
  • In the embodiments of this application, the shot can be used as the processing unit, that is, each shot is regarded as one video segment. The video to be implanted can be segmented into multiple shots, thereby obtaining the video segments that constitute it.
  • In some embodiments, a similarity algorithm can be used to identify shot boundaries and segment the video to be implanted.
  • FIG. 4 shows a schematic flow chart of segmenting the video to be implanted. As shown in FIG. 4, in step S401, the target features of the image frames are extracted from the video to be implanted; in step S402, similarity recognition is performed on the target features of adjacent image frames, and the video to be implanted is segmented according to the recognition result to obtain video segments.
  • In principle, every pixel in two adjacent image frames could be compared for similarity; however, comparing the pixels one by one consumes a lot of resources and is inefficient. Therefore, a target feature can be extracted from each image frame of the video to be implanted; the target feature may be a multi-dimensional feature of the image frame. Similarity recognition is then performed on the target features of adjacent image frames to determine the boundary image frames between adjacent shots.
  • In some embodiments, performing similarity recognition on the target features of adjacent image frames in step S402 and segmenting the video to be implanted according to the recognition result may include: computing the distance between the target features of adjacent image frames. The distance can be a Euclidean distance, a cosine distance, and so on. Taking the Euclidean distance as an example, after the Euclidean distance between the target features of adjacent image frames is obtained, it can be compared with a preset distance threshold to determine the similarity between the adjacent frames: when the distance is less than the preset distance threshold, the adjacent image frames are determined to belong to the same video segment; when the distance is greater than or equal to the preset distance threshold, they are determined to belong to different video segments.
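The thresholding rule of step S402 can be sketched as follows, assuming each image frame has already been reduced to a target-feature vector (the feature extractor itself is outside this sketch, and all names are illustrative):

```python
import math

def split_into_segments(frame_features, distance_threshold):
    """Group consecutive frames into video segments (shots): a new segment
    starts whenever the Euclidean distance between the target features of
    adjacent frames reaches the preset distance threshold."""
    if not frame_features:
        return []
    segments = [[0]]                     # lists of frame indices
    for i in range(1, len(frame_features)):
        distance = math.dist(frame_features[i - 1], frame_features[i])
        if distance < distance_threshold:
            segments[-1].append(i)       # similar: same video segment
        else:
            segments.append([i])         # dissimilar: shot boundary
    return segments
```

A cosine-style distance can be approximated by L2-normalizing the feature vectors first, since the Euclidean distance between unit vectors is monotonic in the cosine distance.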
  • In step S320, a target frame in the video segment is acquired, and the objects in the target frame are identified and segmented to obtain label information corresponding to the objects.
  • When information is implanted, it can be implanted into typical and representative frames of a shot; for example, implanting information into key frames or representative frames can improve implantation efficiency.
  • Therefore, one or more target frames (which can be key frames of the video) can be determined from each video segment, and the objects in the target frames can be processed to obtain, within each target frame, a target to-be-implanted region that can be used for advertisement implantation.
  • an instance segmentation model may be used to identify and segment the objects in the target frame, so as to classify the objects in the target frame and obtain label information corresponding to each object.
  • The instance segmentation model can preprocess the input target frame and perform convolution operations on it to extract features, obtaining a feature image. The feature image can then be processed by a candidate region generation network to acquire multiple candidate regions of interest, which are classified and regressed to obtain the target regions of interest. Next, the pixels of the target regions of interest can be aligned with the pixels of the corresponding regions in the target frame. Finally, classification, bounding-box regression, and mask generation can be performed on the target regions of interest to obtain the label information corresponding to each object in the target frame.
  • The instance segmentation model may be a Mask R-CNN model; of course, it may also be another machine learning model capable of identifying and segmenting the objects in the target frame.
  • Through the processing of the instance segmentation model, the label information corresponding to each object in the target frame can be obtained; the label information can include the object's classification information, confidence, mask, and calibration frame.
  • Figure 5 shows the object annotation information output by the instance segmentation model. The input target frame contains a desktop and a cup placed on the desktop. The desktop and the cup can be separated through the recognition and segmentation of the instance segmentation model, and are marked with masks of different colors and with label boxes: the dark area is the mask corresponding to the desktop, the dashed box A is the label box corresponding to the desktop, and the corresponding classification information is "table" with a confidence of 0.990; the light area is the mask corresponding to the cup, the dashed box B is the label box corresponding to the cup, the corresponding classification information is "cup", and the confidence is 0.933.
  • a large number of image frames can be collected as training samples to train the instance segmentation model.
  • For example, one or more image frames of a video can be used as training samples and input to the instance segmentation model to be trained; by comparing the label information output by the model with the label information annotated for the training samples, it can be judged whether the model has completed training.
  • In step S330, a target object is determined according to the annotation information, and the target object is clustered to obtain multiple candidate to-be-implanted regions.
  • After the label information of the objects in the target frame is obtained, the target object can be determined from it. For example, to implant an advertisement on a desktop, the objects classified as "table" (desktop) can be obtained according to the classification information in the label information. If there are one or more desktops in the target frame, they can all be determined as target objects and the target to-be-implanted region determined from them, or one of the multiple desktops can be selected as the target object and the target to-be-implanted region determined from it.
  • Advertisement placement can only be carried out in a relatively conspicuous location, so a target object can be determined from the multiple objects to be implanted, and the target object is clustered to obtain multiple candidate to-be-implanted regions.
  • The target object for implantation can be determined from the multiple objects to be implanted according to the area of the mask in the annotation information: the object whose mask has the largest area is selected as the target object. For example, the desktop with the largest mask area is selected; this desktop occupies the main position in the target frame, so if the advertisement is implanted on it, users will notice the advertisement while watching the video, increasing the advertisement's user reach.
  • In some embodiments, before the target object is determined, the objects in the target frame can be filtered: the confidence corresponding to each object can be compared with a preset confidence threshold. The preset confidence threshold can be set according to actual needs; for example, it can be set to 0.5, so that only the objects in the target frame with a confidence greater than 0.5 are retained.
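The two filters above, the confidence threshold and the largest-mask rule, can be sketched together; the annotation layout (a dict with a confidence value and a 0/1 mask) is an assumption for illustration, not prescribed by the application:

```python
def mask_area(mask):
    """Area of a binary mask given as a list of 0/1 rows."""
    return sum(sum(row) for row in mask)

def select_target_object(annotations, confidence_threshold=0.5):
    """Keep only the objects whose confidence exceeds the preset threshold,
    then pick the object whose mask covers the largest area as the target
    object for implantation. Returns None when nothing passes the filter."""
    kept = [a for a in annotations if a["confidence"] > confidence_threshold]
    if not kept:
        return None
    return max(kept, key=lambda a: mask_area(a["mask"]))
```

With the Figure 5 detections, the table (confidence 0.990, large mask) would be chosen over the cup (confidence 0.933, small mask).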
  • After the target object is determined, the pixels in the target object can be clustered to obtain multiple candidate to-be-implanted regions. For example, the target object can be clustered by mean shift processing; FIG. 6 shows a schematic flow chart of the mean shift processing of the object.
  • In step S601, any pixel in the target object is taken as the target point, a circle is drawn with the target point as its center, and the target range is determined according to a preset radius. In step S602, the mean offset vector is determined from the distance vectors between the target point and the pixels within the target range, and the target point is moved to the end point of the mean offset vector. In step S603, the end point is used as the new target point, and steps S601-S602 are repeated until the position of the target point no longer changes.
  • In step S604, a pixel set is determined from the pixel corresponding to the final target point and the pixels within the preset radius around it. In step S605, the distances between the pixel sets are obtained and compared with a preset distance threshold, and the candidate to-be-implanted regions are determined according to the comparison result: for example, when the distance between two pixel sets is greater than or equal to the preset threshold, the two pixel sets are respectively used as separate candidate to-be-implanted regions.
  • the target object By performing the mean shift processing on the target object, multiple connected regions in the target object can be obtained. These connected regions are candidate regions to be implanted.
  • For example, if the target object is a desktop, the mean shift is performed for each pixel in the desktop, and multiple connected areas that together form the desktop can be obtained. In actual implementation, all these connected areas can be used as candidate areas in the desktop that are available for advertisement implantation.
  • In step S340, a target candidate region to be implanted is determined from the candidate regions to be implanted, and a largest rectangle search is performed on the target candidate region to obtain the target region to be implanted.
  • After the multiple candidate regions to be implanted are determined, they can be screened to determine the target candidate region. For example, from the multiple candidate regions, those that do not contain non-target objects — such as mobile phones, tea cups, vases and other objects on the desktop — can be selected; that is, the candidate regions containing such objects are discarded, and only the candidate regions free of non-target objects are kept. The area of each remaining candidate region can then be calculated, and the candidate region with the largest area is selected as the target candidate region to be implanted. In this way, the core blank area of the target object, with the optimal area, can be obtained.
  • Since the shape of the target candidate region may be irregular, directly implanting an advertisement into it may cause information to be missing. Therefore, after the target candidate region — the core blank area of the target object — is determined, a largest rectangle search needs to be performed on it to determine the target region to be implanted for advertisement implantation.
  • FIG. 7 shows a schematic flow chart of the largest rectangle search on the target candidate region to be implanted. The process includes the following steps.
  • In step S701, any pixel in the target candidate region is taken as a reference point, and whether there are adjacent pixels with the same pixel value is searched according to the pixel value of the reference point. The pixel value of a pixel in the target candidate region is 0 or 1: if the pixel value of a certain pixel is 0, that pixel is used as the reference point to search for adjacent pixels with a pixel value of 0; if the pixel value is 1, adjacent pixels with a pixel value of 1 are searched for.
  • In step S702, when adjacent pixels with the same pixel value exist, each such adjacent pixel is taken as the new reference point and step S701 is repeated, expanding outward, until all adjacent pixels with the same pixel value are obtained.
  • In step S703, any one of the obtained pixels is used as a vertex, and a rectangle is formed according to the vertex and the adjacent pixels. It is worth noting that the pixels in the rectangle all have the same pixel value, and the rectangle does not include pixels with different pixel values. The vertex can be a left vertex, a right vertex, etc., which is not specifically limited in the embodiments of the present application.
  • In step S704, the area of each rectangle obtained through step S703 is calculated, the target rectangle with the largest area is selected, and the region corresponding to the target rectangle is used as the target region to be implanted for advertisement implantation.
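The search of steps S701-S704 amounts to finding the maximal axis-aligned rectangle of equal-valued pixels. The patent describes a region-growing formulation; the sketch below uses an equivalent, standard alternative (the row-by-row histogram method with a monotonic stack) for a binary mask, purely as an illustration of the result being computed.

```python
def largest_rectangle_area_in_histogram(heights):
    """Largest rectangle under a histogram, via a monotonic stack."""
    stack, best = [], 0
    for i, h in enumerate(heights + [0]):  # sentinel 0 flushes the stack at the end
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best

def largest_rectangle_of_ones(grid):
    """Maximal all-ones rectangle in a binary grid: each row extends a
    histogram of consecutive ones, reducing to the 1D problem above."""
    if not grid:
        return 0
    heights = [0] * len(grid[0])
    best = 0
    for row in grid:
        heights = [heights[j] + 1 if row[j] == 1 else 0 for j in range(len(row))]
        best = max(best, largest_rectangle_area_in_histogram(heights))
    return best

grid = [
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
]
area = largest_rectangle_of_ones(grid)  # 6: the 2x3 block of ones in the last columns
```

Returning the rectangle's coordinates instead of only its area is a straightforward extension (record `height`, `left`, and `i` at the point where `best` is updated).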
  • In some embodiments, there may be island noise regions in the target candidate region to be implanted, mainly formed by the light and shadow on the target object. To avoid the influence of these island noise regions on the largest rectangle search result, mean filter processing can be performed on the target candidate region before the search, so as to obtain a uniform and smooth target candidate region.
  • FIG. 8A shows a schematic diagram of the desktop structure before the island noise regions are eliminated, and FIG. 8B shows a schematic diagram of the desktop structure after they are eliminated. As shown in FIG. 8A, there are some island noise areas (small black areas) in the core blank area of the desktop; after the mean filtering, these small black areas are eliminated, and the core blank area of the desktop becomes uniform and smooth, as shown in FIG. 8B.
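The island-noise elimination just described can be sketched as a box (mean) filter over the binary mask followed by re-thresholding, so that small isolated holes are averaged back into the surrounding region. This is a rough assumption about the filtering step, with hypothetical kernel size and threshold.

```python
import numpy as np

def mean_filter(mask, k=3):
    """Box filter a binary mask with a k x k window, then re-threshold at 0.5.
    Single-pixel 'island' holes inside a solid region are smoothed away."""
    pad = k // 2
    padded = np.pad(mask.astype(float), pad, mode="edge")
    out = np.zeros(mask.shape, dtype=float)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return (out >= 0.5).astype(int)

mask = np.ones((5, 5), dtype=int)
mask[2, 2] = 0  # a single-pixel island noise hole inside the region
smoothed = mean_filter(mask)  # the hole is filled; the mask becomes uniform
```

A larger window removes larger islands, at the cost of also rounding off genuine boundaries of the region.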
  • It should be noted that the method for detecting the information implantation area in the embodiments of the present application can be used not only to detect areas on a desktop that are available for advertisement implantation, but also to detect advertisement implantation areas on other desktop-like objects, for example, the cash register counter in a supermarket, a bench in a park, or the running belt of a treadmill.
  • FIG. 9 shows a schematic flow chart of advertisement placement. As shown in FIG. 9:
  • In step S901, the video to be implanted is segmented to obtain video segments, and a target frame is obtained from the video segments. For example, for an advertisement placement scenario in which a certain brand of beverage is to be placed at a supermarket checkout counter, the target frame can be a scene of supermarket checkout.
  • In step S902, the target frame is input into the instance segmentation model to obtain the label information of all objects in the target frame.
  • In step S903, the target object is determined according to the classification information of the objects and the areas of the corresponding masks. For example, the checkout counter corresponds to the largest mask area, so the checkout counter can be used as the target object.
  • In step S904, mean shift processing is performed on the target object to obtain multiple candidate regions to be implanted.
  • In step S905, the candidate region with the largest area among those that do not contain non-target objects is obtained and used as the target candidate region to be implanted.
  • In step S906, mean filtering is performed on the target candidate region to obtain a uniform and smooth target candidate region.
  • In step S907, a largest rectangle search is performed in the target candidate region to obtain the target region to be implanted. The target region to be implanted is the blank area on the checkout counter where advertisements can be implanted; for example, it can be the area near the display on the checkout counter, or the corner of the counter near the entrance, which is not specifically limited in this embodiment.
  • In step S908, the advertisement is implanted into the target region to be implanted. The advertisement can be a physical beverage, a 3D model containing a beverage promotion poster, and so on.
  • It should be noted that the detection method of the information implantation area may also be executed by a server, which may be a server dedicated to data processing; accordingly, the instance segmentation model is set in that server. After receiving the video to be implanted from the server 105, the terminal device 101 can send it to the data-processing server.
  • The data-processing server segments the video to be implanted to obtain video segments; identifies and segments the target frames in the video segments to obtain the label information of the objects in each target frame; determines the target object according to the label information and clusters the target object to obtain multiple candidate regions to be implanted; and finally determines the target candidate region from the candidate regions and performs a largest rectangle search on it to obtain the target region to be implanted, which is then sent to the terminal device 101 to implement video advertisement implantation.
  • The technical solution of the embodiments of the present application uses the instance segmentation model to identify and segment the objects in the target frame, and combines color-block clustering within the mask with the largest rectangle search to determine the target region to be implanted from the target object, thereby realizing automatic detection of advertisement slots. With manual retrieval, the detection time is about 1.5 times the video duration, whereas the detection method of the embodiments of the present application can compress this time to about 0.2 times the video duration, which reduces labor costs on the one hand and increases detection efficiency on the other.
  • Moreover, the detection accuracy of the artificial-intelligence-based detection method in the embodiments of the application can reach 0.91, which greatly improves the accuracy of advertisement slot detection and avoids the situation where different manual screenings identify different advertisement slots.
  • The following describes the device embodiments of the present application, which can be configured to execute the information implantation area detection method in the foregoing embodiments. For details not disclosed in the device embodiments, please refer to the above-mentioned embodiments of the detection method of the information implantation area in this application.
  • Fig. 10 schematically shows a block diagram of a device for detecting an information implantation area provided by an embodiment of the present application.
  • the information implantation area detection device 1000 provided by the embodiment of the present application includes: a shot segmentation module 1001, an object labeling module 1002, a clustering module 1003, and an area search module 1004.
  • the shot segmentation module 1001 is configured to obtain a video to be implanted, and segment the video to be implanted to obtain video segments;
  • the object labeling module 1002 is configured to obtain a target frame in the video segment, and to identify and segment the object in the target frame to obtain label information corresponding to the object;
  • the clustering module 1003 is configured to determine a target object according to the labeling information, and cluster pixels in the target object to obtain multiple candidate regions to be implanted;
  • the region search module 1004 is configured to determine a target candidate region to be implanted from the candidate regions to be implanted, and perform a largest rectangle search on the target candidate region to obtain the target region to be implanted.
  • the detection device 1000 further includes:
  • a number sending module, configured to send a to-be-implanted video acquisition request to the server, where the request includes the video number of the video to be implanted;
  • the video receiving module is configured to receive the to-be-implanted video corresponding to the video number returned by the server in response to the to-be-implanted video acquisition request.
  • the lens segmentation module 1001 includes:
  • a feature extraction unit configured to extract target features from the video to be implanted
  • the similarity recognition unit is configured to perform similarity algorithm recognition on adjacent image frames, and segment the to-be-implanted video according to the recognition result to obtain the video segment.
  • the object labeling module 1002 includes:
  • the model processing unit is configured to input the target frame into an instance segmentation model, and identify and segment the object in the target frame through the instance segmentation model to obtain the label information.
  • the model processing unit is configured to: preprocess the target frame through the instance segmentation model, and perform feature extraction on the preprocessed target frame to obtain a feature image; A plurality of candidate regions of interest are determined on the feature image, and the plurality of candidate regions of interest are classified and regressed to obtain a target region of interest; an alignment operation is performed on the target region of interest to align the target The pixels in the frame are aligned with the pixels of the target region of interest; classification, frame regression, and mask generation are performed on the target region of interest to obtain the label information of the object.
  • the annotation information includes classification information, confidence, mask, and calibration frame of the object.
  • the clustering module 1003 includes:
  • the target object determining unit is configured to determine the target object according to the classification information in the annotation information and the area of the mask;
  • the clustering unit is configured to perform mean shift processing on the target object, so as to cluster pixels in the target object, and obtain the multiple candidate regions to be implanted.
  • each video segment includes one or more target frames
  • the clustering module 1003 may also be configured to: compare the confidence level of each object contained in each target frame with a preset confidence threshold, and retain, in each target frame, the objects whose confidence is greater than the preset confidence threshold; then delete the target frames that do not contain an object to be implanted, where the classification information of the object to be implanted is the same as that of the target object.
  • the clustering unit includes:
  • a range determining unit configured to use any pixel in the target object as a target point, with the target point as the center of the circle, and determine the target range according to a preset radius;
  • the moving unit is configured to determine a mean offset vector according to the distance vector between the target point and any pixel in the target range, and move the target point to the mean offset according to the mean offset vector The end of the shift vector;
  • a repeating unit configured to use the end point as the target point, and repeat the above steps until the position of the target point no longer changes;
  • a pixel set determining unit, configured to determine a pixel set according to the pixel corresponding to the final target point and the pixels within the preset radius;
  • the comparison unit is configured to obtain the distance between the pixel sets and compare the distance with a preset distance threshold to determine the candidate region to be implanted according to the comparison result.
  • the comparison unit is configured to: when the distance is less than or equal to the preset distance threshold, combine two sets of pixels corresponding to the distance to form the candidate region to be implanted When the distance is greater than the preset distance threshold, two pixel sets corresponding to the distance are respectively used as the candidate regions to be implanted.
  • the region search module 1004 is configured to: obtain, among the candidate regions to be implanted, the candidate regions that do not contain non-target objects; calculate the areas of these candidate regions; and use the candidate region with the largest area as the target candidate region to be implanted.
  • the detection device further includes: a filtering module configured to perform average filtering on the target region to be implanted to obtain a uniform and smooth target region to be implanted.
  • the region search module 1004 is configured to: use any pixel in the target candidate region to be implanted as a reference point, and search, according to the pixel value of the reference point, whether there are adjacent pixels with the same pixel value; if so, use the adjacent pixel as the reference point and repeat the above steps until all adjacent pixels with the same pixel value are obtained; use any one of the pixels as a vertex and form a rectangle according to the vertex and the adjacent pixels; calculate the area of each rectangle, select the target rectangle with the largest area, and use the region corresponding to the target rectangle as the target region to be implanted.
  • FIG. 11 is a schematic structural diagram of a computer system of an electronic device provided in an embodiment of the application, and the electronic device is used to implement the method for detecting an information implantation area provided in an embodiment of the application.
  • the computer system 1100 includes a central processing unit (CPU) 1101, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage part 1108 into a random access memory (RAM) 1103, so as to implement the detection method of the information implantation area described in the foregoing embodiments. The RAM 1103 also stores various programs and data required for system operation.
  • the CPU 1101, the ROM 1102, and the RAM 1103 are connected to each other through a bus 1104.
  • An input/output (Input/Output, I/O) interface 1105 is also connected to the bus 1104.
  • the following components are connected to the I/O interface 1105: the input part 1106 including keyboard, mouse, etc.; including the output part 1107 such as cathode ray tube (Cathode Ray Tube, CRT), liquid crystal display (LCD), and speakers, etc. ; A storage part 1108 including a hard disk, etc.; and a communication part 1109 including a network interface card such as a local area network (LAN) card, a modem, and the like.
  • the communication section 1109 performs communication processing via a network such as the Internet.
  • the driver 1110 is also connected to the I/O interface 1105 as needed.
  • a removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1110 as needed, so that the computer program read from it is installed into the storage portion 1108 as needed.
  • In particular, according to the embodiments of the present application, the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 1109, and/or installed from the removable medium 1111. When the computer program is executed by the central processing unit (CPU) 1101, various functions defined in the system of the present application are executed.
  • the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the above-mentioned module, program segment, or part of code contains one or more for realizing the specified logical function Executable instructions.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and the combination of blocks in the block diagram or flowchart can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or can be It is realized by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application can be implemented in software or hardware, and the described units can also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • the embodiments of the present application also provide a computer-readable medium.
  • the computer-readable medium may be included in the electronic device described in the foregoing embodiment; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by an electronic device, the electronic device realizes the method for detecting the information implantation area provided in the embodiments of the present application.
  • Through the description of the above embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein can be implemented by software, or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash disk, a removable hard disk, etc.) or on the network, and includes several instructions to make a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) execute the method according to the embodiments of the present application.
  • In the embodiments of the present application, a video to be implanted is acquired and segmented to obtain video segments; a target frame in each video segment is acquired, and the objects in the target frame are identified and segmented to obtain label information corresponding to the objects; a target object is determined according to the label information, and the target object is clustered to obtain multiple candidate regions to be implanted; a target candidate region to be implanted is determined from the candidate regions, and a largest rectangle search is performed on the target candidate region to obtain the target region to be implanted.


Abstract

Embodiments of the present application provide a detection method and apparatus for an information implantation area, an electronic device, and a computer storage medium. The method includes: acquiring a video to be implanted, and segmenting the video to be implanted to obtain video segments; acquiring a target frame in the video segments, and identifying and segmenting objects in the target frame to obtain label information corresponding to the objects; determining a target object according to the label information, and clustering the target object to obtain multiple candidate regions to be implanted; and determining a target candidate region to be implanted from the candidate regions, and performing a largest rectangle search on the target candidate region to obtain a target region to be implanted.

Description

Detection method and apparatus for an information implantation area, electronic device, and storage medium
Cross-reference to related applications
This application is filed based on the Chinese patent application with application number 201910578322.7 filed on June 28, 2019, and claims priority to that Chinese patent application, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a detection method for an information implantation area, a detection apparatus for an information implantation area, an electronic device, and a storage medium.
Background
With the gradual maturity of electronic information, electronic media advertising has gradually become the main form of advertisement distribution. Taking video advertisements as an example, they can be divided into two forms: Video-In and Video-Out. Video-In is implanted advertising, a form of soft advertising in which screen or physical advertisements are implanted at the positions of desktops, walls, photo frames, and billboards within the video; Video-Out is a form of scene-based pop-up advertising, which displays pop-up advertisements related to the video content based on an understanding of the cars, faces, targets, and scenes in the video images.
In the related art, for video advertisements in the Video-In form, the retrieval of advertisement slots for video implantation usually needs to be performed entirely manually by professional designers, which consumes a great deal of manpower and time.
Summary
Embodiments of the present application provide a detection method for an information implantation area, a detection apparatus for an information implantation area, an electronic device, and a storage medium, which can improve the detection efficiency of advertisement slots for video implantation.
An embodiment of the present application provides a detection method for an information implantation area, including:
acquiring a video to be implanted, and segmenting the video to be implanted to obtain video segments;
acquiring a target frame in the video segments, and identifying and segmenting objects in the target frame to obtain label information corresponding to the objects;
determining a target object according to the label information, and clustering the target object to obtain multiple candidate regions to be implanted;
determining a target candidate region to be implanted from the candidate regions to be implanted, and performing a largest rectangle search on the target candidate region to obtain a target region to be implanted.
An embodiment of the present application further provides a detection apparatus for an information implantation area, including:
a shot segmentation module, configured to acquire a video to be implanted and segment the video to be implanted to obtain video segments;
an object labeling module, configured to acquire a target frame in the video segments, and identify and segment objects in the target frame to obtain label information corresponding to the objects;
a clustering module, configured to determine a target object according to the label information, and cluster the target object to obtain multiple candidate regions to be implanted;
a region search module, configured to determine a target candidate region to be implanted from the candidate regions to be implanted, and perform a largest rectangle search on the target candidate region to obtain a target region to be implanted.
An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, implement the above detection method for an information implantation area provided by the embodiments of the present application.
An embodiment of the present application further provides a computer storage medium storing a computer program, where the computer program is used to execute the detection method for an information implantation area provided by the embodiments of the present application.
Applying the detection method and apparatus for an information implantation area, the electronic device, and the storage medium provided by the embodiments of the present application has at least the following beneficial technical effects:
First, the acquired video to be implanted is shot-segmented to obtain video segments; then a target frame is determined from each video segment, and the objects in the target frame are identified and segmented to obtain the label information corresponding to all objects in the target frame; then a target object is determined according to the label information, and the target object is clustered to obtain multiple candidate regions to be implanted; finally, a target candidate region to be implanted is determined from the candidate regions, and a largest rectangle search is performed on the target candidate region to obtain the target region to be implanted. In this way, whether an information implantation area exists in a video can be detected automatically, avoiding manual screening and labeling and reducing labor costs; at the same time, the detection time of the information implantation area can be greatly reduced, improving the efficiency and accuracy of video advertisement implantation.
Brief description of the drawings
The drawings herein are incorporated into and constitute a part of this specification, show embodiments consistent with the present application, and are used together with the specification to explain the principles of the present application. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort. In the drawings:
FIG. 1 is a schematic diagram of a system architecture of a detection method for an information implantation area provided by an embodiment of the present application;
FIGS. 2A-2C are effect diagrams of desktop scene implantation in the related art;
FIG. 3 is a flow chart of a detection method for an information implantation area provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of segmenting a video to be implanted provided by an embodiment of the present application;
FIG. 5 shows object label information output by an instance segmentation model provided by an embodiment of the present application;
FIG. 6 is a schematic flow chart of mean shift processing on an object provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of a largest rectangle search on a target candidate region to be implanted provided by an embodiment of the present application;
FIG. 8A is a schematic diagram of a desktop structure before island noise regions are eliminated according to an embodiment of the present application;
FIG. 8B is a schematic diagram of the desktop structure after the island noise regions are eliminated according to an embodiment of the present application;
FIG. 9 is a schematic flow chart of advertisement implantation provided by an embodiment of the present application;
FIG. 10 is a block diagram of a detection apparatus for an information implantation area provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a computer system of an electronic device provided by an embodiment of the present application.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present application. However, those skilled in the art will realize that the technical solutions of the present application can be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps, and the like. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the present application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities can be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are only exemplary illustrations; they do not necessarily include all contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps can be decomposed, and some can be merged or partially merged, so the actual execution order may change according to the actual situation.
Before the embodiments of the present invention are described in further detail, the nouns and terms involved in the embodiments of the present invention are explained; the following interpretations apply to them.
Largest rectangle search: a manner of determining the rectangular region with the largest area in a specific region by finding adjacent pixels with the same pixel value in that region. For example, the largest rectangle search can be implemented as follows:
any pixel in the target region to be implanted is taken as a reference point, and whether there are adjacent pixels with the same pixel value is searched according to the pixel value of the reference point; if so, the adjacent pixel is taken as the reference point and the above steps are repeated until all adjacent pixels with the same pixel value are obtained; any one of the pixels is taken as a vertex, a rectangle is formed according to the vertex and the adjacent pixels, the area of each rectangle is calculated, the target rectangle with the largest area is selected, and the region corresponding to the target rectangle is used as the target region to be implanted.
FIG. 1 shows a schematic diagram of an exemplary system architecture of the detection method for an information implantation area provided by an embodiment of the present application.
As shown in FIG. 1, the system architecture 100 may include a terminal device (one or more of the smartphone 101, the tablet computer 102, and the portable computer 103 shown in FIG. 1, or of course a desktop computer, etc.), a network 104, and a server 105. The network 104 is the medium used to provide a communication link between the terminal device and the server 105, and may include various connection types, such as wired communication links, wireless communication links, and so on.
It should be understood that the numbers of terminals, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminals, networks, and servers according to actual needs; for example, the server 105 may be a server cluster composed of multiple servers.
In some embodiments, the terminal device 101 (or the terminal device 102 or 103) sends a to-be-implanted video acquisition request to the server 105 through the network 104, where the request contains the video number of the video to be implanted. The server 105 stores multiple numbered videos; by matching the acquired video number against all the video numbers, the video to be implanted corresponding to the video number in the request can be obtained, and the server 105 can send the matched video to the terminal device 101 (or 102, 103) through the network 104. After receiving the video to be implanted, the terminal device performs shot segmentation on it to obtain video segments; then acquires the target frame in each video segment, and identifies and segments the objects in the target frame to obtain the label information corresponding to all the objects, where the label information includes the classification information, confidence, mask, calibration frame, and so on of each object; then the terminal device 101 determines the target object according to the label information and clusters the target object to obtain multiple candidate regions to be implanted in the target object; finally, a target candidate region to be implanted is determined from the multiple candidate regions, and by performing a largest rectangle search on the target candidate region, the target region to be implanted can be obtained. The target region to be implanted is the information implantation area and can be used for video advertisement implantation.
Applying the above embodiments of the present application, on the one hand, whether an information implantation area exists in a video can be detected automatically, avoiding manual screening and labeling and reducing labor costs; on the other hand, the detection time of the information implantation area can be greatly reduced, improving the efficiency of video advertisement implantation. By performing the largest rectangle search in the target candidate region to obtain the target region to be implanted, the detection accuracy of the information implantation area can be improved.
It should be noted that, in some embodiments, the detection method for an information implantation area provided by the embodiments of the present application can be executed by a terminal device, and accordingly the detection apparatus for an information implantation area can be set in the terminal device; in other embodiments, the detection method provided by the embodiments of the present application may also be implemented by a server.
In the related art in this field, taking advertisement implantation in the Video-In manner as an example, in order to implant physical objects, 3D models, flat advertisements, and the like inside a video, the implantation is generally performed at the positions of desktops, walls, and photo frames. However, in the related art, the retrieval of advertisement slots for video implantation is carried out entirely manually by professional designers and has not been automated. FIGS. 2A-2C show effect diagrams of desktop scene implantation: FIG. 2A shows an image in the original video; by detecting the advertisement slot on the desktop, a physical object can be implanted into the slot, as shown in FIG. 2B; a 3D model and a poster can also be implanted into the slot at the same time, as shown in FIG. 2C.
However, manual retrieval of advertisement implantation opportunities generally requires about 1.5 times the video duration. For the video advertisement implanter, this is very time-consuming and labor-intensive and seriously affects the efficiency of advertisement implantation; moreover, the accuracy of the advertisement slots is low, which further affects the effect of advertisement implantation.
In view of the problems in the related art, the embodiments of the present application first propose a detection method for an information implantation area. The detection method in the embodiments of the present application can be used for video advertisement implantation and the like; the implementation details of the technical solutions of the embodiments of the present application are elaborated below, taking video advertisement implantation as an example.
FIG. 3 schematically shows a flow chart of the detection method for an information implantation area provided by an embodiment of the present application. The method can be executed by a terminal device, which may be the terminal device shown in FIG. 1. Referring to FIG. 3, the detection method includes at least steps S310 to S340, described in detail as follows.
In step S310, a video to be implanted is acquired, and the video to be implanted is segmented to obtain video segments.
In some embodiments, video advertisement implantation is a new technical system that uses computer vision technology to intelligently implant advertisements into videos that have already been produced (that is, videos to be implanted). The user can obtain the video to be implanted by searching for videos online, or from a video folder or video database of the terminal device 101. The video to be implanted can be a video file in any format, such as .avi, .mp4, .rmvb, and so on, which is not limited in the embodiments of the present application.
In some embodiments, the terminal device 101 can send a to-be-implanted video acquisition request to the server 105, where the request contains the video number of the video to be implanted. The video number can have any numbering format, for example, numbering in digital form, or in the form of English letters plus digits, and so on. After parsing the request to obtain the video number, the server 105 can match the video number against the video numbers of all videos stored in the server, so as to obtain the video to be implanted corresponding to the video number in the request; finally, the server 105 can return the matched video to the terminal device 101, so that the terminal device 101 retrieves the advertisement slots in the video to be implanted.
In some embodiments, after the video to be implanted is received, it can be segmented to obtain the video segments constituting the video. The basic structure of a video is a hierarchy composed of frames, shots, scenes, and video programs. A frame is a static image and the smallest logical unit of the video; playing temporally continuous frame sequences at equal intervals forms a dynamic video. A shot is a frame sequence continuously captured by one camera from start to stop; it depicts an event or part of a scene, has little or weak semantic information, and emphasizes the visual-content similarity of its frames. A scene is a set of semantically related continuous shots, which may be shots of the same object from different angles or with different techniques, or a combination of shots with the same theme and event, emphasizing semantic relevance. A video program contains a complete event or story; as the highest-level video content structure, it includes the compositional relationships of the video as well as summaries, semantics, and general descriptions of the video. In order to effectively identify the information implantation area, a shot can be used as the processing unit, that is, each shot is treated as one video segment. In practical applications, shot segmentation can be performed on the video to be implanted to split it into multiple shots, thereby obtaining the video segments constituting the video.
In some embodiments, the video to be implanted can be segmented through similarity algorithm recognition. FIG. 4 shows a schematic flow chart of segmenting the video to be implanted. As shown in FIG. 4, in step S401, target features of image frames are extracted from the video to be implanted; in step S402, similarity algorithm recognition is performed on the target features of adjacent image frames, and the video to be implanted is segmented according to the recognition result to obtain the video segments.
Here, in similarity algorithm recognition, the similarity of each pixel in two adjacent image frames could be compared; however, since the number of pixels in an image is huge, comparing pixel similarity one by one would occupy a large amount of resources and make data processing inefficient. Therefore, target features can be extracted from the video to be implanted; the target features can be multi-bit features of the image frames contained in the video, and the boundary image frames of adjacent shots are determined by performing similarity algorithm recognition on the target features of adjacent image frames.
In some embodiments, performing similarity algorithm recognition on the target features of adjacent image frames in step S402, and segmenting the video to be implanted according to the recognition result to obtain the video segments, can include:
calculating the distance between the target features of adjacent image frames, and performing similarity algorithm recognition according to the distance. The distance can be a Euclidean distance, a cosine distance, and so on. Taking the Euclidean distance as an example, after the Euclidean distance between the target features of adjacent image frames is obtained, it can be compared with a preset distance threshold to judge the similarity between the adjacent image frames: when the distance is less than the preset distance threshold, the adjacent image frames are judged to belong to the same video segment; when the distance is greater than or equal to the preset distance threshold, the adjacent image frames are judged to belong to different video segments.
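The shot-boundary comparison described above can be sketched as follows. The choice of a grey-level histogram as the target feature, the bin count, and the distance threshold are all illustrative assumptions, not the patent's specification.

```python
import numpy as np

def frame_feature(frame):
    """Illustrative target feature: a normalised 16-bin grey-level histogram."""
    hist, _ = np.histogram(frame, bins=16, range=(0, 256))
    return hist / max(hist.sum(), 1)

def segment_shots(frames, dist_threshold=0.5):
    """Cut the sequence whenever the Euclidean distance between adjacent
    frame features reaches the preset threshold (a new shot begins)."""
    feats = [frame_feature(f) for f in frames]
    shots, current = [], [0]
    for i in range(1, len(frames)):
        if np.linalg.norm(feats[i] - feats[i - 1]) >= dist_threshold:
            shots.append(current)
            current = []
        current.append(i)
    shots.append(current)
    return shots

dark = np.zeros((8, 8), dtype=np.uint8)      # a frame from a dark shot
bright = np.full((8, 8), 250, dtype=np.uint8)  # a frame from a bright shot
shots = segment_shots([dark, dark, bright, bright])  # two shots: [0, 1] and [2, 3]
```

Comparing compact features instead of raw pixels is what keeps this cheap: the per-pair cost is the feature length, not the frame size.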
在步骤S320中,获取所述视频分片中的目标帧,并对所述目标帧中的对象进行识别和分割,以获取与所述对象对应的标注信息。
在一些实施例中,在进行信息植入时,可以在镜头中具有典型性、代表性的帧里面进行植入,例如在关键帧或代表帧中植入信息可以提高植入效率,因此在切分完待植入视频后,可以从各个视频分片中确定一个或多个目标帧(可以为视频的关键帧),通过对目标帧中的对象进行处理,以获取目标帧中可用于进行广告植入的目标待植入区域。
In some embodiments, an instance segmentation model can recognize and segment the objects in the target frame, classifying them and producing annotation information for each object. The model may preprocess the input target frame and apply convolution operations to the preprocessed frame to extract features and obtain a feature map; a region proposal network then processes the feature map to obtain multiple candidate regions of interest, which are classified and regressed to obtain target regions of interest; the pixels of each target region of interest are then aligned with the pixels of that region in the target frame; finally, classification, bounding-box regression, and mask generation are performed on the target regions of interest to obtain the annotation information corresponding to each object in the frame.
In some embodiments, the instance segmentation model may be a Mask R-CNN model, or any other machine learning model capable of recognizing, segmenting, and labeling the objects in a target frame. After the model processes the target frame, the annotation information for each object is obtained; it may include the object's classification information, confidence, mask, and bounding box. Figure 5 shows the object annotation output of the instance segmentation model. As shown in Figure 5, the input target frame contains a desktop and a cup placed on it; the model segments the two and marks them with masks and bounding boxes of different colors. The dark region is the mask of the desktop and dashed box A its bounding box, with classification "table" and confidence 0.990; the light region is the mask of the cup and dashed box B its bounding box, with classification "cup" and confidence 0.933.
Before the instance segmentation model is used on target frames, a large number of image frames can be collected as training samples to train the model, for example frames from one or more videos. The samples are fed to the model under training, and the annotation information output by the model is compared with the ground-truth annotation of the samples to decide whether training is complete.
In step S330, a target object is determined according to the annotation information, and the target object is clustered to obtain multiple candidate insertion regions.
In some embodiments, after the annotation information of the objects in the target frame is obtained, the target object can be determined from it. For example, to insert an advertisement on a desktop, the objects classified as "table" are selected from the classification information in the annotation. If the target frame contains one or more desktops, all of them may be taken as target objects and the target insertion region determined from them, or one of the desktops may be selected as the target object and the target insertion region determined from it.
In practice, to improve the insertion effect and save the advertiser's budget, the advertisement can be inserted only at a prominent position: a single target object is determined from the multiple candidate objects and clustered to obtain the multiple candidate insertion regions within it.
In some embodiments, the target object can be chosen from the candidate objects according to the area of the mask in the annotation information; in some embodiments, the object with the largest mask area is selected, for example the desktop with the largest mask, which is the desktop occupying the dominant position in the target frame. An advertisement inserted on that desktop will be noticed by users watching the video, which improves the advertisement's user reach.
In some embodiments, before the target object is determined from the annotation information, the objects in the target frame can be filtered. In practice, the confidence of each object is compared with a preset confidence threshold; objects whose confidence exceeds the threshold are kept, and objects whose confidence is less than or equal to it are discarded. The threshold can be set as needed, for example 0.5, so that only objects with confidence greater than 0.5 remain in the target frame.
In practice, the target frames themselves can also be filtered. The to-be-inserted video may contain multiple video slices, each with one or more target frames, but not every target frame contains a target object, so frames without a target object can be deleted. For example, when the target object is determined to be a desktop, frames that contain no desktop are deleted, and ad slots are detected only in frames that do contain a desktop.
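Taken together, the confidence filtering and largest-mask selection described above amount to the following sketch; the dictionary fields and the example detections are assumptions for illustration, not the model's actual output format:

```python
def pick_target_object(detections, target_class, conf_threshold=0.5):
    """Keep detections whose confidence exceeds the threshold, then
    return the target-class instance with the largest mask area
    (the object occupying the dominant position in the frame)."""
    kept = [d for d in detections
            if d["score"] > conf_threshold and d["label"] == target_class]
    if not kept:
        return None   # frame contains no target object: drop the frame
    return max(kept, key=lambda d: d["mask_area"])

dets = [
    {"label": "table", "score": 0.990, "mask_area": 12000},
    {"label": "cup",   "score": 0.933, "mask_area": 800},
    {"label": "table", "score": 0.400, "mask_area": 20000},  # low confidence
]
print(pick_target_object(dets, "table")["mask_area"])  # 12000
```

A `None` result corresponds to the frame-filtering step: the target frame is deleted when it contains no object of the target class.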
In some embodiments, after the target object is obtained, its pixels can be clustered to obtain the multiple candidate insertion regions; in practice this can be done by applying mean-shift processing to the target object. Figure 6 shows a flowchart of the mean-shift processing. As shown in Figure 6, in step S601, any pixel of the target object is taken as the target point, and a target range is determined with the target point as the center and a preset radius; in step S602, a mean-shift vector is determined from the distance vectors between the target point and each pixel in the target range, and the target point is moved to the end of the mean-shift vector; in step S603, the end point becomes the new target point, and steps S601-S602 are repeated until the position of the target point no longer changes; in step S604, a pixel set is determined from the pixel corresponding to the final target point and the pixels within the preset radius; in step S605, the distances between pixel sets are obtained and compared with a preset distance threshold, and the candidate insertion regions are determined from the comparison result.
When determining the candidate insertion regions from the comparison between the inter-set distances and the preset distance threshold, the following rules apply:
when the distance between two pixel sets is less than or equal to the preset distance threshold, the two sets are merged to form one candidate insertion region;
when the distance between two pixel sets is greater than the preset distance threshold, each of the two sets is taken as a separate candidate insertion region.
Mean-shift processing yields multiple connected regions of the target object, and these connected regions are the candidate insertion regions. For example, when the target object is a desktop, applying mean-shift processing to every pixel of the desktop yields multiple connected regions that together make up the desktop; in practice, each of these connected regions can serve as a candidate region for advertisement insertion on the desktop.
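A minimal pure-Python version of the mean-shift procedure of steps S601-S605 might look as follows; the radius, merge threshold, and sample points are illustrative assumptions:

```python
from math import dist

def mean_shift_modes(points, radius, merge_dist, max_iter=100):
    """Shift every point to the mean of its neighbours within `radius`
    until it stops moving (steps S601-S603), then merge converged
    modes closer than `merge_dist` (steps S604-S605); each surviving
    mode stands for one candidate insertion region."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            neigh = [q for q in points if dist(x, q) <= radius]
            mean = tuple(sum(c) / len(neigh) for c in zip(*neigh))
            if dist(mean, x) < 1e-9:   # position no longer changes
                break
            x = mean
        modes.append(x)
    clusters = []                       # merge nearby modes
    for m in modes:
        for c in clusters:
            if dist(m, c) <= merge_dist:
                break
        else:
            clusters.append(m)
    return clusters

# two well-separated pixel groups -> two candidate regions
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(len(mean_shift_modes(pts, radius=2.0, merge_dist=1.0)))  # 2
```

On real masks the points would be the (x, y) coordinates of the target object's pixels, and each merged cluster would be read back as a connected region of the desktop.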
In step S340, a target candidate insertion region is determined from the candidate insertion regions, and a maximum rectangle search is performed on the target candidate insertion region to obtain the target insertion region.
In some embodiments, once the multiple candidate insertion regions are determined, they can be filtered to select the target candidate insertion region. For example, candidate regions containing non-target objects can be discarded: a desktop may hold a phone, a cup, a vase, and other objects, and to guarantee the insertion effect, candidate regions containing such objects are discarded and only the regions free of non-target objects are kept. The areas of the remaining candidate regions are then computed, and the one with the largest area is taken as the target candidate insertion region. This yields the core free space of the target object: the blank region with the best area.
In some embodiments, because the target candidate insertion region may be irregular in shape, inserting an advertisement into it directly could truncate the inserted information. Therefore, after the target candidate insertion region, i.e., the core free space of the target object, is determined, a maximum rectangle search must still be performed on it to determine the target insertion region used for advertisement insertion.
Figure 7 shows a flowchart of the maximum rectangle search on the target candidate insertion region. As shown in Figure 7, the flow includes:
In step S701, any pixel in the target insertion region is taken as a reference point, and its pixel value is used to look for adjacent pixels with the same pixel value.
For example, the pixels in the target insertion region have value 0 or 1. If a pixel has value 0, it is taken as the reference point and adjacent pixels with value 0 are sought; if a pixel has value 1, it is taken as the reference point and adjacent pixels with value 1 are sought.
In step S702, when adjacent pixels with the same pixel value exist, each such adjacent pixel becomes the reference point and step S701 is repeated until all adjacent pixels with the same pixel value have been obtained.
For example, if a pixel adjacent to the reference point has the same pixel value as the reference point, that adjacent pixel becomes the reference point and the search expands outward, checking whether further neighbors with the same pixel value exist; after repeated checks, all adjacent pixels with the same pixel value are obtained.
In step S703, the initial pixel is taken as a vertex, and rectangles are formed from the vertex and the adjacent pixels.
With the pixel first used as the reference point as the vertex, rectangles are formed from the vertex and the adjacent pixels. Note that all pixels in such a rectangle have the same pixel value, with no pixels of a different value, and the vertex may be a left vertex, a right vertex, and so on; the embodiments of this application place no specific limitation on this.
In step S704, the areas of the rectangles are computed, the target rectangle with the largest area is selected, and the region corresponding to the target rectangle is taken as the target insertion region.
Step S703 can produce multiple rectangles; to obtain the optimal insertion region, they can be ranked by area. In practice, the area of each rectangle is computed, the rectangle with the largest area is selected, and the region corresponding to it becomes the target insertion region used for advertisement insertion.
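The search of steps S701-S704 amounts to finding the largest axis-aligned rectangle of same-valued pixels. One standard equivalent formulation, sketched here under the assumption that the region is given as a binary grid (1 = free pixel), is the histogram-stack maximal-rectangle algorithm rather than the region-growing procedure described above:

```python
def max_rectangle_area(grid):
    """Area of the largest all-ones rectangle in a binary grid.
    Each row updates a histogram of consecutive-ones heights; a
    monotonic stack then finds the widest rectangle per column."""
    best = 0
    heights = [0] * len(grid[0])
    for row in grid:
        # height = run of 1s ending at this row, per column
        heights = [h + 1 if v else 0 for h, v in zip(heights, row)]
        ext = heights + [0]            # sentinel flushes the stack
        stack = []                     # indices with increasing heights
        for i, h in enumerate(ext):
            while stack and ext[stack[-1]] >= h:
                top = stack.pop()
                width = i - (stack[-1] + 1) if stack else i
                best = max(best, ext[top] * width)
            stack.append(i)
    return best

grid = [[1, 1, 0],
        [1, 1, 1],
        [1, 1, 1]]
print(max_rectangle_area(grid))  # 6  (the 3-row x 2-column block of 1s)
```

Recovering the rectangle's coordinates, as needed to actually place the advertisement, only requires remembering `(top, width, i)` at the point where `best` is updated.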
In some embodiments, the target insertion region may contain small isolated noise regions, mainly caused by abrupt light-and-shadow changes on the target object. To keep these island regions from affecting the result of the maximum rectangle search, mean filtering can be applied to the target insertion region before the search, producing a uniform, smooth target insertion region.
Figure 8A shows the desktop before island-noise removal and Figure 8B after. As shown in Figure 8A, small isolated noise regions, drawn as tiny black patches, appear in the core free space of the desktop; after mean filtering these small black patches are eliminated and the core free space of the desktop becomes uniform and smooth, as shown in Figure 8B.
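The mean (box) filter used here replaces each pixel with the average of its neighbourhood, which washes out small island regions. A self-contained sketch on a plain nested list follows; the 3x3 window and the toy image are assumptions for illustration:

```python
def mean_filter(img, k=3):
    """Box filter: each pixel becomes the mean of the k x k window
    around it, with the window clipped at the image borders."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(window) / len(window)
    return out

# a single dark "island" pixel in an otherwise free region
img = [[1, 1, 1],
       [1, 0, 1],
       [1, 1, 1]]
print(mean_filter(img)[1][1])  # 0.888...: the island is smoothed away
```

After filtering, a threshold would turn the smoothed values back into the binary free/occupied map that the maximum rectangle search consumes.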
The method for detecting an information insertion region in the embodiments of this application can detect regions usable for advertisement insertion on a desktop, and equally on other desktop-like objects, for example a supermarket checkout counter, a park bench, or the belt of a treadmill.
Taking the insertion of a branded beverage advertisement on a supermarket checkout counter as an example, Figure 9 shows a flowchart of the advertisement insertion. As shown in Figure 9:
In step S901, the to-be-inserted video is segmented to obtain video slices, and target frames are obtained from the video slices.
Here, for the scenario of inserting a beverage advertisement on a checkout counter, a target frame may be a supermarket checkout scene.
In step S902, the target frame is input to the instance segmentation model to obtain the annotation information of all objects in the target frame.
In step S903, the target object is determined from the objects' classification information and the areas of the corresponding masks.
For example, the checkout counter has the largest mask area in the target frame, so the checkout counter is taken as the target object.
In step S904, mean-shift processing is applied to the target object to obtain multiple candidate insertion regions.
In step S905, the candidate insertion region with the largest area that contains no non-target objects is obtained and taken as the target candidate insertion region.
Here, the counter may hold a display, advertising signs, and other objects; to determine the counter's core free space, the multiple candidate insertion regions are filtered to obtain the target candidate insertion region.
In step S906, mean filtering is applied to the target candidate insertion region to obtain a uniform, smooth target candidate insertion region.
In step S907, a maximum rectangle search is performed in the target candidate insertion region to obtain the target insertion region.
Here, the target insertion region is the free space on the counter where an advertisement can be inserted; it may be the space near the display or the corner of the counter near the entrance, and the embodiments of this application place no specific limitation on this.
In step S908, the advertisement is inserted into the target insertion region.
When the beverage advertisement is inserted, it may be a physical beverage, a 3D model carrying a promotional poster for the beverage, and so on.
The embodiments above describe the detection method as performed by a terminal device; the method can equally be performed by a server, for example a server dedicated to data processing in which the instance segmentation model is deployed. After receiving the to-be-inserted video sent by the server 105, the terminal device 101 may forward it to the data-processing server, which splits the video into shots to obtain video slices; recognizes and segments the target frames in the slices to obtain the annotation information of the objects in the frames; determines the target object from the annotation information and clusters it to obtain multiple candidate insertion regions; determines the target candidate insertion region from the candidates and performs a maximum rectangle search on it to obtain the target insertion region; and finally sends the target insertion region to the terminal device 101 so that the video advertisement insertion can be carried out.
The technical solution of the embodiments of this application recognizes and segments the objects in the target frame with an instance segmentation model, and determines the target insertion region from the target object by combining in-mask color-block clustering with a maximum rectangle search, achieving automated detection of insertable ad slots. Compared with the 1.5x video duration that manual detection requires, the detection method of these embodiments compresses the time to 0.2x the video duration, reducing labor cost on the one hand and improving detection efficiency on the other. Moreover, in practice the AI-based detection method of these embodiments reaches a detection precision of 0.91, greatly improving the accuracy of ad-slot detection and eliminating the inconsistency of slots chosen by different human reviewers.
The following describes apparatus embodiments of this application, which can be configured to perform the method for detecting an information insertion region in the embodiments above. For details not disclosed in the apparatus embodiments, refer to the method embodiments described above.
Figure 10 schematically shows a block diagram of the apparatus for detecting an information insertion region provided by an embodiment of this application.
Referring to Figure 10, the detection apparatus 1000 provided by this embodiment includes: a shot segmentation module 1001, an object annotation module 1002, a clustering module 1003, and a region search module 1004.
The shot segmentation module 1001 is configured to obtain the to-be-inserted video and segment it to obtain video slices;
the object annotation module 1002 is configured to obtain the target frames in the video slices, and recognize and segment the objects in the target frames to obtain the annotation information corresponding to the objects;
the clustering module 1003 is configured to determine the target object according to the annotation information, and cluster the pixels of the target object to obtain multiple candidate insertion regions;
the region search module 1004 is configured to determine the target candidate insertion region from the candidate insertion regions, and perform a maximum rectangle search on the target candidate insertion region to obtain the target insertion region.
In some embodiments, the detection apparatus 1000 further includes:
a number sending module, configured to send the server a video acquisition request that includes the video number of the to-be-inserted video;
a video receiving module, configured to receive the to-be-inserted video corresponding to the video number, returned by the server in response to the video acquisition request.
In some embodiments, the shot segmentation module 1001 includes:
a feature extraction unit, configured to extract target features from the to-be-inserted video;
a similarity recognition unit, configured to perform similarity recognition on adjacent image frames and segment the to-be-inserted video according to the recognition result to obtain the video slices.
In some embodiments, the object annotation module 1002 includes:
a model processing unit, configured to input the target frame into an instance segmentation model, and recognize and segment the objects in the target frame with the instance segmentation model to obtain the annotation information.
In some embodiments, the model processing unit is configured to: preprocess the target frame with the instance segmentation model and extract features from the preprocessed frame to obtain a feature map; determine multiple candidate regions of interest on the feature map, and classify and regress them to obtain target regions of interest; perform an alignment operation on the target regions of interest so as to align the pixels in the target frame with the pixels of the target regions of interest; and perform classification, bounding-box regression, and mask generation on the target regions of interest to obtain the annotation information of the objects.
In some embodiments, the annotation information includes the classification information, confidence, mask, and bounding box of the object.
In some embodiments, the clustering module 1003 includes:
a target object determination unit, configured to determine the target object according to the classification information and mask area in the annotation information;
a clustering unit, configured to apply mean-shift processing to the target object so as to cluster its pixels and obtain the multiple candidate insertion regions.
In some embodiments, there are multiple video slices, each containing one or more target frames;
the clustering module 1003 may further be configured to: compare the confidence of each object in each target frame with a preset confidence threshold and keep, in each target frame, the objects whose confidence exceeds the threshold; then delete the target frames that contain no to-be-inserted object, a to-be-inserted object being one whose classification information matches that of the target object.
In some embodiments, the clustering unit includes:
a range determination unit, configured to take any pixel of the target object as the target point and determine a target range with the target point as the center and a preset radius;
a moving unit, configured to determine a mean-shift vector from the distance vectors between the target point and the pixels in the target range, and move the target point to the end of the mean-shift vector;
a repeating unit, configured to take the end point as the target point and repeat the above steps until the position of the target point no longer changes;
a pixel set determination unit, configured to determine a pixel set from the pixel corresponding to the final target point and the pixels within the preset radius;
a comparison unit, configured to obtain the distances between pixel sets and compare them with a preset distance threshold so as to determine the candidate insertion regions from the comparison result.
In some embodiments, the comparison unit is configured to: when the distance is less than or equal to the preset distance threshold, merge the two corresponding pixel sets to form a candidate insertion region; when the distance is greater than the preset distance threshold, take each of the two corresponding pixel sets as a separate candidate insertion region.
In some embodiments, the region search module 1004 is configured to: obtain the candidate insertion regions that contain no non-target objects; compute the areas of those regions and take the one with the largest area as the target candidate insertion region.
In some embodiments, the detection apparatus further includes: a filtering module, configured to apply mean filtering to the target insertion region to obtain a uniform, smooth target insertion region.
In some embodiments, the region search module 1004 is configured to: take any pixel in the target insertion region as a reference point, and use the pixel value of the reference point to look for adjacent pixels with the same pixel value; if such pixels exist, take each adjacent pixel as the reference point and repeat the above steps until all adjacent pixels with the same pixel value are obtained; take the initial pixel as a vertex and form rectangles from the vertex and the adjacent pixels; compute the areas of the rectangles, select the target rectangle with the largest area, and take the region corresponding to the target rectangle as the target insertion region.
Figure 11 is a schematic structural diagram of the computer system of the electronic device provided by an embodiment of this application; the electronic device is used to implement the method for detecting an information insertion region provided by the embodiments.
It should be noted that the computer system 1100 of the electronic device shown in Figure 11 is only an example and places no limitation on the functions or scope of use of the embodiments of this application.
As shown in Figure 11, the computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processing, including the detection method described in the embodiments above, according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage portion 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores the programs and data required for system operation. The CPU 1101, ROM 1102, and RAM 1103 are connected to one another via a bus 1104; an input/output (I/O) interface 1105 is also connected to the bus 1104.
The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard and a mouse; an output portion 1107 including a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 1108 including a hard disk; and a communication portion 1109 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 1109 performs communication over a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, optical disc, magneto-optical disc, or semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage portion 1108 as needed.
According to the embodiments of this application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1109 and/or installed from the removable medium 1111. When executed by the central processing unit (CPU) 1101, the program performs the various functions defined in the system of this application.
It should be noted that the computer-readable medium in the embodiments of this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples include, without limitation: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of this application, a computer-readable storage medium may be any tangible medium that contains or stores a program usable by or in combination with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, and the like, or any suitable combination thereof.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of this application. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, the module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings: two consecutively shown blocks may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and each combination of blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application may be implemented in software or in hardware, and the described units may also be provided in a processor; the names of these units do not in any case limit the units themselves.
An embodiment of this application further provides a computer-readable medium, which may be contained in the electronic device described in the embodiments above, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method for detecting an information insertion region provided by the embodiments of this application.
It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. According to the implementations of this application, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
From the description of the implementations above, those skilled in the art will readily understand that the example implementations described here may be realized in software, or in software combined with the necessary hardware. The technical solution according to the implementations of this application may therefore be embodied as a software product, which may be stored on a non-volatile storage medium (a CD-ROM, USB drive, removable hard disk, and so on) or on a network, and which includes instructions that cause a computing device (a personal computer, server, touch terminal, network device, and so on) to execute the method according to the implementations of this application.
Other implementations of this application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art not disclosed here.
It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.
Industrial Applicability
In the embodiments of this application, a to-be-inserted video is obtained and segmented to obtain video slices; target frames in the video slices are obtained, and the objects in the target frames are recognized and segmented to obtain annotation information corresponding to the objects; a target object is determined from the annotation information and clustered to obtain multiple candidate insertion regions; a target candidate insertion region is determined from the candidates, and a maximum rectangle search is performed on it to obtain the target insertion region. In this way, a video can be automatically checked for information insertion regions, avoiding manual screening and labeling and reducing labor cost; at the same time, the detection time is greatly shortened, and the efficiency and accuracy of video advertisement insertion are improved.

Claims (16)

  1. A method for detecting an information insertion region, the method being performed by a terminal device and comprising:
    obtaining a to-be-inserted video, and segmenting the to-be-inserted video to obtain video slices;
    obtaining a target frame in the video slices, and recognizing and segmenting objects in the target frame to obtain annotation information corresponding to the objects;
    determining a target object according to the annotation information, and clustering the target object to obtain multiple candidate insertion regions;
    determining a target candidate insertion region from the candidate insertion regions, and performing a maximum rectangle search on the target candidate insertion region to obtain a target insertion region.
  2. The method for detecting an information insertion region according to claim 1, wherein obtaining the to-be-inserted video comprises:
    sending a server a video acquisition request that includes a video number of the to-be-inserted video;
    receiving the to-be-inserted video corresponding to the video number, returned by the server.
  3. The method for detecting an information insertion region according to claim 1, wherein segmenting the to-be-inserted video to obtain video slices comprises:
    extracting target features of image frames from the to-be-inserted video;
    performing similarity recognition on the target features of adjacent image frames, and segmenting the to-be-inserted video according to the recognition result to obtain the video slices.
  4. The method for detecting an information insertion region according to claim 1, wherein recognizing and segmenting the objects in the target frame to obtain the annotation information corresponding to the objects comprises:
    inputting the target frame into an instance segmentation model, and recognizing and segmenting the objects in the target frame with the instance segmentation model to obtain the annotation information corresponding to the objects.
  5. The method for detecting an information insertion region according to claim 4, wherein recognizing and segmenting the objects in the target frame with the instance segmentation model comprises:
    preprocessing the target frame with the instance segmentation model, and extracting features from the preprocessed target frame to obtain a feature map;
    determining multiple candidate regions of interest on the feature map, and classifying and regressing the multiple candidate regions of interest to obtain target regions of interest;
    performing an alignment operation on the target regions of interest so as to align pixels in the target frame with pixels of the target regions of interest;
    performing classification, bounding-box regression, and mask generation on the target regions of interest to obtain the annotation information of the objects.
  6. The method for detecting an information insertion region according to any one of claims 1 to 5, wherein the annotation information includes the classification information, confidence, mask, and bounding box of the object.
  7. The method for detecting an information insertion region according to claim 6, wherein determining the target object according to the annotation information and clustering the target object to obtain multiple candidate insertion regions comprises:
    determining the target object according to the classification information and the mask area in the annotation information;
    applying mean-shift processing to the target object so as to cluster the target object and obtain the multiple candidate insertion regions.
  8. The method for detecting an information insertion region according to claim 7, wherein there are multiple video slices, each containing one or more target frames;
    before determining the target object according to the classification information and the mask area in the annotation information, the method further comprises:
    comparing the confidence of each object contained in each target frame with a preset confidence threshold, and keeping, in each target frame, the objects whose confidence exceeds the preset confidence threshold;
    deleting the target frames that contain no to-be-inserted object, the to-be-inserted object having the same classification information as the target object.
  9. The method for detecting an information insertion region according to claim 7, wherein applying mean-shift processing to the target object so as to cluster the target object and obtain the multiple candidate insertion regions comprises:
    taking any pixel of the target object as a target point, and determining a target range with the target point as the center and a preset radius;
    determining a mean-shift vector from the distance vectors between the target point and the pixels in the target range, moving the target point to the end of the mean-shift vector, and taking the end point as the target point;
    repeating the above steps until the position of the target point no longer changes;
    determining a pixel set from the pixel corresponding to the target point whose position no longer changes and the pixels within the preset radius;
    obtaining distances between pixel sets, and comparing the distances with a preset distance threshold so as to determine the candidate insertion regions from the comparison result.
  10. The method for detecting an information insertion region according to claim 9, wherein comparing the distances with the preset distance threshold so as to determine the candidate insertion regions from the comparison result comprises:
    when the distance is less than or equal to the preset distance threshold, merging the two pixel sets corresponding to the distance to form a candidate insertion region;
    when the distance is greater than the preset distance threshold, taking each of the two pixel sets corresponding to the distance as a separate candidate insertion region.
  11. The method for detecting an information insertion region according to claim 1, wherein determining the target candidate insertion region from the candidate insertion regions comprises:
    obtaining the candidate insertion regions that contain no non-target objects;
    computing the areas of the candidate insertion regions that contain no non-target objects, and taking the candidate insertion region with the largest area as the target candidate insertion region.
  12. The method for detecting an information insertion region according to claim 1, wherein before performing the maximum rectangle search on the target candidate insertion region, the method further comprises:
    applying mean filtering to the target candidate insertion region to obtain a uniform, smooth target candidate insertion region.
  13. The method for detecting an information insertion region according to claim 12, wherein performing the maximum rectangle search to obtain the target insertion region comprises:
    taking any pixel in the target candidate insertion region as a reference point, and using the pixel value of the reference point to look for adjacent pixels with the same pixel value;
    if such pixels exist, taking each adjacent pixel as the reference point and repeating the above steps until all adjacent pixels with the same pixel value are obtained;
    taking the initial pixel as a vertex, and forming rectangles from the vertex and the adjacent pixels;
    computing the areas of the rectangles, selecting the target rectangle with the largest area, and taking the region corresponding to the target rectangle as the target insertion region.
  14. An apparatus for detecting an information insertion region, the apparatus comprising:
    a shot segmentation module, configured to obtain a to-be-inserted video, and segment the to-be-inserted video to obtain video slices;
    an object annotation module, configured to obtain a target frame in the video slices, and recognize and segment objects in the target frame to obtain annotation information corresponding to the objects;
    a clustering module, configured to determine a target object according to the annotation information, and cluster the target object to obtain multiple candidate insertion regions;
    a region search module, configured to determine a target candidate insertion region from the candidate insertion regions, and perform a maximum rectangle search on the target candidate insertion region to obtain a target insertion region.
  15. An electronic device, comprising:
    one or more processors;
    a storage apparatus, configured to store one or more programs which, when executed by the one or more processors, implement the method for detecting an information insertion region according to any one of claims 1 to 13.
  16. A computer storage medium storing a computer program, the computer program being used to execute the method for detecting an information insertion region according to any one of claims 1 to 13.
PCT/CN2020/097782 2019-06-28 2020-06-23 信息植入区域的检测方法、装置、电子设备及存储介质 WO2020259510A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/370,764 US20210406549A1 (en) 2019-06-28 2021-07-08 Method and apparatus for detecting information insertion region, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910578322.7 2019-06-28
CN201910578322.7A CN112153483B (zh) 2019-06-28 2019-06-28 信息植入区域的检测方法、装置及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/370,764 Continuation US20210406549A1 (en) 2019-06-28 2021-07-08 Method and apparatus for detecting information insertion region, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020259510A1 true WO2020259510A1 (zh) 2020-12-30

Family

ID=73891682

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097782 WO2020259510A1 (zh) 2019-06-28 2020-06-23 信息植入区域的检测方法、装置、电子设备及存储介质

Country Status (3)

Country Link
US (1) US20210406549A1 (zh)
CN (1) CN112153483B (zh)
WO (1) WO2020259510A1 (zh)


Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN112752151B (zh) * 2020-12-30 2022-09-20 湖南快乐阳光互动娱乐传媒有限公司 一种动态广告植入位置的检测方法及装置
CN113676775A (zh) * 2021-08-27 2021-11-19 苏州因塞德信息科技有限公司 一种利用人工智能在视频和游戏中进行广告植入的方法
CN113691835B (zh) * 2021-10-21 2022-01-21 星河视效科技(北京)有限公司 视频植入方法、装置、设备及计算机可读存储介质
CN114449346B (zh) * 2022-02-14 2023-08-15 腾讯科技(深圳)有限公司 视频处理方法、装置、设备以及存储介质
CN118042217A (zh) * 2022-11-14 2024-05-14 北京字跳网络技术有限公司 一种视频处理方法、装置、电子设备和存储介质
CN116761037B (zh) * 2023-08-23 2023-11-03 星河视效科技(北京)有限公司 视频植入多媒体信息的方法、装置、设备及介质
CN116939293B (zh) * 2023-09-17 2023-11-17 世优(北京)科技有限公司 植入位置的检测方法、装置、存储介质及电子设备

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101621636A (zh) * 2008-06-30 2010-01-06 北京大学 基于视觉注意力模型的广告标志插入和变换方法及系统
US20160133027A1 (en) * 2014-11-12 2016-05-12 Ricoh Company, Ltd. Method and apparatus for separating foreground image, and non-transitory computer-readable recording medium
US20170358092A1 (en) * 2016-06-09 2017-12-14 Lytro, Inc. Multi-view scene segmentation and propagation
CN109168034A (zh) * 2018-08-28 2019-01-08 百度在线网络技术(北京)有限公司 商品信息显示方法、装置、电子设备和可读存储介质
CN109302619A (zh) * 2018-09-18 2019-02-01 北京奇艺世纪科技有限公司 一种信息处理方法及装置
CN110458820A (zh) * 2019-08-06 2019-11-15 腾讯科技(深圳)有限公司 一种多媒体信息植入方法、装置、设备及存储介质

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
JP4253335B2 (ja) * 2006-07-13 2009-04-08 株式会社東芝 カーネル関数値を用いた、画像の平均値シフトによるフィルタリングとクラスタリングの方法及び装置
WO2009012659A1 (en) * 2007-07-26 2009-01-29 Omron Corporation Digital image processing and enhancing system and method with function of removing noise
US8561106B1 (en) * 2007-12-21 2013-10-15 Google Inc. Video advertisement placement
US20120180084A1 (en) * 2011-01-12 2012-07-12 Futurewei Technologies, Inc. Method and Apparatus for Video Insertion
US9311550B2 (en) * 2013-03-06 2016-04-12 Samsung Electronics Co., Ltd. Device and method for image processing
CN105303163B (zh) * 2015-09-22 2019-03-01 深圳市华尊科技股份有限公司 一种目标检测的方法及检测装置
WO2017165538A1 (en) * 2016-03-22 2017-09-28 Uru, Inc. Apparatus, systems, and methods for integrating digital media content into other digital media content
CN107343225B (zh) * 2016-08-19 2019-04-09 北京市商汤科技开发有限公司 在视频图像中展示业务对象的方法、装置和终端设备
US10575033B2 (en) * 2017-09-05 2020-02-25 Adobe Inc. Injecting targeted ads into videos
CN107818302A (zh) * 2017-10-20 2018-03-20 中国科学院光电技术研究所 基于卷积神经网络的非刚性多尺度物体检测方法
CN109325502B (zh) * 2018-08-20 2022-06-10 杨学霖 基于视频渐进区域提取的共享单车停放检测方法和系统
CN109308456B (zh) * 2018-08-31 2021-06-08 北京字节跳动网络技术有限公司 目标对象的信息确定方法、装置、设备及存储介质
CN109886130B (zh) * 2019-01-24 2021-05-28 上海媒智科技有限公司 目标对象的确定方法、装置、存储介质和处理器

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN101621636A (zh) * 2008-06-30 2010-01-06 北京大学 基于视觉注意力模型的广告标志插入和变换方法及系统
US20160133027A1 (en) * 2014-11-12 2016-05-12 Ricoh Company, Ltd. Method and apparatus for separating foreground image, and non-transitory computer-readable recording medium
US20170358092A1 (en) * 2016-06-09 2017-12-14 Lytro, Inc. Multi-view scene segmentation and propagation
CN109168034A (zh) * 2018-08-28 2019-01-08 百度在线网络技术(北京)有限公司 商品信息显示方法、装置、电子设备和可读存储介质
CN109302619A (zh) * 2018-09-18 2019-02-01 北京奇艺世纪科技有限公司 一种信息处理方法及装置
CN110458820A (zh) * 2019-08-06 2019-11-15 腾讯科技(深圳)有限公司 一种多媒体信息植入方法、装置、设备及存储介质

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2022260803A1 (en) * 2021-06-08 2022-12-15 Microsoft Technology Licensing, Llc Target region extraction for digital content addition
CN113269125A (zh) * 2021-06-10 2021-08-17 北京中科闻歌科技股份有限公司 一种人脸识别方法、装置、设备及存储介质
CN113269125B (zh) * 2021-06-10 2024-05-14 北京中科闻歌科技股份有限公司 一种人脸识别方法、装置、设备及存储介质

Also Published As

Publication number Publication date
US20210406549A1 (en) 2021-12-30
CN112153483B (zh) 2022-05-13
CN112153483A (zh) 2020-12-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20833131

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20833131

Country of ref document: EP

Kind code of ref document: A1