CN114051159A - Video image processing method and device and terminal equipment - Google Patents

Video image processing method and device and terminal equipment Download PDF

Info

Publication number
CN114051159A
Authority
CN
China
Prior art keywords
video image
business object
video
business
content
Prior art date
Legal status
Pending
Application number
CN202111315145.7A
Other languages
Chinese (zh)
Inventor
栾青
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202111315145.7A
Publication of CN114051159A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/4316 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/812 Monomedia components thereof involving advertisement data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot

Abstract

The embodiment of the invention provides a video image processing method, a video image processing apparatus, and a terminal device. The video image processing method comprises the following steps: acquiring a video image of video content and presentation information of a business object to be drawn, wherein the presentation information of the business object includes information on the drawing position of the business object in the video image; and drawing the business object at the drawing position in the video image in a computer drawing manner, and setting a business content link for the drawn business object. The embodiment of the invention helps to save network resources and/or system resources of the client. By adding and drawing the business object in the video picture and displaying it, the content of the video picture is enriched and an augmented-reality effect is achieved. In addition, a related business content link is set for the drawn business object, providing a tightly fused interaction mode for the added business object and improving the viewing and interaction effects of the video content.

Description

Video image processing method and device and terminal equipment
The present application is a divisional application of the invention patent application with application number 201610697461.8, filed on August 19, 2016, and entitled "Video image processing method, apparatus, and terminal device".
Technical Field
The embodiment of the invention relates to a video image processing technology, in particular to a video image processing method, a video image processing device and terminal equipment.
Background
With the development of internet technology, people increasingly watch videos over the internet, and internet video has therefore created business opportunities for many new services. Because it has become an important traffic entry point, internet video is regarded as a premium resource for advertisement placement.
Existing video advertisements are mainly either inserted for a fixed duration at a certain point of video playback, or placed at fixed positions in the video playback area and its surrounding region.
However, on the one hand, such video advertising not only occupies network resources but also consumes system resources of the client; on the other hand, it often disturbs viewers' normal viewing experience, provokes their dislike, and fails to achieve the expected advertising effect.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a video image processing method, a video image processing apparatus, and a terminal device, so as to add and draw, in a dynamic video picture, service-related graphic data having an interactive function.
According to an aspect of the embodiments of the present invention, there is provided a video image processing method, including: acquiring a video image of video content and presentation information of a business object to be drawn, wherein the presentation information of the business object includes information on the drawing position of the business object in the video image; and drawing the business object at the drawing position in the video image in a computer drawing manner, and setting a business content link for the drawn business object.
According to another aspect of the embodiments of the present invention, there is provided a video image processing apparatus, including: a data acquisition unit configured to acquire a video image of video content and presentation information of a business object to be drawn, wherein the presentation information of the business object includes information on the drawing position of the business object in the video image; and a drawing unit configured to draw the business object at the drawing position in the video image in a computer drawing manner and to set a business content link for the drawn business object.
According to another aspect of the embodiments of the present invention, there is provided a terminal device, including one or more processors, a memory, a communication interface, and a communication bus, wherein the one or more processors, the memory, and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to any one of the above video image processing methods.
According to the video image processing scheme provided by the embodiments of the invention, adding and drawing a business object in the video picture and displaying it enriches the content of the video picture and achieves an augmented-reality effect. In addition, a related business content link is set for the drawn business object, so that by operating on the added business object the user can further obtain and display the business content associated with it; this gives the user an interactive function tightly fused with the video picture, does not disturb viewers' normal viewing experience, is unlikely to provoke their dislike, and improves the viewing and interaction effects of the video content. When a business object configured with a business content link is used to display an advertisement, the business object is combined with video playback, so that, compared with the traditional video advertising mode, no additional advertisement video data unrelated to the video needs to be transmitted over the network; this saves network resources and/or system resources of the client while providing the user with an interactive function tightly fused with the video picture and improving the viewing and interaction effects of the video content.
Drawings
Fig. 1 is a flowchart illustrating a video image processing method according to a first embodiment of the present invention;
fig. 2 shows a flow chart of a video image processing method according to a second embodiment of the invention;
fig. 3 is a block diagram showing a configuration of a video image processing apparatus according to a third embodiment of the present invention;
fig. 4 is a block diagram showing a configuration of a video image processing apparatus according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal device according to a fifth embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention is provided in conjunction with the accompanying drawings (like numerals indicate like elements throughout the several views) and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present invention are used merely to distinguish one element, step, device, module, or the like from another element, and do not denote any particular technical or logical order therebetween.
Example one
Fig. 1 is a flowchart illustrating a video image processing method according to a first embodiment of the present invention. The method may be performed by an apparatus such as shown in fig. 5 or in a terminal device comprising the apparatus.
Referring to fig. 1, in step S110, a video image of a video content and display information of a service object to be rendered are obtained, where the display information of the service object includes information of a rendering position of the service object in the video image.
Here, the video content may be video content to be played, such as on-demand or live video content. The video image may be, for example, a video image of video content received from another terminal device for playback, or a video image decoded from pre-downloaded video content. Optionally, the video image and the presentation information of the business object to be drawn are obtained from a video code stream of the video content.
For example, the client can connect to a video website, receive the video code stream of specified on-demand or live video content, decode it, and obtain the video images and the presentation information of the business object. That is, according to this embodiment, the provider of the video content needs to supply the presentation information related to the video images together with the video images, for example by encoding both into the video code stream.
According to an optional implementation of the present invention, the video code stream is a live video stream based on the H.264 standard, which offers strong compression, and the presentation information is carried in a network abstraction layer (NAL) unit of the live video stream, so that both the video image and the presentation information can be obtained from the live video stream.
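The patent does not specify which NAL unit type carries the presentation information; one plausible scheme is to place it in an SEI (type 6) NAL unit and split units out of the Annex B byte stream at the client. A minimal, hypothetical sketch (the toy payload layout is an assumption, not taken from the patent):

```python
def iter_nal_units(stream: bytes):
    """Yield NAL unit payloads from an H.264 Annex B byte stream.

    Units are delimited by 3- or 4-byte start codes (0x000001 / 0x00000001).
    """
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        start = pos + 3                          # skip the start code
        nxt = stream.find(b"\x00\x00\x01", start)
        if nxt < 0:
            yield stream[start:]
            return
        # if the next start code is the 4-byte form, it begins one zero earlier
        end = nxt - 1 if stream[nxt - 1] == 0 else nxt
        yield stream[start:end]
        pos = nxt

def nal_type(nal: bytes) -> int:
    """The low 5 bits of the first NAL byte give the unit type (H.264)."""
    return nal[0] & 0x1F

# Toy stream: an SEI unit (type 6, a plausible carrier for presentation
# information) followed by an IDR slice (type 5).
stream = b"\x00\x00\x00\x01\x06payload\x00\x00\x01\x65slice"
types = [nal_type(n) for n in iter_nal_units(stream)]   # [6, 5]
```

A client would decode video frames from the slice units and read the presentation information from the SEI payloads, keeping both synchronized within the same stream.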
According to the embodiment of the invention, the business object to be drawn is an object to be added and drawn in the video image so as to enrich the content of the video picture. For this purpose, information of the drawing position of the business object in the video image needs to be acquired.
Here, the business object to be drawn may be a semantically meaningful special effect or an advertisement sticker. Specifically, the business object may include, but is not limited to, at least one of the following forms of special effect containing advertising information: a two-dimensional sticker special effect, a three-dimensional special effect, a particle special effect, and the like. It should be understood that the present invention is not limited to application scenarios in which special effects or advertisement stickers serve as business objects; it can be applied to any case where drawn image data is added. To fuse the business object into the video picture of the video image, information on the drawing position of the business object in the video image needs to be acquired. In one optional implementation, presentation information including the drawing-position information of the business object may be acquired from a transport stream of the video content; in another optional implementation, the drawing position of the business object may be determined from the video image itself. There are various ways to detect the drawing position of the business object from the video image; an exemplary embodiment is described later.
In step S120, the business object is drawn at a drawing position in the video image in a computer drawing manner, and a business content link is set for the drawn business object.
In order to continuously present a drawn business object in the video picture, the business object may be drawn using image data of a frame sequence. Specifically, since the video images of the video content are themselves time-ordered, the frame data synchronized with the current video image can be selected from the frame-sequence image data according to the sequence number or time offset of the acquired video image.
According to an optional implementation mode of the invention, the image data of the business object can be downloaded from the server side in advance, and the image data can be stored in a designated folder of the local computer. To render the business object, the image data can be read from the designated folder.
According to another optional embodiment of the present invention, the presentation information acquired at step S110 may further include location information or identification information of the business object. The location information may be, but is not limited to, information on the storage location of the business object's image data, such as a designated folder or a uniform resource locator (URL) from which the image data can be acquired. The identification information may be, but is not limited to, a file identifier (such as a file name) or a resource label of the stored image data.
Correspondingly, the video image processing method may further include acquiring image data of the business object according to the position information or the identification information of the business object, so as to draw the business object.
Specifically, frame data synchronized with the video image is acquired from image data of the business object, and the frame data is drawn at the drawing position in a computer drawing manner, so that a video picture with the drawn business object is displayed in the drawing process.
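The frame-synchronization step above can be sketched as follows; the looping behaviour, the millisecond units, and the placeholder frame values are illustrative assumptions, not specified by the patent:

```python
def synced_frame(frames, frame_duration_ms, video_offset_ms):
    """Pick the sticker-animation frame synchronized with the video's
    time offset; the animation loops, so the index wraps around."""
    idx = (video_offset_ms // frame_duration_ms) % len(frames)
    return frames[idx]

# Four decoded sticker frames at 40 ms each (25 fps), placeholder values.
frames = ["f0", "f1", "f2", "f3"]
assert synced_frame(frames, 40, 0) == "f0"
assert synced_frame(frames, 40, 130) == "f3"   # 130 // 40 = 3
assert synced_frame(frames, 40, 170) == "f0"   # 170 // 40 = 4, wraps to 0
```

Selecting by time offset rather than by a client-side counter keeps the drawn business object aligned with the video even when frames are dropped.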
The business object may be drawn by any applicable graphics rendering method, including, but not limited to, rendering based on the OpenGL graphics rendering engine. OpenGL defines a professional, cross-language, cross-platform programming interface specification for graphics; it is hardware-independent and can conveniently render 2D or 3D graphical images. With OpenGL, not only 2D effects such as 2D stickers can be realized, but also 3D special effects and particle special effects.
For the case where the business object is a sticker (e.g., an advertisement sticker), the relevant information of the business object, such as its identifier and size, may be obtained first when drawing it. After the drawing position is determined, the business object may be adjusted by scaling, rotation, and the like according to the coordinates of the area at the drawing position (e.g., its rectangular area), and then drawn in a corresponding drawing manner, such as OpenGL, so that the video picture with the drawn business object is displayed. In some cases, the advertisement may also be displayed as a three-dimensional special effect, such as text or a LOGO rendered with a particle effect.
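The scale-and-position adjustment described above can be sketched as a pure-math helper; the renderer (OpenGL or otherwise) would then draw the sticker quad at the returned position. The (x, y, w, h) rectangle convention is an assumption:

```python
def fit_sticker(sticker_w, sticker_h, rect):
    """Scale and center a sticker inside the drawing-position rectangle.

    rect = (x, y, w, h) in video-image pixel coordinates (an assumed
    convention). Returns (scale, draw_x, draw_y) for the renderer.
    """
    rx, ry, rw, rh = rect
    scale = min(rw / sticker_w, rh / sticker_h)   # preserve aspect ratio
    drawn_w, drawn_h = sticker_w * scale, sticker_h * scale
    return scale, rx + (rw - drawn_w) / 2, ry + (rh - drawn_h) / 2

# A 200x100 sticker fitted into a 100x100 area at (10, 20):
scale, x, y = fit_sticker(200, 100, (10, 20, 100, 100))
# scale = 0.5, drawn size 100x50, centered vertically at (10.0, 45.0)
```

Rotation would be a further transform applied by the renderer; only the uniform-scale fit is shown here.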
It should be noted that, with the rise of internet live broadcasting, more and more videos appear in live form. Because audiences mainly watch such videos on mobile terminals such as mobile phones, they are characterized by simple scenes, real-time delivery, and small video image size. This poses problems for the delivery of some business objects, such as advertisements. First, since the screen display area of a mobile terminal is limited, an advertisement placed at a traditional fixed position occupies the main user-experience area, which easily provokes users' dislike and may cost the live broadcaster audience. Second, for anchor-style live applications, the immediacy of live broadcast means that a traditional fixed-duration inserted advertisement visibly interrupts the communication between users and the anchor and harms the viewing experience. Third, since live content is inherently short, it is difficult to insert a fixed-duration advertisement in the conventional manner. Delivering the advertisement through the business object, by contrast, effectively fuses advertisement placement with the live video content: the mode is flexible, the effect is vivid, the user's live viewing experience is not disturbed, and the placement effect is improved. The approach is particularly suitable for business object display and advertisement placement on small display screens.
In addition, in step S120, in addition to rendering the business object, a business content link is set for the rendered business object.
The business content link is a link to business content associated with the business object. For example, assuming that the service object is a beverage bottle of a brand of sports drink, the corresponding service content link may be a link of an e-commerce selling the brand of sports drink or a link of a page on the e-commerce website selling the brand of sports drink. Through the link, the user may access the e-commerce website or a page that sells the brand sports drink. For another example, assuming that the service object is a book in the video image, the corresponding service content link may be a link of an encyclopedia page of the book, or a link of a page on which the book is sold on an e-commerce website.
Here, a business content link to a specified web page may be configured in advance for the business object to be drawn, or a pre-downloaded business content link may be read from a hard disk, a memory card, or the memory of the local device. The business content link is set for the business object by interactively associating the drawn business object with the link.
Furthermore, according to an alternative embodiment of the present invention, the business content link may be stored together with the image data of the business object. In this case, the service content link related to the service object may be obtained according to the location information or the identification information of the service object in the presentation information.
According to another optional embodiment of the present invention, the presentation information further comprises a service content link associated with the service object. Correspondingly, the video image processing method further comprises the following steps: and acquiring the business content link related to the business object from the display information.
Specifically, in the process of setting a service content link for the drawn service object, first, a link trigger area of the service content link may be set according to an area occupied by the service object in a video image. The link trigger area may be an area associated with an area occupied by a business object in the video image, such as a circumscribed or inscribed rectangular area of a drawn beverage bottle; the link trigger area may be defined by four point coordinates of the circumscribed or inscribed rectangle. And then interactively associating the operation of the user in the link trigger area with the corresponding business content link.
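The link trigger area and its interactive association can be sketched as a simple hit test; the `(x_min, y_min, x_max, y_max)` rectangle form and the example URL are illustrative, not from the patent:

```python
def in_trigger_area(click, rect):
    """True if a click/tap falls inside the link trigger rectangle,
    given as (x_min, y_min, x_max, y_max) of the drawn business object."""
    x, y = click
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1

def link_for_click(click, triggers):
    """Map a click on the playing video to the business content link of
    the first trigger area it falls in, if any."""
    for t in triggers:
        if in_trigger_area(click, t["rect"]):
            return t["url"]
    return None

# Hypothetical trigger: bounding rectangle of a drawn beverage bottle.
triggers = [{"rect": (120, 80, 220, 260), "url": "https://example.com/product"}]
```

In practice the trigger rectangles would be updated each frame as the drawing position moves with the video content.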
While a video image in which the business object has been drawn and given the interactive function is playing on the video playing interface, when the user clicks, for example, the business object drawn on the interface, the corresponding business content can be obtained through the business content link and displayed.
With the video image processing method provided by this embodiment, business objects can be added, drawn, and displayed in the video picture, enriching its content and achieving an augmented-reality effect. In addition, a related business content link is set for the drawn business object, so that by operating on the added business object the user can further obtain and display the business content associated with it; this gives the user an interactive function tightly fused with the video picture, does not disturb viewers' normal viewing experience, is unlikely to provoke their dislike, and improves the viewing and interaction effects of the video content. When a business object configured with a business content link is used to display an advertisement, the business object is combined with video playback, so that, compared with the traditional video advertising mode, no additional advertisement video data unrelated to the video needs to be transmitted over the network; this saves network resources and/or system resources of the client while providing the user with an interactive function tightly fused with the video picture and improving the viewing and interaction effects of the video content.
Example two
Fig. 2 shows a flow chart of a video image processing method according to a second embodiment of the invention.
Referring to fig. 2, in step S210, a video image of video content is acquired.
The video image may be a video image of video content continuously captured by an image capturing device such as a camera, a video camera, or the like, or the video content may be pre-recorded video content.
In step S220, a drawing position of the business object in the video image is determined.
In the embodiment of the present invention, the drawing position of the business object may be determined in at least the following two ways. In the first way, feature points of a target object are determined from the video, and a pre-trained convolutional network model for determining the presentation positions of business objects in video images determines, from those feature points, the drawing position of the business object to be drawn in the video image. In the second way, feature points of a target object are determined from the video, the type of the target object is determined from those feature points, and the drawing position of the business object to be drawn is then determined according to that type.
The two modes are described in detail below.
Mode one
To use the first mode to determine the drawing position of a business object to be drawn in a video image, a convolutional network model must be trained in advance so that the trained model can determine the drawing position of a business object in a video image; alternatively, a convolutional network model trained by a third party and having this capability can be used directly.
It should be noted that, in this embodiment, the training focuses on the business object; the training of the target-object part can be implemented with reference to the related art and is therefore only briefly described here.
When the convolutional network model needs to be trained in advance, one possible training method includes the following processes:
(1) Acquire a feature vector of a business object sample image to be trained.
The feature vector includes information of the target object in the business object sample image, as well as position information and/or confidence information of the business object. The information of the target object indicates the image information of the target object. The position information of the business object indicates the position of the business object; it may be the position of the business object's center point or of the area it occupies. The confidence information of the business object indicates the probability that the business object achieves the intended effect (such as being followed, clicked, or watched) when displayed at the current position; this probability may be set according to statistical analysis of historical data, simulation results, and human experience. In practical application, when training on the target object, either only the position information of the business object, only its confidence information, or both may be trained, according to actual needs. Training the convolutional network model on both kinds of information allows the position information and the confidence information of the business object to be determined more effectively and accurately, providing a basis for presenting the business object.
The convolutional network model is trained on a large number of sample images. In the embodiment of the invention, the business objects in the business object sample images may be labeled in advance with position information, confidence information, or both. Of course, in practical applications this information may also be obtained in other ways. Labeling the business object's information in advance effectively reduces the data volume and the number of interactions required for data processing, improving processing efficiency.
And taking the business object sample image with the target object information and the position information and/or confidence degree information of the business object as a training sample, and extracting the feature vector of the training sample to obtain the feature vector containing the target object information and the position information and/or confidence degree information of the business object.
The feature vector may be extracted in an appropriate manner in the related art, and the embodiment of the present invention is not described herein again.
(2) Perform convolution processing on the feature vector to obtain a feature vector convolution result.
The obtained feature vector convolution result contains information of the target object, and position information and/or confidence information of the service object.
The convolution processing times of the feature vectors can be set according to actual needs, that is, in the convolution network model, the number of layers of the convolution layers is set according to actual needs, and the final feature vector convolution result meets the standard that the error is within a certain range (for example, 1/20-1/5 of the length or width of an image, and preferably, 1/10 of the length or width of the image).
The convolution result is the result of extracting the features of the feature vector, and the result can effectively represent the features and classification of each related object in the video image.
In the embodiment of the invention, when the feature vector contains both the position information and the confidence information of the business object, that is, when both are trained, the feature vector convolution result is shared by the subsequent, separate convergence checks; no repeated processing or computation is needed, which reduces the resource cost of data processing and improves processing speed and efficiency.
(3) Judge, respectively, whether the information of the corresponding target object and the position information and/or confidence information of the business object in the feature vector convolution result satisfy the convergence condition.
Wherein, the convergence condition is set by those skilled in the art according to the actual requirement. When the information meets the convergence condition, the parameter setting in the convolution network model can be considered to be appropriate; when the information cannot satisfy the convergence condition, the parameter setting in the convolutional network model is considered to be improper, and the parameter setting needs to be adjusted, wherein the adjustment is an iterative process until the result of performing convolution processing on the feature vector by using the adjusted parameter satisfies the convergence condition.
In a feasible manner, the convergence condition may be set according to a preset standard position and/or a preset standard confidence, for example, whether a distance between a position indicated by the position information of the service object in the feature vector convolution result and the preset standard position satisfies a certain threshold is taken as the convergence condition of the position information of the service object; and whether the difference between the confidence coefficient indicated by the confidence coefficient information of the business object in the feature vector convolution result and the preset standard confidence coefficient meets a certain threshold value is used as a convergence condition of the confidence coefficient information of the business object, and the like.
Preferably, the preset standard position may be the average position obtained by averaging the positions of the business object across the business object sample images to be trained; the preset standard confidence may be the average confidence obtained by averaging the confidences of the business object across those sample images. Because the standard position and/or standard confidence are set from the positions and/or confidences in the sample images to be trained, which are large in number, the resulting standards are objective and accurate.
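A minimal pure-Python sketch of deriving these averaged standards; the annotation structure (`position`, `confidence` keys) is an assumption for illustration, not the patent's concrete data format:

```python
def compute_standards(sample_annotations):
    """Average the annotated positions and confidences of the business
    object across the business object sample images to be trained."""
    n = len(sample_annotations)
    avg_x = sum(a["position"][0] for a in sample_annotations) / n
    avg_y = sum(a["position"][1] for a in sample_annotations) / n
    avg_conf = sum(a["confidence"] for a in sample_annotations) / n
    return (avg_x, avg_y), avg_conf

samples = [
    {"position": (100, 40), "confidence": 0.9},
    {"position": (110, 60), "confidence": 0.7},
]
standard_pos, standard_conf = compute_standards(samples)
# standard_pos == (105.0, 50.0); standard_conf is 0.8 within float error
```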
When specifically determining whether the position information and/or the confidence information of the corresponding business object in the feature vector convolution result satisfies the convergence condition, a feasible method is as follows:
acquiring the position information of the corresponding business object in the feature vector convolution result; calculating, with a first loss function, a first distance between the position indicated by that position information and the preset standard position; and judging, according to the first distance, whether the position information of the corresponding business object satisfies the convergence condition;
and/or,
acquiring the confidence information of the corresponding business object in the feature vector convolution result; calculating, with a second loss function, a second distance between the confidence indicated by that confidence information and the preset standard confidence; and judging, according to the second distance, whether the confidence information of the corresponding business object satisfies the convergence condition.
In an optional implementation, the first loss function may be a function that calculates the Euclidean distance between the position indicated by the position information of the corresponding business object and the preset standard position; and/or the second loss function may be a function that calculates the Euclidean distance between the confidence indicated by the confidence information of the corresponding business object and the preset standard confidence. The Euclidean distance is simple to implement and effectively indicates whether the convergence condition is satisfied. The embodiment is not limited to it, however; other measures, such as the Mahalanobis distance or the Bhattacharyya distance, are equally applicable.
Preferably, as mentioned above, the preset standard position is an average position obtained after averaging the positions of the business objects in the business object sample image to be trained; and/or the preset standard confidence coefficient is an average confidence coefficient obtained after the average processing is carried out on the confidence coefficient of the business object in the sample image of the business object to be trained.
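The convergence check based on the first and second loss functions can be sketched as follows; the thresholds and the scalar treatment of confidence (where the Euclidean distance reduces to an absolute difference) are assumptions for illustration:

```python
import math

def position_loss(predicted_pos, standard_pos):
    # first loss function: Euclidean distance between the predicted
    # position and the preset standard position
    return math.dist(predicted_pos, standard_pos)

def confidence_loss(predicted_conf, standard_conf):
    # second loss function: for a scalar confidence the Euclidean
    # distance reduces to the absolute difference
    return abs(predicted_conf - standard_conf)

def meets_convergence(pred_pos, pred_conf, std_pos, std_conf,
                      pos_threshold, conf_threshold):
    # both distances must fall within their thresholds
    return (position_loss(pred_pos, std_pos) <= pos_threshold and
            confidence_loss(pred_conf, std_conf) <= conf_threshold)
```

For example, a predicted position `(3, 4)` is a distance of `5.0` from a standard position at the origin, so it converges under a position threshold of `5.0` but not under `4.0`.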
For the information of the target object in the feature vector convolution result, whether it has converged may be judged with reference to the relevant convergence condition of the convolutional network model, which is not described again here. If the information of the target object satisfies the convergence condition, the target object can be classified and its category determined, providing a reference and basis for determining the subsequent drawing position of the business object.
(4) If the convergence condition is met, finishing the training of the convolution network model; if the convergence condition is not met, adjusting the parameters of the convolution network model according to the feature vector convolution result, and performing iterative training on the convolution network model according to the adjusted parameters of the convolution network model until the feature vector convolution result after the iterative training meets the convergence condition.
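The iterative loop of step (4) — convolve, judge convergence, adjust parameters, repeat — can be sketched schematically. `ToyModel` and the halving update below are placeholders standing in for the real convolutional network and its parameter adjustment:

```python
def train_until_converged(model, loss_fn, update_fn, threshold, max_iters=1000):
    loss = float("inf")
    for step in range(max_iters):
        result = model()                 # feature-vector convolution result
        loss = loss_fn(result)
        if loss <= threshold:            # convergence condition satisfied
            return step, loss            # training of the model is completed
        update_fn(model, loss)           # adjust the network parameters
    return max_iters, loss               # stopped without converging

class ToyModel:
    def __init__(self):
        self.w = 10.0                    # a single stand-in "parameter"
    def __call__(self):
        return self.w

model = ToyModel()
steps, final_loss = train_until_converged(
    model,
    loss_fn=abs,                                      # distance to target 0
    update_fn=lambda m, l: setattr(m, "w", m.w * 0.5),
    threshold=0.1,
)
# converges once 10 * 0.5**k <= 0.1, i.e. after 7 halvings
```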
Through the above training, the convolutional network model can perform feature extraction and classification for drawing positions of business objects displayed relative to the target object, and thus can determine the drawing position of a business object in a video image. When there are multiple candidate drawing positions, the model, by virtue of the training on business object confidence, can also rank their display quality and thereby determine an optimal drawing position. In subsequent use, when a business object needs to be displayed, an effective drawing position can be determined from the current image of the video.
In addition, before the convolutional network model is trained, the business object sample images may be pre-processed in advance, including: acquiring a plurality of business object sample images, each containing annotation information of a business object; determining the position of the business object according to the annotation information, and judging whether the distance between that position and a preset position is less than or equal to a set threshold; and determining the business object sample images whose distance is less than or equal to the set threshold as the business object sample images to be trained. The preset position and the set threshold may be set appropriately by those skilled in the art in any suitable manner, for example according to the results of statistical analysis of data, a related distance-calculation formula, or manual experience, which is not limited in the embodiment of the present invention.
In one possible approach, the position of the business object determined according to the annotation information may be a central position of the business object. When the position of the business object is determined according to the labeling information and whether the distance between the determined position of the business object and the preset position is smaller than or equal to a set threshold value is judged, the central position of the business object can be determined according to the labeling information; and then judging whether the variance between the center position and the preset position is less than or equal to a set threshold value.
By preprocessing the sample images of the business objects in advance, sample images which do not meet the conditions can be filtered out, so that the accuracy of the training result is ensured.
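A sketch of this pre-filtering: keep only the sample images whose annotated business-object center lies close enough to the preset position. The `bbox` annotation format and the plain Euclidean distance check are assumptions for illustration:

```python
def filter_training_samples(samples, preset_pos, threshold):
    """Return only the samples whose annotated business-object center
    is within `threshold` of `preset_pos`."""
    kept = []
    for sample in samples:
        x0, y0, x1, y1 = sample["bbox"]              # annotated bounding box
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0    # center of business object
        distance = ((cx - preset_pos[0]) ** 2 +
                    (cy - preset_pos[1]) ** 2) ** 0.5
        if distance <= threshold:                    # within the set threshold
            kept.append(sample)
    return kept

samples = [
    {"bbox": (40, 40, 60, 60)},   # center (50, 50) -> kept
    {"bbox": (0, 0, 10, 10)},     # center (5, 5)   -> filtered out
]
to_train = filter_training_samples(samples, preset_pos=(50, 50), threshold=20)
```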
The training of the convolutional network model is realized through the process, and the trained convolutional network model can be used for determining the drawing position of the business object in the video image. For example, in the process of live video, if the anchor clicks the service object indication to display the service object, after the convolutional network model obtains the facial feature point of the anchor in the live video image, the optimal position for displaying the service object, such as the forehead position of the anchor, can be indicated, and then the mobile terminal controls the live broadcast application to display the service object at the position; or, in the process of live video, if the anchor clicks the service object indication to display the service object, the convolution network model can directly determine the drawing position of the service object according to the live video image.
Mode two
In the second mode, firstly, the type of the target object is determined according to the characteristic points of the target object; and determining the drawing position of the business object to be drawn according to the type of the target object.
Wherein the types of the target object include, but are not limited to: face type, background type, hand type, and action type. The face type is used for indicating that a face occupies a main part in the video image, the background type is used for indicating that a background occupies a larger part in the video image, the hand type is used for indicating that a hand occupies a main part in the video image, and the action type is used for indicating that a person performs a certain action.
After the feature points of the target object are obtained, the type of the target object can be determined by adopting the existing related detection, classification or learning method. After the type of the target object is determined, the drawing position of the business object to be drawn can be determined according to a set rule, including:
when the type of the target object is a face type, determining that the drawing position of the business object to be drawn includes at least one of the following: a hair region, a forehead region, a cheek region, a chin region, and a body region other than the head of the person in the video image; and/or,
when the type of the target object is a background type, determining that the drawing position of the business object to be drawn comprises: a background region in the video image; and/or,
when the type of the target object is a hand type, determining that the drawing position of the business object to be drawn comprises: an area within a set range centered on the region where the hand is located in the video image; and/or,
when the type of the target object is the action type, determining the drawing position of the business object to be drawn comprises: a predetermined area in the video image.
The preset area in the video image may include: any region other than the person in the video image may be set as appropriate by a person skilled in the art according to actual conditions, for example, a region within a set range centered on the motion generation portion, a region within a set range other than the motion generation portion, or a background region, and the like, which is not limited in the embodiment of the present invention.
In an optional embodiment, the action corresponding to the action type includes at least one of: blinking, opening mouth, nodding head, shaking head, kissing, smiling, waving hand, scissor hand, fist making, holding hand, erecting thumb, swinging hand gun, swinging V-shaped hand, and swinging OK hand.
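The type-to-region rules above can be illustrated as a simple dispatch. The concrete geometry (the forehead placeholder, the hand margin, the "predetermined area" chosen for actions) is assumed here, not specified by the embodiment:

```python
def drawing_region(target_type, frame_w, frame_h, hand_box=None, margin=80):
    """Map the detected type of the target object to a candidate
    drawing region for the business object."""
    if target_type == "face":
        return "forehead"                      # one of the listed face regions
    if target_type == "background":
        return (0, 0, frame_w, frame_h)        # the background region
    if target_type == "hand":
        x0, y0, x1, y1 = hand_box
        cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
        # area of a set range centered on the hand region, clipped to frame
        return (max(cx - margin, 0), max(cy - margin, 0),
                min(cx + margin, frame_w), min(cy + margin, frame_h))
    if target_type == "action":
        return (0, 0, frame_w, frame_h // 4)   # a predetermined area (assumed)
    raise ValueError("unknown target object type: " + target_type)
```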
When the business object is drawn, the center point of the drawing position may be used as the center point of the business object, or any coordinate within the drawing position may be determined as the center point, and so on; this is not limited in the embodiment of the present invention. In the embodiment of the present invention, the preset area in the video image may include: a region of a person in the video image or any region other than the person in the video image.
In addition, between video images of continuously captured video content, the position of an object does not jump between frames; that is, in the current video image, any object (such as a reference object) is usually near its position in the previous video image. Therefore, according to an exemplary embodiment of the present invention, a predetermined tracking method may be used to determine the drawing position of a business object in a video image from the drawing position determined in a previous video image (e.g., one or two frames before the current one). Because the drawing position in the current video image is derived from the position detected in the previous one, full-frame detection need not be performed on every video image, which reduces the amount of computation and improves speed and efficiency.
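A sketch of this tracking shortcut: search only a window around the drawing position from the previous video image instead of the full frame. `detect_in_window` is a stand-in for whatever local detector is actually used:

```python
def track_drawing_position(prev_pos, detect_in_window, search_radius=40):
    """Detect the business object's drawing position only inside a
    window around its position in the previous video image."""
    x, y = prev_pos
    window = (x - search_radius, y - search_radius,
              x + search_radius, y + search_radius)
    new_pos = detect_in_window(window)
    # objects rarely jump between consecutive frames, so fall back to
    # the previous position if local detection finds nothing
    return new_pos if new_pos is not None else prev_pos
```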
In step S230, the service object is drawn at a drawing position in the video image in a computer drawing manner, and a service content link is set for the drawn service object, so that a video picture with the drawn service object is displayed and the drawn service object has an interactive function.
Step S230 is similar to step S120, and is not repeated herein.
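A minimal sketch of this drawing-and-linking step, with the video image and the business object represented as plain nested lists of pixels (an assumption for illustration): composite the object at the drawing position and record the rectangle it occupies as the link trigger area.

```python
def draw_business_object(frame, sticker, top_left, content_link):
    """Composite `sticker` onto `frame` at `top_left` and record the
    occupied rectangle as the service content link's trigger area."""
    x, y = top_left
    h, w = len(sticker), len(sticker[0])
    for dy in range(h):
        for dx in range(w):
            pixel = sticker[dy][dx]
            if pixel is not None:          # None marks transparent pixels
                frame[y + dy][x + dx] = pixel
    # link trigger area set according to the area occupied by the object
    trigger = {"rect": (x, y, x + w, y + h), "link": content_link}
    return frame, trigger

frame = [[0] * 4 for _ in range(4)]
sticker = [[1, 1], [1, None]]
frame, trigger = draw_business_object(frame, sticker, (1, 1),
                                      "https://example.com")
```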
Thereafter, during the playing of the video content, the user may perform an operation on the interface where the video content is played, such as clicking a certain position of the interface, sliding on the interface, performing multi-point zooming on the interface, and the like. After the above operation of the user on the interface for playing the video content is detected, if the user performs the operation on the service object in the displayed video image, step S240 is performed.
In step S240, in response to the operation of the user on the service object displayed on the interface for playing the video content, the service content pointed by the service content link is obtained, and the service content is displayed.
Specifically, the interface for playing the video content may be, but is not limited to, an interface of a video playing application or an interface of a live-streaming application. A floating window may be popped up on the playing interface and the service content displayed in it, so that the user can return to the playing interface after viewing the service content. Alternatively, the playing interface may jump to the service content link and display the service content within the playing interface itself.
One application scenario of this embodiment is a live-streaming application in which an advertisement sticker for a certain brand of headwear is drawn on the anchor's head in the live picture, and a service content link to the headwear manufacturer's official website is set for the sticker. When the user clicks the headwear area of the live picture while watching, through the processing of step S240 the live application obtains the headwear manufacturer's official website through the service content link and displays it.
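The detection-and-response flow of step S240 can be sketched as a hit test of the tap point against the recorded trigger areas. `present` is a placeholder callback (e.g. opening the link in a floating window); the area structure matches nothing mandated by the embodiment:

```python
def handle_tap(tap_xy, trigger_areas, present):
    """On a tap inside a trigger area, obtain and present the linked
    service content; return whether the tap hit a business object."""
    tx, ty = tap_xy
    for area in trigger_areas:
        x0, y0, x1, y1 = area["rect"]
        if x0 <= tx < x1 and y0 <= ty < y1:
            present(area["link"])   # e.g. show it in a pop-up floating window
            return True
    return False                    # tap did not land on a business object

opened = []
areas = [{"rect": (10, 10, 50, 50), "link": "https://example.com/headwear"}]
hit = handle_tap((20, 20), areas, opened.append)
miss = handle_tap((5, 5), areas, opened.append)
```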
With the video image processing method provided by this embodiment, the drawing position of the business object to be drawn can be detected from the video content to be displayed, and business objects such as advertisement stickers or special effects can be drawn into the video picture, enriching its content and achieving an augmented-reality effect that blends the virtual and the real. In addition, a related service content link is set for the drawn business object, so that by operating on the drawn object the user can further obtain and display the service content related to it. This provides the user with an interactive function highly fused with the video picture, does not disturb the viewer's normal viewing experience or easily arouse the viewer's dislike, and improves the viewing and interaction effects of the video content. When a business object configured with a service content link is used to display an advertisement, the business object is combined with video playback; compared with the traditional video advertisement mode, no additional advertisement video data unrelated to the video needs to be transmitted over the network, which saves network resources and/or system resources of the client.
EXAMPLE III
Fig. 3 is a block diagram showing a configuration of a video image processing apparatus according to a third embodiment of the present invention.
Referring to fig. 3, the video image processing apparatus according to the third embodiment includes a data acquisition unit 310 and a rendering unit 320.
The data obtaining unit 310 is configured to obtain a video image of video content and display information of a service object to be drawn, where the display information of the service object includes information of a drawing position of the service object in the video image.
Optionally, the data obtaining unit 310 is configured to obtain the video image and the presentation information from a video stream of the video content.
The drawing unit 320 is configured to draw the business object at a drawing position in the video image in a computer drawing manner, and set a business content link for the drawn business object.
Optionally, the display information further includes location information or identification information of the service object; the drawing unit 320 is further configured to obtain image data of the business object according to the location information or the identification information of the business object.
Optionally, the drawing unit 320 is further configured to obtain a service content link related to the service object according to the location information or the identification information of the service object.
Optionally, the presentation information further includes a service content link related to the service object; the drawing unit 320 is further configured to obtain the service content link related to the service object from the presentation information.
Optionally, the drawing unit 320 is configured to set a link triggering area of the service content link according to an area occupied by the service object.
Optionally, the image data of the business object is a frame sequence; the drawing unit 320 is configured to acquire frame data synchronized with the video image from the image data, and draw the frame data at the drawing position in a computer drawing manner.
Optionally, the business object is a special effect or an advertisement sticker with semantics.
Optionally, the business object includes a special effect containing advertisement information in at least one of the following forms: two-dimensional paster special effect, three-dimensional special effect and particle special effect.
Optionally, the video code stream is a live video stream based on the H.264 standard, and the presentation information is carried in a network abstraction layer (NAL) unit of the live video stream.
The video image processing apparatus of this embodiment is configured to implement the corresponding video image processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Further, the video image processing apparatus of the present embodiment may be provided in a suitable terminal device, including but not limited to a mobile terminal, a PC, and the like.
EXAMPLE IV
Fig. 4 is a block diagram showing a configuration of a video image processing apparatus according to a fourth embodiment of the present invention.
Referring to fig. 4, the video image processing apparatus according to the fourth embodiment includes an operation detection unit 330 and a service content presentation unit 340 in addition to the data acquisition unit 310 and the rendering unit 320.
The operation detection unit 330 is configured to detect an operation performed by a user on the service object displayed on the interface for playing the video content. The service content presenting unit 340 is configured to, in response to the operation on the presented service object detected by the operation detecting unit, obtain the service content pointed by the service content link, and present the service content.
Optionally, the service content presenting unit 340 is configured to pop up a floating window on the playing interface of the video content, and present the service content in the floating window, or present the service content in the playing interface of the video content.
Optionally, the data obtaining unit 310 is configured to determine a drawing position of the business object in the video image.
Optionally, the data obtaining unit 310 is configured to determine a rendering position of the business object in the video image according to a rendering position of the business object determined from a previous video image of the video images using a predetermined tracking method.
Optionally, the data obtaining unit 310 is configured to: determining feature points of a target object from the video image, and determining the drawing position of the business object in the video image by using a pre-trained convolution network model for determining the drawing position of the business object in the video image according to the feature points of the target object; or, determining the type of the target object from the video image, and determining the drawing position of the business object according to the type of the target object.
Optionally, the data obtaining unit 310 is configured to: when the type of the target object is a face type, determining the drawing position of the business object comprises at least one of the following steps: a hair region, a forehead region, a cheek region, a chin region, and a body region other than the head of the person in the video image; and/or when the type of the target object is a background type, determining the drawing position of the business object comprises: a background region in the video image; and/or when the type of the target object is a hand type, determining the drawing position of the business object comprises: the area in the set range, which takes the area where the hand is located in the video image as the center; and/or when the type of the target object is an action type, determining the drawing position of the business object comprises: a predetermined area in the video image.
The video image processing apparatus of this embodiment is configured to implement the corresponding video image processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
EXAMPLE V
Fig. 5 is a schematic structural diagram of a terminal device according to a fifth embodiment of the present invention. The specific embodiment of the present invention does not limit the specific implementation of the terminal device.
As shown in fig. 5, the terminal device may include: a processor (processor)502, a Communications Interface 504, a memory 506, and a communication bus 508.
Wherein:
the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.
A communication interface 504 for communicating with network elements of other devices, such as other clients or servers.
The processor 502 is configured to execute the program 510, and may specifically perform the relevant steps in the above method embodiments.
In particular, program 510 may include program code that includes computer operating instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits configured to implement embodiments of the present invention, or a graphics processing unit (GPU). The one or more processors included in the terminal device may be processors of the same type, such as one or more CPUs or one or more GPUs, or processors of different types, such as one or more CPUs together with one or more GPUs.
And a memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations: acquiring a video image of video content and display information of a business object to be drawn, wherein the display information of the business object comprises information of a drawing position of the business object in the video image; and drawing the business object at the drawing position in the video image by adopting a computer drawing mode, and setting business content link for the drawn business object.
In an alternative embodiment, the program 510 is further configured to cause the processor 502 to: and responding to the operation of the user on the business object displayed on the interface for playing the video content, acquiring the business content pointed by the business content link, and displaying the business content.
In an alternative embodiment, the program 510 is configured to cause the processor 502 to: popping up a floating window on a playing interface of the video content, and displaying the service content in the floating window; or, the service content is displayed in the playing interface of the video content.
In an optional embodiment, the presentation information further includes location information or identification information of the business object; program 510 is also operative to cause processor 502 to: and acquiring the image data of the business object according to the position information or the identification information of the business object.
In an alternative embodiment, the program 510 is further configured to cause the processor 502 to: and acquiring a service content link related to the service object according to the position information or the identification information of the service object.
In an optional embodiment, the presentation information further includes a service content link related to the service object; program 510 is also operative to cause processor 502 to: and acquiring the business content link related to the business object from the display information.
In an alternative embodiment, the program 510 is configured to cause the processor 502 to: and determining the drawing position of the business object in the video image.
In an alternative embodiment, the program 510 is configured to cause the processor 502 to perform the following operations: determining a rendering position of the business object in the video image from a rendering position of the business object determined from a previous video image of the video images using a predetermined tracking method.
In an alternative embodiment, the program 510 is configured to cause the processor 502 to perform the following operations: the determining the drawing position of the business object in the video image comprises: determining feature points of a target object from the video image, and determining the drawing position of the business object in the video image by using a pre-trained convolution network model for determining the drawing position of the business object in the video image according to the feature points of the target object; or, determining the type of the target object from the video image, and determining the drawing position of the business object according to the type of the target object.
In an alternative embodiment, the program 510 is further configured to cause the processor 502 to: when the type of the target object is a face type, determining the drawing position of the business object comprises at least one of the following steps: a hair region, a forehead region, a cheek region, a chin region, and a body region other than the head of the person in the video image; and/or when the type of the target object is a background type, determining the drawing position of the business object comprises: a background region in the video image; and/or when the type of the target object is a hand type, determining the drawing position of the business object comprises: the area in the set range, which takes the area where the hand is located in the video image as the center; and/or when the type of the target object is an action type, determining the drawing position of the business object comprises: a predetermined area in the video image.
In an alternative embodiment, the image data of the business object is a sequence of frames; the program 510 is used to cause the processor 502 to specifically perform the following operations: and acquiring frame data synchronized with the video image from the image data, and drawing the frame data at the drawing position by adopting a computer drawing mode.
In an alternative embodiment, the program 510 is configured to cause the processor 502 to: acquire the video image and the display information from the video code stream of the video content.
In an alternative embodiment, the program 510 is configured to cause the processor 502 to: and setting a link triggering area of the business content link according to the area occupied by the business object.
In an alternative embodiment, the business object is a special effect or advertisement sticker with semantics.
In an alternative embodiment, the business object includes special effects including advertising information in at least one of the following forms: two-dimensional paster special effect, three-dimensional special effect and particle special effect.
In an optional implementation manner, the video code stream is a live video stream based on the H.264 standard, and the presentation information is carried in a network abstraction layer unit of the live video stream.
The terminal device of the video image in this embodiment is configured to implement the corresponding video image processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the processing methods described herein. Further, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing that processing.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are intended only to illustrate, not to limit, the embodiments of the present invention. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention; all equivalent technical solutions therefore also fall within that scope, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (18)

1. A video image processing method, comprising:
acquiring a video image of video content and display information of a business object to be drawn, wherein the display information of the business object comprises information of a drawing position of the business object in the video image;
drawing the business object at a drawing position in the video image by adopting a computer drawing mode, and setting a business content link for the drawn business object;
wherein the determining of the drawing position of the business object in the video image comprises:
determining a type of a target object from the video image, and determining the drawing position of the business object according to the type of the target object, wherein the type of the target object is used for indicating which part of the target object occupies a major portion of the video image, or for indicating an action performed by the target object.
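For illustration only (this is not part of the claims, and all function and field names below are assumptions, not the patented implementation), the flow of claim 1 — acquire a video image, map the detected target-object type to a drawing position, draw the business object there, and attach a business content link — can be sketched as:

```python
from dataclasses import dataclass

# Hypothetical sketch of claim 1; names and position heuristics are assumptions.
@dataclass
class BusinessObject:
    image: object          # sticker/effect pixel data to be drawn
    content_link: str      # business content link opened when the object is tapped

def drawing_position_for(target_type: str, frame_w: int, frame_h: int) -> tuple:
    """Map the detected target-object type to an illustrative drawing position (x, y)."""
    positions = {
        "face": (frame_w // 2, int(frame_h * 0.1)),        # e.g. forehead region
        "background": (int(frame_w * 0.8), int(frame_h * 0.8)),
        "hand": (frame_w // 2, frame_h // 2),
        "action": (int(frame_w * 0.1), int(frame_h * 0.1)),  # predetermined area
    }
    return positions.get(target_type, (0, 0))

def process_frame(frame_w: int, frame_h: int, target_type: str,
                  business_object: BusinessObject) -> dict:
    """Determine the drawing position and associate the business content link."""
    pos = drawing_position_for(target_type, frame_w, frame_h)
    # A real implementation would composite business_object.image onto the
    # video frame at pos using computer graphics drawing.
    return {"position": pos, "link": business_object.content_link}
```

The mapping table stands in for the type-to-region rules elaborated in claims 2–4.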
2. The method of claim 1, wherein the type of the target object comprises at least any one of: a face type, a background type, a hand type and an action type; wherein,
the face type is used to indicate that a face occupies a major portion of the video image,
the background type is used to indicate that the background occupies a major portion of the video image,
the hand type is used to indicate that a hand occupies a major portion of the video image,
the action type is used to indicate an action performed by the character.
3. The method of claim 2, wherein the determining a rendering location of the business object according to a type of a target object comprises:
when the type of the target object is the face type, determining that the drawing position of the business object comprises at least one of the following regions: a hair region, a forehead region, a cheek region, a chin region, and a body region other than the head of the person in the video image; and/or,
when the type of the target object is the background type, determining that the drawing position of the business object comprises: a background region in the video image; and/or,
when the type of the target object is the hand type, determining that the drawing position of the business object comprises: a region within a set range centered on the region where the hand is located in the video image; and/or,
when the type of the target object is the action type, determining that the drawing position of the business object comprises: a predetermined area in the video image.
4. The method of claim 3, wherein the predetermined area in the video image comprises:
a region within a set range centered on the motion generation region, or,
a region within a set range other than the motion generation region, or,
a background region.
5. The method of claim 3 or 4, wherein the action corresponding to the action type comprises at least one of: blinking, opening mouth, nodding head, shaking head, kissing, smiling, waving hand, scissor hand, fist making, holding hand, erecting thumb, swinging hand gun, swinging V-shaped hand, and swinging OK hand.
6. The method of any of claims 1-5, wherein the method further comprises:
and responding to the operation of the user on the business object displayed on the interface for playing the video content, acquiring the business content pointed by the business content link, and displaying the business content.
7. The method of claim 6, wherein the presenting the business content comprises:
popping up a floating window on the playing interface of the video content and displaying the business content in the floating window, or
displaying the business content in the playing interface of the video content.
8. The method according to any one of claims 1 to 7, wherein the presentation information further comprises location information or identification information of the business object,
the method further comprises: acquiring the image data of the business object according to the location information or the identification information of the business object.
9. The method of any of claims 1-8, wherein the method further comprises:
and acquiring a service content link related to the service object according to the position information or the identification information of the service object.
10. The method according to any one of claims 1-9, wherein the presentation information further comprises a business content link related to the business object,
the method further comprises: acquiring the business content link related to the business object from the presentation information.
11. The method according to any one of claims 1 to 10, wherein the determining of the drawing position of the business object in the video image comprises:
determining the drawing position of the business object in the video image, using a predetermined tracking method, from the drawing position of the business object determined in a previous video image.
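Claim 11 leaves the tracking method unspecified. As a minimal, purely illustrative sketch (not the patent's "predetermined tracking method"), one could carry the previous frame's drawing position forward and blend it with the current detection so the drawn object stays stable across frames:

```python
# Illustrative position tracking: blend the previous frame's drawing position
# with the newly detected one. `alpha` is an assumed smoothing factor.
def track_position(prev_pos, detected_pos, alpha=0.7):
    """Return a drawing position derived from the previous frame's position."""
    if prev_pos is None:
        # No previous frame: fall back to the fresh detection.
        return detected_pos
    return (
        round(alpha * prev_pos[0] + (1 - alpha) * detected_pos[0]),
        round(alpha * prev_pos[1] + (1 - alpha) * detected_pos[1]),
    )
```

Heavier-weight trackers (optical flow, correlation filters) would slot into the same interface.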
12. The method according to any one of claims 1 to 11, wherein the image data of the business object is a sequence of frames,
the step of drawing the business object at the drawing position in the video image by adopting a computer drawing mode comprises the following steps:
acquiring, from the image data, frame data synchronized with the video image, and
drawing the frame data at the drawing position by adopting a computer drawing mode.
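One plausible reading of the synchronization step in claim 12 (an assumption, since the claim does not fix a mechanism) is to map the video frame's presentation timestamp to an index into the business object's frame sequence:

```python
# Illustrative frame-sequence synchronization: pick the sequence frame that
# corresponds to the current video presentation timestamp. Parameter names
# (`video_pts_ms`, `seq_fps`, `seq_len`) are assumptions, not the patent's API.
def synced_frame_index(video_pts_ms: int, seq_fps: int, seq_len: int,
                       loop: bool = True) -> int:
    """Return the index of the sequence frame synchronized with the video image."""
    idx = int(video_pts_ms * seq_fps / 1000)
    # Animated stickers typically loop; otherwise clamp to the last frame.
    return idx % seq_len if loop else min(idx, seq_len - 1)
```

The selected frame is then composited at the drawing position like any static business object.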
13. The method according to any one of claims 1 to 12, wherein the acquiring of the video image of the video content and the display information of the business object to be drawn comprises:
acquiring the video image and the display information from the video code stream of the video content;
the video code stream of the video content is a live video stream based on the H.264 standard, and the display information is carried in a network abstraction layer (NAL) unit of the live video stream.
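As a hedged sketch of the transport described in claim 13: an H.264 Annex-B stream is a sequence of NAL units separated by start codes, and side data such as display information is commonly carried in SEI NAL units (nal_unit_type 6). The exact payload layout used by the patent is not specified, so the parser below only demonstrates locating NAL units and reading their types:

```python
# Scan an H.264 Annex-B byte stream and yield (nal_unit_type, payload) pairs.
# nal_unit_type is the low 5 bits of the first NAL header byte; SEI units
# (type 6) could carry the display information, per claim 13.
def nal_units(stream: bytes):
    i, starts = 0, []
    while True:
        j = stream.find(b"\x00\x00\x01", i)  # 3-byte start code (covers 4-byte too)
        if j < 0:
            break
        starts.append(j + 3)
        i = j + 3
    for k, s in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(stream)
        # Strip trailing zero bytes (leading zero of a 4-byte start code).
        nal = stream[s:end].rstrip(b"\x00")
        if nal:
            yield nal[0] & 0x1F, nal[1:]
```

A receiver would decode the display information out of the matching SEI payload and hand it to the drawing step.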
14. The method according to any one of claims 1 to 13, wherein the setting of the business content link for the drawn business object comprises:
and setting a link triggering area of the business content link according to the area occupied by the business object.
15. The method according to any one of claims 1 to 14, wherein the business object is a special effect or a sticker with semantics.
16. The method of any of claims 1-15, wherein the business object comprises a special effect in the form of at least one of: two-dimensional paster special effect, three-dimensional special effect and particle special effect.
17. A video image processing apparatus comprising:
the data acquisition unit is used for acquiring a video image of live video content and display information of a business object to be drawn, wherein the display information of the business object comprises information of a drawing position of the business object in the video image;
the drawing unit is used for drawing the business object at the drawing position in the video image in a computer drawing mode and setting a business content link for the drawn business object;
the data acquisition unit is used for determining the drawing position of a business object in the video image;
wherein the data acquisition unit is configured to:
determining a type of a target object from the video image, and determining the drawing position of the business object according to the type of the target object, wherein the type of the target object is used for indicating which part of the target object occupies a major portion of the video image, or for indicating an action performed by the target object.
18. A terminal device, comprising one or more processors, a memory, a communication interface and a communication bus, wherein the one or more processors, the memory and the communication interface communicate with each other through the communication bus;
the memory is used for storing at least one computer program, and the computer program enables the processor to execute the operation corresponding to the video image processing method according to any one of claims 1-16.
CN202111315145.7A 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment Pending CN114051159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111315145.7A CN114051159A (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610697461.8A CN107770603B (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment
CN202111315145.7A CN114051159A (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610697461.8A Division CN107770603B (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN114051159A true CN114051159A (en) 2022-02-15

Family

ID=61262293

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111315145.7A Pending CN114051159A (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment
CN201610697461.8A Active CN107770603B (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610697461.8A Active CN107770603B (en) 2016-08-19 2016-08-19 Video image processing method and device and terminal equipment

Country Status (1)

Country Link
CN (2) CN114051159A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529985A (en) * 2019-09-17 2021-03-19 北京字节跳动网络技术有限公司 Image processing method and device
WO2023071791A1 (en) * 2021-10-29 2023-05-04 上海商汤智能科技有限公司 Display method, apparatus, and device, storage medium, and program product
CN114051169A (en) * 2021-10-29 2022-02-15 北京市商汤科技开发有限公司 Display method, device, equipment and storage medium
CN114051168A (en) * 2021-10-29 2022-02-15 北京市商汤科技开发有限公司 Display method, device, equipment, storage medium and program product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003021946A1 (en) * 2001-09-03 2003-03-13 Blueuniversal Co., Ltd. Advertisement site information embedding moving picture digital information and moving picture digital information reproduction apparatus
CN102447873A (en) * 2010-10-13 2012-05-09 张明 Ha-ha video network video chat entertainment auxiliary system
CN103796069A (en) * 2012-09-17 2014-05-14 公共电视公司 System and method for providing interactive advertisement
CN103890810A (en) * 2011-10-25 2014-06-25 索尼公司 Image processing apparatus, method and computer program product
CN104219559A (en) * 2013-05-31 2014-12-17 奥多比公司 Placing unobtrusive overlays in video content
CN104780458A (en) * 2015-04-16 2015-07-15 美国掌赢信息科技有限公司 Method and electronic equipment for loading effects in instant video
CN105117463A (en) * 2015-08-24 2015-12-02 北京旷视科技有限公司 Information processing method and information processing device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102455898A (en) * 2010-10-29 2012-05-16 张明 Cartoon expression based auxiliary entertainment system for video chatting
CN103220490A (en) * 2013-03-15 2013-07-24 广东欧珀移动通信有限公司 Special effect implementation method in video communication and video user terminal
CN104766229A (en) * 2015-04-22 2015-07-08 合一信息技术(北京)有限公司 Implantable advertisement putting method
CN104853241A (en) * 2015-05-11 2015-08-19 百度在线网络技术(北京)有限公司 Advertisement exhibition method and system thereof

Also Published As

Publication number Publication date
CN107770603A (en) 2018-03-06
CN107770603B (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN107770602B (en) Video image processing method and device and terminal equipment
US10776970B2 (en) Method and apparatus for processing video image and computer readable medium
CN107343220B (en) Data processing method and device and terminal equipment
CN107347166B (en) Video image processing method and device and terminal equipment
CN108322788B (en) Advertisement display method and device in live video
WO2018033154A1 (en) Gesture control method, device, and electronic apparatus
WO2018033143A1 (en) Video image processing method, apparatus and electronic device
WO2018033155A1 (en) Video image processing method, apparatus and electronic device
WO2018036456A1 (en) Method and device for tracking and recognizing commodity in video image and displaying commodity information
WO2018033137A1 (en) Method, apparatus, and electronic device for displaying service object in video image
CN107169135B (en) Image processing method and device and electronic equipment
CN107770603B (en) Video image processing method and device and terminal equipment
US20220044459A1 (en) System and method for intelligently generating digital composites from user-provided graphics
US20160050465A1 (en) Dynamically targeted ad augmentation in video
US10575067B2 (en) Context based augmented advertisement
CN106303354B (en) Face special effect recommendation method and electronic equipment
US20180063599A1 (en) Method of Displaying Advertisement of 360 VR Video
Han et al. A mixed-reality system for broadcasting sports video to mobile devices
US10939143B2 (en) System and method for dynamically creating and inserting immersive promotional content in a multimedia
CN110858134A (en) Data, display processing method and device, electronic equipment and storage medium
CN110324648B (en) Live broadcast display method and system
CN112492347A (en) Method for processing information flow and displaying bullet screen information and information flow processing system
CN108076359B (en) Business object display method and device and electronic equipment
CN107770580B (en) Video image processing method and device and terminal equipment
CN107578306A (en) Commodity in track identification video image and the method and apparatus for showing merchandise news

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination