WO2018033137A1 - Method, apparatus and electronic device for displaying business objects in video images - Google Patents

Method, apparatus and electronic device for displaying business objects in video images

Info

Publication number
WO2018033137A1
Authority
WO
WIPO (PCT)
Prior art keywords
business object
video image
type
location
area
Prior art date
Application number
PCT/CN2017/098027
Other languages
English (en)
French (fr)
Inventor
石建萍
栾青
许亲亲
王雷
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201610694812.XA external-priority patent/CN107343225B/zh
Priority claimed from CN201610694625.1A external-priority patent/CN107343211B/zh
Application filed by 北京市商汤科技开发有限公司
Priority to US15/847,172 (US11037348B2)
Publication of WO2018033137A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0251 Targeted advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0277 Online advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/20 Drawing from basic elements, e.g. lines or circles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/23 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on positionally close patterns or neighbourhood relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Definitions

  • the present application relates to artificial intelligence technology, and more particularly to a method, device and electronic device for displaying a business object in a video image.
  • Internet video has become an important business traffic portal and is considered a premium resource for ad placement.
  • Existing video advertisements are mainly inserted as fixed-duration spots before a video plays or at certain points during playback, or are placed at fixed positions in the video playback area and its surroundings.
  • the embodiment of the present application provides a technical solution for displaying a business object in a video image.
  • a method of displaying a business object in a video image, comprising: detecting at least one target object from a video image and determining feature points of the at least one target object; determining, according to the feature points of the at least one target object, a display position of a business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics.
  • an apparatus for displaying a business object in a video image, comprising: a first determining module, configured to detect at least one target object from a video image and determine feature points of the at least one target object; a second determining module, configured to determine, according to the feature points of the at least one target object, a display position of a business object to be displayed in the video image; and a drawing module, configured to draw the business object at the display position by means of computer graphics.
  • an electronic device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with each other through the communication bus; the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method of displaying a business object in a video image in any of the embodiments of the present application.
  • another electronic device, including: a processor and the apparatus for displaying a business object in a video image according to any of the embodiments of the present application; when the processor runs the apparatus, the units in the apparatus for displaying a business object in a video image according to any of the embodiments of the present application are executed.
  • a computer program comprising computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the method of displaying a business object in a video image in any of the embodiments of the present application.
  • a computer readable storage medium for storing computer readable instructions that, when executed, implement the operations of the steps of the method of displaying a business object in a video image in any of the embodiments of the present application.
  • According to the embodiments of the present application, the target object is detected from the video image and its feature points are determined. Different target objects have different feature points, and the determined feature points of the target object serve as the basis for determining the display position of the business object to be displayed. The display position is determined accordingly, and the business object is drawn at that position by means of computer graphics so as to display it.
  • For example, a face may be used as the target object: the video image is detected to obtain the face as the target object and determine its feature points, which may include, but are not limited to, some or all of the feature points corresponding to the eyebrows, eyes, mouth, nose, and facial contour. These feature points are then used as references to determine the display position of the business object to be displayed, for example, displaying the business object on the forehead above the eyebrows.
  • Because the business object is drawn by means of computer graphics at the determined display position, no additional advertising video data irrelevant to the video needs to be transmitted over the network, which saves network resources.
  • Moreover, the business object is closely combined with the target object in the video image, so it can be displayed in a way that does not disturb the viewer, does not affect the viewer's normal video-watching experience, and is unlikely to annoy the audience, thereby effectively achieving the expected effect.
  • FIG. 1 is a flow chart of an embodiment of a method of presenting a business object in a video image in accordance with the present application.
  • FIG. 2 is a flow chart of another embodiment of a method of presenting a business object in a video image in accordance with the present application.
  • FIG. 3 is a flow chart of still another embodiment of a method of presenting a business object in a video image in accordance with the present application.
  • FIG. 4 is a flow chart of still another embodiment of a method of presenting a business object in a video image in accordance with the present application.
  • FIG. 5 is a flow chart of still another embodiment of a method of presenting a business object in a video image in accordance with the present application.
  • FIG. 6 is a flow diagram of yet another embodiment of a method of presenting a business object in a video image in accordance with the present application.
  • FIG. 7 is a structural block diagram of an embodiment of an apparatus for displaying a business object in a video image according to the present application.
  • FIG. 8 is a structural block diagram of another embodiment of an apparatus for displaying a business object in a video image according to the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
  • Embodiments of the invention may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • a method for displaying a service object in a video image in this embodiment includes:
  • Step S102: detect at least one target object from the video image, and determine feature points of the at least one target object.
  • In the embodiments of the present application, the video image is an image corresponding to a video data frame in a video; it may be an image in a live video, that is, a live video image, or an image in a pre-recorded video being played, and so on.
  • Each video image may include certain target objects, such as characters, gestures, backgrounds, and the like.
  • the target object is an object that is present in the video image and easily noticed by the viewer, and may include, but is not limited to, a human body (e.g., a face, a body part, etc.), an action (e.g., a gesture, a posture, etc.), a background, and so on.
  • the target object generally has a certain number of feature points, for example, the traditional 68 feature points of a face covering the eyes, nose, mouth, and facial contour; the feature points of the fingertips, finger valleys, and hand contour of a hand; or the feature points of a background boundary. The embodiments of the present application do not specifically limit the target object or its feature points, and can be applied to any target object and any of its feature points.
  • In an optional example, a corresponding feature extraction algorithm or a neural network model (such as a convolutional network model) may be employed to detect the target object from the video image and determine its feature points. Detecting the target object and determining its feature points provides the basis for subsequently determining the display position of the business object to be displayed. For example, after the feature points of the background boundary are determined, the business object can be displayed at an appropriate position in the background; after the feature points of a face are determined, the business object can be displayed at an appropriate position on the face (such as the forehead or cheek). A minimal illustrative sketch follows.
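  • For illustration only, the following Python sketch shows one possible implementation of step S102 using the open-source dlib 68-point facial landmark model; the patent does not prescribe any particular library, and the model file name is an assumption (the pretrained file must be obtained separately).

      import cv2
      import dlib

      detector = dlib.get_frontal_face_detector()
      # The model file name below is an assumption; download it separately.
      predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

      def detect_face_feature_points(frame_bgr):
          """Return a list of 68 (x, y) feature points for each detected face."""
          gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
          points_per_face = []
          for face in detector(gray):
              shape = predictor(gray, face)
              points_per_face.append(
                  [(shape.part(i).x, shape.part(i).y) for i in range(68)])
          return points_per_face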
  • step S102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 that is executed by the processor.
  • Step S104: determine, according to the feature points of the at least one target object, the display position of the business object to be displayed in the video image.
  • In the embodiments of the present application, the business object may include, but is not limited to, a special effect containing semantic information (such as an advertisement, entertainment, a weather forecast, a traffic forecast, a pet, etc.). The special effect may be, for example, in three-dimensional (3D) form, such as a 3D advertisement effect displayed as a 3D special effect; in the form of a two-dimensional (2D) sticker, such as a 2D advertisement sticker displayed in sticker form; or a particle effect.
  • However, the application is not limited thereto: other forms of business objects are also applicable to the technical solutions of the embodiments of the present application, such as an application (APP), a description or introduction of an application, or some form of object that interacts with the video audience (such as an electronic pet), and so on.
  • When determining the display position, it may be determined from the feature points of the target object according to set rules, or it may be determined from the feature points of the target object using a pre-trained neural network model (such as a convolutional network model), as detailed below.
  • One or more display positions of the business object to be displayed may be determined in the video image, where "multiple" means two or more. As an example, a rule-based placement is sketched below.
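  • The following is a minimal sketch of the rule-based option, assuming the 68-point landmark convention above (indices 17-26 are the eyebrow points); the offsets are illustrative assumptions, not mandated by the patent.

      def forehead_display_position(points, sticker_w, sticker_h):
          """Place a sticker on the forehead, above the eyebrows."""
          brows = points[17:27]                      # eyebrow landmarks
          cx = sum(x for x, _ in brows) // len(brows)
          top = min(y for _, y in brows)
          # Center horizontally on the brows, one sticker height above them.
          return (cx - sticker_w // 2, top - sticker_h)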
  • step S104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 being executed by the processor.
  • Step S106: draw the business object to be displayed at the determined display position by means of computer graphics.
  • the business object can be drawn graphically at the placement for business object presentation.
  • For example, a sticker can be used for advertisement placement and display, e.g., displaying the name of a product through a virtual-cap-type sticker, which attracts viewers, makes the advertisement placement and display more entertaining, and improves placement and display efficiency.
  • When the business object is a sticker (such as an advertisement sticker), the related information of the business object, such as its identifier and size, may be acquired first. After the display position is determined, the business object can be scaled, rotated, and so on before being drawn.
  • Advertisements can also be displayed as 3D special effects, such as text or logos (LOGOs) rendered through particle effects.
  • In an optional example, the drawing of the business object by means of computer graphics may be implemented through appropriate computer graphics drawing or rendering, for example, but not limited to, drawing based on an OpenGL (Open Graphics Library) graphics rendering engine.
  • OpenGL defines a professional, cross-language, cross-platform graphics programming interface specification. It is hardware-independent and can conveniently draw both 2D and 3D graphics. With an OpenGL-based graphics rendering engine, not only 2D stickers and 2D effects but also 3D stickers, 3D effects, and particle effects can be drawn.
  • However, the present application is not limited to drawing based on an OpenGL graphics rendering engine; other approaches, such as drawing based on rendering engines like Unity or OpenCL, are also applicable to the embodiments of the present application. For illustration, a simple software alpha-blend equivalent is sketched below.
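  • The sketch below approximates drawing a 2D sticker by per-pixel alpha blending with NumPy; it is a software stand-in for the OpenGL rendering described above, and omits boundary clipping for brevity.

      import numpy as np

      def draw_sticker(frame_bgr, sticker_bgra, x, y):
          """Alpha-blend a BGRA sticker onto a BGR frame at top-left (x, y)."""
          h, w = sticker_bgra.shape[:2]
          roi = frame_bgr[y:y + h, x:x + w].astype(np.float32)
          alpha = sticker_bgra[:, :, 3:4].astype(np.float32) / 255.0
          blended = alpha * sticker_bgra[:, :, :3] + (1.0 - alpha) * roi
          frame_bgr[y:y + h, x:x + w] = blended.astype(np.uint8)
          return frame_bgr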
  • step S106 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 706 executed by the processor.
  • With this embodiment, the target object is detected from the video image and its feature points are determined; different target objects have different feature points. The determined feature points serve as the basis for determining the display position of the business object to be displayed, the display position is determined accordingly, and the business object is drawn there by means of computer graphics so as to display it.
  • When the business object is an advertisement to be displayed, drawing it by means of computer graphics at the determined display position combines it with the video playback, and no additional advertising video data irrelevant to the video needs to be transmitted over the network, which helps save network resources and the client's system resources;
  • in addition, the business object is closely integrated with the target object in the video image, so it can be displayed in a way that does not disturb the viewer, does not affect the viewer's normal video-watching experience, and is unlikely to cause the viewer's disgust, which helps improve the placement of business objects and achieve the expected placement and display effects.
  • In an optional example, the foregoing business object may include multiple associated business objects.
  • determining the display position of the business object to be displayed in the video image may include: determining a corresponding display position of the plurality of associated business objects to be displayed in the video image.
  • Drawing a business object by means of computer graphics at the display location may include: drawing a plurality of associated business objects by computer drawing at a plurality of determined corresponding display locations.
  • In an optional example, the multiple associated business objects may include, but are not limited to, any one or more of the following: multiple special effects containing semantic information, multiple display portions of the same special effect containing semantic information, and multiple special effects containing semantic information provided by the same business object provider.
  • the special effect may include any one of a two-dimensional sticker effect, a three-dimensional effect, and a particle effect containing advertisement information.
  • Other forms of business objects are also applicable to the video image processing solutions provided by the embodiments of the present application, for example, cheek sticker effects, forehead sticker effects, and background sticker effects provided by the Coca-Cola Company, or virtual headwear sticker effects, virtual clothing sticker effects, and background sticker effects for a game theme, and the like.
  • In an optional example, the multiple corresponding display positions include at least one or any several of the following: the hair area, the forehead area, the cheek area, the chin area, and the body area other than the head of a person in the video image.
  • For example, the multiple associated business objects may be multiple two-dimensional sticker effects containing advertisement information that display the same business object theme, or multiple display portions of the same two-dimensional sticker effect containing advertisement information. In this case, the multiple 2D sticker effects or the multiple display portions of the same 2D sticker effect may be used for advertisement placement and display.
  • For example, the name of a product is displayed at the anchor's mouth position through a virtual-cap-type sticker effect, the product itself is displayed at the anchor's hand position through a virtual-container-type sticker effect, and the product and its name are displayed in the background of the live video through a background-type sticker effect. This strongly attracts the audience's attention, makes the advertisement placement and display more entertaining, and improves the efficiency of advertisement placement and display.
  • FIG. 2 is a flow diagram of another embodiment of a method of presenting a business object in a video image in accordance with the present application. As shown in FIG. 2, the method for displaying a service object in a video image in this embodiment includes:
  • Step S202: detect at least one target object from the video image, and determine feature points of the at least one target object.
  • the video image may be an image corresponding to a video data frame in the video, and each image has a certain target object, such as a character, a gesture, a background, and the like.
  • The video image may contain a single target object, such as a human face, or multiple target objects, such as a face, a background, an action, and the like.
  • Detecting a target object in a video image and determining its feature points may be implemented with any suitable related technology, for example, linear feature extraction methods such as principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA), or nonlinear feature extraction methods such as kernel principal component analysis (Kernel PCA) and manifold learning; a trained neural network model, such as the convolutional network model in the embodiments of the present application, may also be used to extract the feature points of the target object, which is not limited by the embodiments of the present application.
  • For example, during a live broadcast, the electronic device detects the target object from the live video image and determines its feature points; for example, during playback of a recorded video, the electronic device detects the target object from the played video image and determines its feature points; for example, during recording of a video, the electronic device detects the target object from the recorded video image and determines its feature points, and so on.
  • step S202 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 that is executed by the processor.
  • Step S204: determine, according to the feature points of the at least one target object, the corresponding display positions of the multiple associated business objects to be displayed in the video image.
  • step S204 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 being executed by the processor.
  • Step S206: draw the multiple associated business objects at the determined corresponding display positions, respectively, by means of computer graphics.
  • multiple associated business objects can be drawn in a computer drawing at the corresponding placements for associated business object presentations.
  • When an associated business object is a sticker (such as an advertisement sticker), the related information of the associated business object, such as its identifier and size, may be acquired first. After the display positions are determined, the associated business object can be scaled, rotated, and so on according to the coordinates of its display area (such as a rectangular display area), and then drawn by a corresponding drawing method, such as an OpenGL graphics rendering engine.
  • For example, suppose the target objects detected in this embodiment are a face and a background, and the three determined display positions are the mouth of the face, the hand, and the background. Multiple special effects containing advertisement information provided by the same business object provider (such as the Coca-Cola Company) can then be drawn at these positions, for example, a virtual-container sticker effect (such as a Coca-Cola beverage bottle) at the hand position and a background sticker effect (such as a Coca-Cola company poster) in the background; a loop over the determined placements is sketched below.
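  • A minimal sketch of step S206: iterate over the determined placements and draw one sticker per region, reusing the draw_sticker blend sketched earlier; the region names and the use of OpenCV for scaling are illustrative assumptions.

      import cv2

      def draw_associated_objects(frame, stickers, display_positions):
          """stickers and display_positions are dicts keyed by region name
          (e.g. 'mouth', 'hand', 'background'); each position is (x, y, w, h)
          as determined in step S204."""
          for region, (x, y, w, h) in display_positions.items():
              scaled = cv2.resize(stickers[region], (w, h),
                                  interpolation=cv2.INTER_AREA)
              frame = draw_sticker(frame, scaled, x, y)  # alpha blend, as above
          return frame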
  • step S206 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 706 executed by the processor.
  • In scenarios such as live streaming and short videos, the content is inherently short, making it difficult to insert fixed-duration advertisements in the traditional way.
  • By delivering the advertisement through the business object, the advertisement is effectively integrated with the live video content. The approach is flexible and the effect is vivid, which not only helps improve the user's live-viewing experience but also helps improve the effectiveness of the advertisement, making it well suited to scenarios such as advertisement placement. It can be understood that, in addition to advertising, business objects can be widely applied in other areas, such as education, consulting, and services, providing entertaining and appreciable business information to enhance interaction and improve the user experience.
  • The video image processing method provided in this embodiment detects at least one target object from the video image and determines its feature points; different target objects have different feature points. The display positions of the multiple associated business objects to be displayed are determined according to the determined feature points of the at least one target object, and the associated business objects are then drawn at the determined display positions by means of computer graphics so as to display them.
  • When the associated business objects are advertisements to be displayed, the associated business objects and the target object in the video image set each other off and are closely combined, so the associated business objects can be displayed from multiple angles without affecting the viewer's normal video watching.
  • Before step S202, an operation of acquiring the video image may further be included.
  • the image in the currently playing video is obtained from the live application (ie, the live video image), or the video image is obtained from the video being recorded.
  • In the embodiments of the present application, the processing of a single video image is taken as an example, but it should be understood by those skilled in the art that multiple video images, or the video images of a video image sequence in a video stream, may all be processed with reference to the embodiments of the present application.
  • the possible implementation manners may include:
  • Manner 1: use a convolutional network model, pre-trained for determining display positions of business objects in video images, to determine the display position of the business object to be displayed in the video image; correspondingly, for multiple associated business objects, the same model is used to determine their corresponding display positions in the video image;
  • Manner 2: determine the display area of the business object to be displayed according to the type of the at least one target object, and then determine the display position in the video image according to the display area; correspondingly, for multiple associated business objects, determine their corresponding display areas according to the type of the at least one target object, and then determine their corresponding display positions in the video image from those display areas;
  • Manner 3: determine, according to the feature points of the at least one target object and the type of the business object to be displayed, the display position of the business object in the video image; correspondingly, for multiple associated business objects, determine their corresponding display positions according to the feature points of the at least one target object and the types of the multiple associated business objects to be displayed;
  • Manner 4: obtain, from a pre-stored correspondence between feature points of target objects and display positions, the target display position corresponding to the feature points of the at least one target object, and determine the obtained target display position as the display position of the business object to be displayed in the video image; correspondingly, for multiple associated business objects, the multiple obtained target display positions are determined as their corresponding display positions in the video image.
  • In Manner 1, a convolutional network model is pre-trained so that the trained model has the function of determining the display position of the business object, or of multiple associated business objects, in the video image.
  • Hereinafter, the training with respect to the business object is exemplarily described; the training with respect to the target object can be implemented with reference to related technologies and is only briefly described in the embodiments of the present application.
  • a feasible training method includes the following processes:
  • (1) Obtain the feature vector of the business object sample image to be trained, where the feature vector includes: the information of the target object in the business object sample image, and the location information and/or confidence information of the business object.
  • the information of the target object indicates the image information of the target object.
  • the location information of the business object indicates the location of the business object, which may be the location information of the central point of the business object, or the location information of the area where the business object is located.
  • the confidence information of the business object indicates the probability that the business object achieves the intended display effect (such as being noticed, clicked, or viewed) when displayed at the current position. The probability can be set according to statistical analysis of historical data, according to the results of simulation experiments, or according to manual experience.
  • According to actual needs, only the location information of the business object may be trained, or only its confidence information, or both.
  • the training of the location information and the confidence information of the business object enables the trained convolutional network model to more effectively and accurately determine the location information and confidence information of the business object, so as to provide a basis for the display of the business object.
  • the convolutional network model is trained by a large number of sample images.
  • the business objects in the business object sample image in the embodiment of the present application may be pre-labeled with location information, or confidence information, or both types of information.
  • location information and confidence information can also be obtained by other means.
  • the extraction of the feature vector may be implemented in an appropriate manner in the related art, and details are not described herein again.
  • In an optional manner, the target object and the business object may be trained with the same convolutional network model; in that case, the feature vector of the business object sample image includes both the information of the target object and the location information and/or confidence information of the business object.
  • (2) Perform convolution processing on the feature vector to obtain a feature vector convolution result; the obtained feature vector convolution result includes the information of the target object, and the location information and/or confidence information of the business object.
  • The number of convolution operations applied to the feature vector can be set according to actual needs; that is, the number of convolutional layers in the convolutional network model is set so that the error of the final feature vector convolution result falls within a certain range, for example, 1/20 to 1/5 of the length or width of the image, and optionally 1/10 of the length or width of the image.
  • The convolution result is the outcome of feature extraction on the feature vector, and it can effectively characterize the relevant objects in the video image (for example, the target object and the business object) for classification and localization.
  • The feature vector convolution result is shared by the subsequent convergence-condition judgments, so no repeated processing and calculation are needed, which reduces the resource cost of data processing and improves its speed and efficiency. A sketch of such a shared, two-branch design follows.
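  • For illustration, a hedged PyTorch sketch of the shared-convolution, two-branch idea described here and in the training steps below: one branch regresses the (x, y) display position and the other regresses the confidence. All layer sizes and channel counts are assumptions, not taken from the patent.

      import torch.nn as nn

      class PlacementNet(nn.Module):
          def __init__(self):
              super().__init__()
              self.backbone = nn.Sequential(          # shared first stage
                  nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                  nn.MaxPool2d(2, 2),
                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                  nn.MaxPool2d(2, 2),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
              )
              self.position_head = nn.Linear(64, 2)    # regresses (x, y)
              self.confidence_head = nn.Linear(64, 1)  # regresses confidence

          def forward(self, image):
              feat = self.backbone(image)              # computed once, shared
              return self.position_head(feat), self.confidence_head(feat)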
  • the operations (1) and (2) may be performed by a processor invoking a corresponding instruction stored in a memory or by a convolutional network model executed by the processor.
  • (3) Determine whether the information of the target object and the location information and/or confidence information of the business object in the feature vector convolution result satisfy a convergence condition. The convergence condition can be appropriately set by a person skilled in the art according to actual needs.
  • When the information satisfies the convergence condition, the parameter settings of the convolutional network model may be considered appropriate; when it does not, the parameter settings are considered inappropriate and need to be adjusted. The adjustment is an iterative process: operations (1) to (3) of this training method are performed iteratively until the information in the feature vector convolution result satisfies the convergence condition.
  • In an optional manner, the convergence condition may be set according to a preset standard position and/or a preset standard confidence. For example, the distance between the position indicated by the location information of the business object in the feature vector convolution result and the preset standard position satisfying a certain threshold may be used as the convergence condition for the location information; and the difference between the confidence indicated by the confidence information of the business object in the feature vector convolution result and the preset standard confidence satisfying a certain threshold may be used as the convergence condition for the confidence information.
  • Optionally, the preset standard position may be an average position obtained by averaging the positions of the business objects in the business object sample images to be trained; and/or the preset standard confidence may be an average confidence obtained by averaging the confidences of the business objects in the business object sample images to be trained. Because the business object sample images are the samples to be trained and their data volume is large, a standard position and standard confidence set from them are relatively objective and accurate.
  • A feasible manner of determining whether the location information and/or confidence information of the corresponding business object in the feature vector convolution result satisfies the convergence condition includes computing a loss value with a corresponding loss function:
  • The first loss function may be a function that computes the Euclidean distance between the position indicated by the location information of the corresponding business object and the preset standard position; and/or the second loss function may be a function that computes the Euclidean distance between the confidence indicated by the confidence information of the corresponding business object and the preset standard confidence.
  • the Euclidean distance method is simple to implement and can effectively indicate whether the convergence condition is satisfied.
  • However, the embodiments of the present application are not limited thereto; other distance measures, such as the Mahalanobis distance and the Bhattacharyya distance, are equally applicable. The two loss functions are sketched below.
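  • For illustration, the two Euclidean losses written out in PyTorch, matching the formulas given later ((x - x_gt)^2 + (y - y_gt)^2 and (p - p_gt)^2); the batching and mean reduction are assumptions.

      import torch

      def position_loss(pred_xy, std_xy):
          """First loss: squared Euclidean distance to the preset standard
          position (x_gt, y_gt)."""
          return ((pred_xy - std_xy) ** 2).sum(dim=-1).mean()

      def confidence_loss(pred_p, std_p):
          """Second loss: squared difference from the preset standard
          confidence p_gt."""
          return ((pred_p - std_p) ** 2).mean()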
  • For the information of the target object in the feature vector convolution result, whether it converges can be determined with reference to the convergence conditions of related convolutional network models, and details are not described here. If the information of the target object satisfies the convergence condition, the target object may be classified and its category clarified, so as to provide a reference and basis for determining the display position of the business object.
  • (4) If the convergence condition is satisfied, the training of the convolutional network model is completed; if not, the parameters of the convolutional network model are adjusted according to the feature vector convolution result, and the model is iteratively trained with the adjusted parameters until the feature vector convolution result obtained after iterative training satisfies the convergence condition.
  • the operations (3) and (4) may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a training module 708 executed by the processor.
  • Through the above training, the convolutional network model can perform feature extraction and classification for display positions of business objects relative to the target object, and thus has the function of determining the display position of a business object in a video image. When there are multiple display positions, the convolutional network model can also rank their display effects and thereby determine the optimal display position. In subsequent applications, when a business object needs to be displayed, a valid display position can be determined from the current image in the video.
  • In addition, before training the convolutional network model, the business object sample images may be preprocessed, which may include: acquiring multiple business object sample images, each containing annotation information for a business object; determining the position of the business object according to the annotation information, and judging whether the distance between the determined position and a preset position is less than or equal to a set threshold; and, for business objects whose distance is less than or equal to the set threshold, determining the corresponding business object sample images as the business object sample images to be trained, which then participate in the training process above.
  • The preset position and the set threshold may be appropriately set by a person skilled in the art in any suitable manner, for example, according to statistical analysis of data, a related distance calculation formula, or manual experience, which is not limited by the embodiments of the present application.
  • The position of the business object determined according to the annotation information may be the central position of the business object. In an optional manner, the central position of the business object is determined according to the annotation information, and it is then judged whether the variance between the central position and the preset position is less than or equal to the set threshold.
  • Through this preprocessing, sample images that do not meet the conditions can be filtered out, helping ensure the accuracy of the training results; a sketch of the filtering follows.
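  • A minimal sketch of the filtering step, assuming each sample's annotation has already been reduced to a business-object center point; the sample representation and the threshold value are illustrative assumptions.

      def filter_training_samples(samples, preset_xy, threshold):
          """Keep samples whose annotated business-object center lies within
          `threshold` of the preset position (e.g. 1/10 of image width)."""
          kept = []
          px, py = preset_xy
          for sample in samples:
              cx, cy = sample["center"]          # from the annotation info
              if ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 <= threshold:
                  kept.append(sample)
          return kept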
  • The training of the convolutional network model is implemented by the above process, and the trained model can then be used to determine display positions of business objects in video images. For example, during a live broadcast, if the anchor clicks a business object to request its display, then after the convolutional network model obtains the anchor's facial feature points from the live video image, it can indicate the optimal position for displaying the business object; the convolutional network model can also directly determine the display position of the business object from the live video image.
  • In Manner 2, the type of the at least one target object is first determined according to its feature points; the display area of the business object to be displayed is determined according to that type; and the display position of the business object in the video image is then determined according to the display area.
  • the type of the target object may include, for example but not limited to, a face type, a background type, a hand type, and an action type.
  • The face type indicates that a face occupies the main part of the video image; the background type indicates that the background occupies a larger part of the video image; the hand type indicates that a hand occupies the main part of the video image; and the action type indicates that a person in the video image has performed some kind of action.
  • the related detection, classification or learning method may be used to determine the type of the target object.
  • After the type of the target object is determined, the display area of the business object to be displayed may be determined according to set rules, for example:
  • when the type of the target object is the face type, determining that the display area of the business object to be displayed includes at least one or any several of: the hair area, the forehead area, the cheek area, the chin area, and the body area other than the head of a person in the video image; and/or,
  • when the type of the target object is the background type, determining that the display area of the business object to be displayed includes: the background area in the video image; and/or,
  • when the type of the target object is the hand type, determining that the display area of the business object to be displayed includes: the area within a set range centered on the area where the hand is located in the video image; and/or,
  • when the type of the target object is the action type, determining that the display area of the business object to be displayed includes: a preset area in the video image.
  • the predetermined area may be appropriately set according to an actual situation, for example, an area within a setting range centering on the motion generating portion, or an area within a setting range other than the motion generating portion, or a background area, or the like. The embodiment of the present application does not limit this.
  • Display areas such as the hair area, the forehead area, the background area, and the hand area allow multiple associated business objects to be displayed in combination, that is, the multiple associated business objects are shown at different display positions. Alternatively, the multiple associated business objects to be displayed can also be shown at the same display position (such as the hair area).
  • In this way, an augmented reality effect can be achieved, in which multiple virtual items containing semantic information, such as 2D sticker effects containing advertisement information (i.e., business objects), are added to relevant areas such as the people and the background in the picture.
  • In an optional example, the actions corresponding to the action type include at least one or any several of the following: blinking, opening the mouth, nodding, shaking the head, kissing, smiling, waving, making a scissors hand, making a fist, holding out a hand, giving a thumbs-up, making a pistol gesture, making a V sign, and making an OK sign.
  • the display position of the business object to be displayed in the video image can be further determined.
  • For example, the center point of the display area may be used as the center point of the business object's display position when displaying the business object; alternatively, a certain coordinate position in the display area may be determined as the center point of the display position, and so on, without restriction; a sketch of the first option follows.
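  • A minimal sketch of the first option, assuming rectangles are (x, y, w, h) tuples; this is an illustrative helper, not an interface from the patent.

      def placement_from_area(area_rect, obj_w, obj_h):
          """Center the business object on the center of the display area."""
          ax, ay, aw, ah = area_rect
          cx, cy = ax + aw // 2, ay + ah // 2
          return (cx - obj_w // 2, cy - obj_h // 2, obj_w, obj_h)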
  • In Manner 3, when determining the display position of the business object to be displayed in the video image, not only the feature points of the target object but also the type of the business object to be displayed is taken into account.
  • In an optional example, the type of the business object includes at least one or any several of the following: a forehead patch type, a cheek patch type, a chin patch type, a virtual hat type, a virtual clothing type, a virtual makeup type, a virtual headwear type, and a virtual hair accessory type.
  • the type of the business object may be other suitable types, such as a virtual cap type, a virtual cup type, a text type, and the like.
  • For each type of business object, an appropriate display position can be selected with reference to the feature points of the target object. In some cases, multiple display positions are feasible, and at least one of them may be selected as the final display position; for example, a text-type business object may be displayed in the background area, or on the person's forehead or body area.
  • In Manner 4, the target display position corresponding to the feature points of the at least one target object is acquired from the pre-stored correspondence between feature points of target objects and display positions, and the obtained target display position is determined as the display position of the business object to be displayed in the video image.
  • The correspondence between feature points of the target object and display positions may be preset and stored in the form of a mapping table or the like; the embodiments of the present application do not limit the storage form of the correspondence. An illustrative mapping-table lookup is sketched below.
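  • A minimal sketch of Manner 4, assuming the mapping table is keyed by a label derived from the detected feature points; the keys and values here are invented purely for illustration.

      # Pre-stored correspondence: feature-point key -> display position rule.
      PLACEMENT_TABLE = {
          "face.forehead":   "forehead_area",
          "face.cheek":      "cheek_area",
          "hand.fingertips": "hand_area",
          "background.edge": "background_area",
      }

      def lookup_display_position(feature_key):
          """Return the stored target display position for the feature key,
          or None if the table has no entry."""
          return PLACEMENT_TABLE.get(feature_key)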
  • Hereinafter, taking a business object that is a sticker containing semantic information, specifically an advertisement sticker, as an example, the scheme for displaying a business object in a video image according to an embodiment of the present application is described. As shown in FIG. 3:
  • Step S302: acquire business object sample images and perform preprocessing to determine the business object sample images to be trained.
  • Among the business object sample images, there may be some that do not meet the training standard of the convolutional network model; through preprocessing, the sample images that do not conform to the training standard can be filtered out.
  • each business object sample image includes a labeled target object and an annotated business object, and the business object is labeled with location information and confidence information.
  • the location information of the central point of the business object is used as the location information of the business object.
  • the sample image is filtered according to the location information of the business object. After obtaining the coordinates of the location indicated by the location information, the coordinates are compared with the preset location coordinates of the business object of the type, and the position variance of the two is calculated. If the location variance is less than or equal to the set threshold, the business object sample image may be used as the sample image to be trained; if the location variance is greater than the set threshold, the business object sample image is filtered out.
  • The preset position coordinates and the set threshold may be appropriately set by a person skilled in the art according to the actual situation; for example, the set threshold may be 1/20 to 1/5 of the length or width of the image, and optionally 1/10 of the length or width of the image.
  • In addition, the positions and confidences of the business objects in the determined business object sample images to be trained may be averaged to obtain an average position and an average confidence, which serve as the basis for the subsequent convergence conditions.
  • the business object sample image used for training in this embodiment is labeled with the coordinates of the optimal advertisement position and the confidence of the advertisement space.
  • The coordinates of the optimal advertisement position can be annotated on the face, the gesture, the foreground/background, and so on, enabling joint training of facial feature points, gestures, the foreground/background, and advertisement positions; compared with training face and gesture detection separately from advertisement positions and their confidences, this helps save computing resources.
  • the size of the confidence indicates the probability that this ad slot is the best ad slot. For example, if this ad slot is mostly occluded, the confidence is low.
  • the operation S302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a pre-processing module 7080 executed by the processor.
  • Step S304: train the convolutional network model using the determined business object sample images to be trained.
  • The operation S304 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a training module 708 executed by the processor.
  • When training the convolutional network model, the feature vector of the business object sample image to be trained is taken as input; the feature vector contains both the information of the target object and the information of the business object, namely the location information and the confidence information of the business object.
  • The first training branch of the second stage performs regression analysis on the position of the business object, i.e., the advertisement sticker, over the first-stage feature vector convolution result, predicting the coordinates of the optimal advertisement sticker position; the second training branch of the second stage performs regression analysis on the confidence of the advertisement sticker over the same first-stage feature vector convolution result, predicting the confidence of the advertisement sticker.
  • The output of the output layer may be the predicted values of layer 35 (i.e., convolutional layer 6_2 (1x1x2)) and layer 42 (i.e., convolutional layer cls_7_1 (1x1x1)).
  • In the above training, the first training branch and the second training branch of the second stage share the first-stage feature vector convolution result, which helps save computing resources; the training of the two branches may be performed in any order, in parallel, or in any time sequence.
  • The first-stage feature vector convolution result may include the feature extraction and classification results of the target object, and the feature extraction and classification results of the location information and confidence information of the business object.
  • The prediction of the optimal advertisement sticker position may be performed iteratively: each time the position is predicted, the network parameters of the convolutional network model (such as the values of the convolution kernels and the weights of the linear transformations between layer outputs) are adjusted according to the prediction result, and prediction continues with the parameter-adjusted model, iterating until the convergence condition is met.
  • In an optional example, the loss layer 36 uses the first loss function to determine whether the advertisement sticker position trained in the first stage satisfies the convergence condition; if not, the convolutional network model is backpropagated and its training parameters are adjusted, implementing the regression calculation of the advertisement sticker position.
  • In an optional example, the first loss function may use a function measuring the Euclidean distance, min_{x,y} (x - x_gt)^2 + (y - y_gt)^2, where (x, y) are the coordinates of the advertisement sticker position to be optimized and (x_gt, y_gt) are the coordinates of the preset standard position.
  • The preset standard position may be the average position obtained in step S302 by averaging the positions of the business objects in the business object sample images to be trained. The convergence condition may be, for example, that the coordinates of the advertisement sticker to be optimized are the same as the coordinates of the preset standard position, or that the difference between the two is less than a certain threshold (such as 1/20 to 1/5 of the length or width of the image, optionally 1/10), or that the number of parameter-optimization iterations reaches a predetermined number (such as 10 to 20), and so on;
• the prediction of the confidence of the advertisement sticker may likewise be performed iteratively multiple times: each time the confidence is predicted, the network parameters of the convolutional network model (such as the values of the convolution kernels and the weights of the linear changes of the inter-layer outputs) are adjusted according to the prediction result, prediction is performed again based on the parameter-adjusted model, and the iteration continues until the convergence condition is met.
• the loss layer 43 uses the second loss function to determine whether the confidence of the advertisement sticker trained in the first stage satisfies the convergence condition; if it does not, the convolutional network model performs backpropagation to adjust its training parameters, implementing the regression calculation of the advertisement sticker confidence.
• the second loss function may use, for example, a function measuring the Euclidean distance, min_p (p − p_gt)², where p is the confidence of the advertisement sticker to be optimized and p_gt is the preset standard confidence.
• the preset standard confidence may be the average confidence obtained by averaging the confidences of the business objects in the business object sample images to be trained obtained in step S302; the convergence condition may be, for example, that the confidence to be optimized is the same as the preset standard confidence, or that the difference between the two is less than a certain threshold (e.g., less than or equal to 25%), or that the number of iterations of parameter optimization reaches a predetermined number (such as 10 to 20 times), and so on. A companion sketch of this confidence loss follows.
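A companion sketch for the confidence branch, under the same assumptions (PyTorch; names illustrative):

```python
import torch

def confidence_loss(p: torch.Tensor, p_gt: torch.Tensor) -> torch.Tensor:
    # Squared Euclidean distance min_p (p - p_gt)^2 for the confidence regression.
    return ((p - p_gt) ** 2).mean()

def confidence_converged(p: torch.Tensor, p_gt: torch.Tensor, tol: float = 0.25) -> bool:
    # Converged when the difference is at most the threshold (e.g. <= 25%).
    return bool(((p - p_gt).abs() <= tol).all())
```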
• each convolutional layer is followed by a nonlinear response unit, which adopts Rectified Linear Units (ReLU); adding this rectified linear unit after the convolutional layer makes the mapping result of the convolutional layer as sparse as possible, closer to the human visual response, so that the image processing effect is better.
• setting the convolution kernel of the convolutional layer to 3x3 allows the local information in the video image to be better integrated.
• setting the stride of the pooling layer lets higher-layer features obtain a larger field of view without increasing the amount of computation; the stride of the pooling layer also enhances spatial invariance, that is, the same input appearing at different image positions produces the same output response. A minimal sketch of this layer pattern follows.
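A minimal PyTorch sketch of the convolution/ReLU/pooling pattern described above (3x3 kernels, a ReLU after every convolution, a 3x3 stride-2 pooling layer); the channel counts and input size are illustrative assumptions:

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # 3x3 kernel integrates local detail
        nn.ReLU(inplace=True),          # nonlinear response unit after the conv layer
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),    # 3x3 pooling, stride 2
    )

features = conv_block(3, 64)            # e.g. a first stage applied to an RGB frame
x = torch.randn(1, 3, 224, 224)
print(features(x).shape)                # torch.Size([1, 64, 112, 112])
```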
• the sizes of the convolution kernels, the number of channels, the sizes of the pooling kernels, the strides, and the number of convolutional layers above are all illustrative; in practical applications, those skilled in the art can adapt them according to actual needs, and the embodiments of the present application do not limit this. In addition, the combinations and parameters of all layers in the convolutional network model in this embodiment are optional and can be combined arbitrarily.
  • the position of the optimal advertising sticker is predicted using the first training branch, and the confidence of the position is predicted using the second training branch, thereby realizing effective prediction of the position of the advertising sticker in the video image.
  • Step S306 Acquire a current video image, take the current video image as an input, use the trained convolutional network model to detect at least one target object from the video image, and determine a feature point of the at least one target object.
  • step S306 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 that is executed by the processor.
  • Step S308 Determine, by using the trained convolutional network model, the display position of the business object to be displayed in the current video image according to the feature points of the at least one target object.
• step S308 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 run by the processor.
  • Step S310 The business object to be displayed is drawn by using a computer drawing manner in the display position in the current video image.
• step S310 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 706 run by the processor. A hedged sketch of this three-step pipeline follows.
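A hedged end-to-end sketch of steps S306-S310; `detect_feature_points`, `predict_display_position`, and `draw_sticker` are stand-ins for the trained convolutional network model and the graphics engine, and are assumptions rather than APIs from the disclosure:

```python
from typing import List, Tuple
import numpy as np

def detect_feature_points(frame: np.ndarray) -> List[Tuple[int, int]]:
    # Placeholder: a trained detector would return facial/hand/background
    # feature points here.
    return [(120, 80), (180, 80), (150, 130)]

def predict_display_position(points: List[Tuple[int, int]]) -> Tuple[int, int]:
    # Placeholder for the trained convolutional network model: here we simply
    # anchor above the centroid of the feature points (e.g. the forehead).
    xs, ys = zip(*points)
    return int(sum(xs) / len(xs)), int(min(ys) - 40)

def draw_sticker(frame: np.ndarray, sticker: np.ndarray,
                 pos: Tuple[int, int]) -> np.ndarray:
    # Placeholder for computer drawing (e.g. via an OpenGL engine): paste the
    # sticker so its centre sits at `pos`, clipped to the frame boundary.
    h, w = sticker.shape[:2]
    x0, y0 = max(pos[0] - w // 2, 0), max(pos[1] - h // 2, 0)
    x1, y1 = min(x0 + w, frame.shape[1]), min(y0 + h, frame.shape[0])
    frame[y0:y1, x0:x1] = sticker[: y1 - y0, : x1 - x0]
    return frame

frame = np.zeros((360, 640, 3), dtype=np.uint8)
sticker = np.full((40, 80, 3), 255, dtype=np.uint8)
out = draw_sticker(frame, sticker, predict_display_position(detect_feature_points(frame)))
```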
• the method for displaying a business object in a video image in this embodiment may be implemented on any suitable terminal device having data collection, processing, and transmission functions, such as a mobile terminal or a personal computer (PC); the embodiments of the present application do not limit this.
• this embodiment takes the business object being a sticker containing semantic information, specifically an advertisement sticker, as an example to describe the scheme for displaying a business object in a video image in the embodiments of the present application.
  • the method for displaying a service object in a video image in this embodiment includes:
  • Step S402 Detecting at least one target object from the video image, and determining feature points of the at least one target object.
  • step S402 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 being executed by the processor.
  • Step S404 Determine a type of the at least one target object according to the feature points of the at least one target object.
• each target object has certain feature points, such as the feature points of a face or a hand, or the boundary points of the background; after the feature points are obtained, a relevant detection, classification, or learning method may be employed to determine the type of the target object.
  • Step S406 Determine a display area of the business object to be displayed according to the type of the at least one target object.
• when the type of the at least one target object is a face type, the display area of the business object to be displayed is determined to include at least one or any of the following: a hair area, a forehead area, a cheek area, a chin area, and a body area other than the head; and/or,
• when the type of the at least one target object is a background type, the display area of the business object to be displayed is determined to include: a background area in the video image; and/or,
• when the type of the at least one target object is a hand type, the display area of the business object to be displayed is determined to include: an area within a set range centered on the area where the hand is located in the video image; and/or,
• when the type of the at least one target object is an action type, the display area of the business object to be displayed is determined to include: a preset area in the video image. These rules are tabulated in the sketch following this list.
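As a concrete tabulation of the four rules above, a minimal sketch follows; the area labels and the dictionary layout are illustrative assumptions, not code from the disclosure:

```python
from typing import List

# Mapping from target-object type to candidate display areas.
TARGET_TYPE_TO_AREAS = {
    "face":       ["hair", "forehead", "cheek", "chin", "body_excluding_head"],
    "background": ["background"],
    "hand":       ["range_centered_on_hand"],
    "action":     ["preset_area"],
}

def candidate_display_areas(target_type: str) -> List[str]:
    return TARGET_TYPE_TO_AREAS.get(target_type, [])

print(candidate_display_areas("face"))
```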
• such scenes usually include the common scenes of live broadcast and short-video sharing.
• the main body of such a scene is often a main character plus a simple background, and the character often occupies a large portion of the frame.
• the areas that viewers mainly focus on are the subject's face and body movements; in order to let viewers notice the advertisement content without affecting the subject of the video, an augmented-reality effect can be achieved by adding semantic virtual items such as advertisement stickers (i.e., business objects) to the relevant areas of the characters in the picture, achieving commercial value through the display effects and information of the virtual items.
• for example: the display area of a forehead-patch-type business object may be the area of the anchor's forehead; the display area of a cheek-patch-type business object may be the areas of the anchor's cheeks on both sides, or the areas of the cheeks on both sides together with the background area above the anchor's forehead; the display area of a chin-patch-type business object may be the area of the anchor's chin; the display area of a virtual-headdress-type business object may be the anchor's hair and the adjacent background area; a background-type business object may be displayed in the background area when it does not cover the main subject; the display area of a business object triggered by a blink action may be the area at the anchor's eyes; the display area of a business object triggered by a kiss action may be the area at the anchor's mouth; the display area of a business object triggered by a smile action may be multiple areas; the display area of a business object triggered by a wave action may be the area of the anchor's hand; and so on.
  • Step S408 Determine, according to the display area, a display position of the business object to be displayed in the video image.
• the determined display area may include only one area or multiple areas, and one or more display areas may be determined according to the type of the business object for drawing and displaying the business object.
• for example, for a forehead-patch-type business object, the display area of the business object in the video image is the corresponding forehead area, and the center point of the forehead area is taken as the center point of the display position, at which the business object is drawn and displayed. A small geometry sketch of this centring step follows.
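For the centring step just described, a small geometry sketch follows, assuming an (x, y, width, height) rectangle convention; the convention and names are illustrative:

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def placement_rect(area: Rect, sticker_w: int, sticker_h: int) -> Rect:
    # Return the rectangle whose centre coincides with the centre of the
    # display area, sized to the sticker.
    ax, ay, aw, ah = area
    cx, cy = ax + aw // 2, ay + ah // 2
    return (cx - sticker_w // 2, cy - sticker_h // 2, sticker_w, sticker_h)

print(placement_rect((100, 40, 120, 60), 80, 30))  # -> (120, 55, 80, 30)
```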
• when the display area of the business object in the video image includes multiple areas, such as a body area, a forehead area, a cheek area, and a background area, one or more display areas may be selected from them.
• steps S404-S408 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 run by the processor.
  • Step S410 drawing a business object and displaying it by using a computer drawing method at the display position.
• step S410 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 706 run by the processor.
• the business objects in the above examples may be in text form, in picture form, or a combination of the two.
• the method for displaying a business object in a video image in this embodiment can effectively determine an appropriate advertisement delivery and display position in an anchor-type video scene, effectively integrates the advertisement with video playback without requiring additional network resources or client system resources, and improves the effectiveness and efficiency of advertising while not affecting the user's video viewing experience.
• in step S406, the corresponding display areas of the plurality of associated business objects to be displayed are determined according to the type of the at least one target object;
• in step S408, the corresponding display positions of the plurality of associated business objects to be displayed in the video image are determined according to their corresponding display areas.
• for example, the center point of the display area is used as the center point of the display position of the business object to display the business object; or a certain coordinate position in the display area is determined as the center point of the display position, etc., which is not limited in this embodiment of the present application;
  • step S410 a plurality of associated business objects to be displayed are respectively drawn by using a computer drawing manner at corresponding display positions.
• the video image processing method provided in this embodiment can effectively determine the display positions of associated business objects in a video image, so that multiple associated business objects are respectively drawn by computer graphics at the determined display positions, thereby implementing the delivery and display of the associated business objects. The combined display among multiple associated business objects, effectively integrated with video playback, improves the efficiency and effect of the delivery and display of business objects, and eliminates the need for additional data transmission, saving network resources and client system resources.
  • FIG. 5 is a flow diagram of still another embodiment of a method of presenting a business object in a video image in accordance with the present application.
• the method for displaying a business object in a video image in this embodiment may be performed by any electronic device having data acquisition, processing, and transmission functions, including but not limited to mobile terminals and personal computers (PCs).
• this embodiment is described by taking a business object including a plurality of associated business objects as an example, and is equally applicable to a single business object.
  • the method for displaying a business object in a video image in this embodiment includes:
  • Step S502 detecting at least one target object from the video image, and determining feature points of the at least one target object.
  • step S502 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 that is executed by the processor.
• Step S504: According to the feature points of the at least one target object, use a pre-trained convolutional network model for determining the display position of a business object in a video image to determine the corresponding display positions of the plurality of associated business objects to be displayed in the video image.
• step S504 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 run by the processor.
  • Step S506 in the determined corresponding display position, the plurality of associated business objects are respectively drawn and displayed by using a computer drawing manner.
• step S506 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a rendering module 706 run by the processor.
• in this embodiment, the pre-trained convolutional network model can effectively determine the display positions of the associated business objects in the video image, so that multiple associated business objects are respectively drawn by computer graphics at the determined display positions, thereby implementing the delivery and display of the associated business objects.
  • the combined display of multiple related business objects and the effective combination with video playback are beneficial to improve the efficiency and effectiveness of the delivery and display of business objects, and do not require additional data transmission, which is conducive to saving network resources and system resources of the client.
  • FIG. 6 is a flow diagram of yet another embodiment of a method of presenting a business object in a video image in accordance with the present application.
• in this embodiment, the multiple associated business objects are, for example, multiple effects containing semantic information that display the same business object theme or belong to the same business object provider, or multiple display parts of the same effect containing semantic information; the effect is specifically a two-dimensional sticker effect containing advertisement information. The video image processing scheme of this embodiment of the present application is described on this basis, and the same applies to a single business object.
  • the method for displaying a business object in a video image in this embodiment includes: Step S602, detecting at least one target object from the video image, and determining feature points of the at least one target object.
  • step S602 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 that is executed by the processor.
  • Step S604 Determine, according to the feature points of the at least one target object and the types of the plurality of associated service objects to be displayed, the corresponding display positions of the plurality of associated business objects to be displayed in the video image.
• in this embodiment, the determination is performed according to not only the feature points of the at least one target object but also the types of the associated business objects to be displayed.
• that is, with the feature points of the target object as a reference, appropriate display positions of the multiple associated business objects in the video image can be selected according to their types.
• when multiple display positions of the associated business objects in the video image are obtained, at least one display position may be selected from them. For example, a text-type associated business object can be displayed in the background area, or in the forehead or body area of the character.
  • Step S606 The plurality of associated business objects are respectively drawn and displayed by computer drawing in the determined corresponding display positions.
  • the associated business objects in the above examples may be in the form of a text or a picture or a combination of the two.
• in this embodiment, the feature points of the target object and the types of the associated business objects are comprehensively considered to determine the display positions of the associated business objects in the video image, so that multiple associated business objects are respectively drawn by computer graphics at the corresponding display positions, thereby implementing the delivery and display of the associated business objects.
  • the combined display of multiple related business objects and the effective combination with video playback are beneficial to improve the efficiency and effectiveness of the delivery and display of business objects, and do not require additional data transmission, which is conducive to saving network resources and system resources of the client.
  • any of the methods for displaying a service object in a video image provided by the embodiments of the present invention may be performed by any suitable device having data processing capability, including but not limited to: a terminal device, a server, and the like.
• any method for displaying a business object in a video image provided by the embodiments of the present invention may be executed by a processor; for example, the processor performs any method for displaying a business object in a video image mentioned in the embodiments of the present invention by calling a corresponding instruction stored in a memory. This will not be repeated below.
• a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing methods described herein are implemented. Moreover, when a general-purpose computer accesses code for implementing the processing shown herein, the execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown herein.
  • FIG. 7 is a structural block diagram of an apparatus for displaying a service object in a video image according to the present application.
  • the apparatus of the embodiments of the present application may be used to implement the foregoing method embodiments of the present invention.
  • the apparatus for displaying a service object in a video image in this embodiment includes:
  • the first determining module 702 is configured to detect at least one target object from the video image, and determine a feature point of the at least one target object.
  • the second determining module 704 is configured to determine, according to feature points of the at least one target object, a display position of the business object to be displayed in the video image.
  • the drawing module 706 is configured to draw a business object by using a computer drawing manner at the display position.
• with the apparatus for displaying a business object in a video image of this embodiment, a target object is detected from a video image and its feature points are determined; different target objects have different feature points, and the determined feature points of the target object serve as the basis for determining the display position of the business object to be displayed; the display position of the business object to be displayed is determined, and the business object is drawn by computer graphics at the determined display position so as to display it.
• when the business object is an advertisement to be displayed, on the one hand, the business object is drawn by computer graphics at the determined display position and combined with the video playback, so that no additional advertisement video data irrelevant to the video needs to be transmitted over the network, which helps save network resources and client system resources;
• on the other hand, the business object is closely integrated with the target object in the video image, so that it can be displayed in a manner that does not disturb viewers, does not affect their normal video viewing experience, and is unlikely to arouse their aversion, which helps improve the delivery of business objects and achieve the intended display efficiency and effect.
• the second determining module 704 may be implemented by a pre-trained convolutional network model for determining the display position of a business object in a video image; that is, the second determining module 704 is configured to determine, according to the feature points of the at least one target object and using the pre-trained convolutional network model, the display position of the business object to be displayed in the video image.
  • FIG. 8 is a structural block diagram of another embodiment of an apparatus for presenting a business object in a video image in accordance with the present application.
  • the embodiment further includes a training module 708 for pre-training the convolutional network model.
  • the training module 708 includes:
• a first obtaining module 7082 configured to obtain, by using the convolutional network model, a feature vector of the business object sample images to be trained, where the feature vector includes information of the target object in the business object sample image to be trained, and location information and/or confidence information of the business object;
  • a second obtaining module 7084 configured to perform convolution processing on the feature vector by using a convolutional network model to obtain a feature vector convolution result
• the judging module 7086 is configured to respectively judge whether the information of the corresponding target object in the feature vector convolution result, and the location information and/or confidence information of the business object, satisfy the preset convergence condition;
• the execution module 7088 is configured to: if the judgment result of the judging module 7086 is that the convergence condition is satisfied, complete the training of the convolutional network model; if the judgment result is that the convergence condition is not satisfied, adjust the parameters of the convolutional network model according to the feature vector convolution result, so that the training module 708 iteratively trains the convolutional network model according to the adjusted parameters until the feature vector convolution result after the iterative training satisfies the convergence condition.
• the judging module 7086 may include: a first judging module configured to obtain the location information of the corresponding business object in the feature vector convolution result, calculate, using a first loss function, a first distance between the position indicated by the location information of the corresponding business object and the preset standard position, and judge, according to the first distance, whether the location information of the corresponding business object satisfies the convergence condition; and/or a second judging module configured to obtain the confidence information of the corresponding business object in the feature vector convolution result, calculate, using a second loss function, a second distance between the confidence indicated by the confidence information of the corresponding business object and the preset standard confidence, and judge, according to the second distance, whether the confidence information of the corresponding business object satisfies the convergence condition.
  • the first loss function may be: a function of calculating a Euclidean distance between the location indicated by the location information of the corresponding service object and the preset standard location; and/or the second loss function may be: calculating the corresponding A function of the Euclidean distance between the confidence level indicated by the confidence information of the business object and the preset standard confidence.
• the preset standard position may be: the average position obtained after averaging the positions of the business objects in the business object sample images to be trained; and/or the preset standard confidence may be: the average confidence obtained after averaging the confidences of the business objects in the business object sample images to be trained.
• the training module 708 may further include a pre-processing module 7080 configured to, before the first obtaining module 7082 obtains the feature vector of the business object sample images to be trained, obtain multiple business object sample images, where each business object sample image contains annotation information of the business object; determine the position of the business object according to the annotation information; judge whether the distance between the determined position of the business object and a preset position is less than or equal to a set threshold; and determine the business object sample images corresponding to the business objects whose distance is less than or equal to the set threshold as the business object sample images to be trained.
• when determining the position of the business object according to the annotation information and judging whether the distance between the determined position and the preset position is less than or equal to the set threshold, the pre-processing module 7080 may determine the center position of the business object according to the annotation information and judge whether the variance between the center position and the preset position is less than or equal to the set threshold. A hedged sketch of this filtering step follows.
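As an illustration of the filtering performed by the pre-processing module 7080, a minimal sketch follows; treating the "variance" as the mean squared deviation of the two coordinates from the preset position is an assumption for illustration:

```python
from typing import Tuple

def keep_sample(center: Tuple[float, float],
                preset: Tuple[float, float],
                threshold: float) -> bool:
    # Keep the sample only if the mean squared deviation of the annotated
    # centre from the preset position is within the threshold.
    variance = ((center[0] - preset[0]) ** 2 + (center[1] - preset[1]) ** 2) / 2.0
    return variance <= threshold

print(keep_sample((52.0, 48.0), (50.0, 50.0), threshold=25.0))  # True: kept
print(keep_sample((90.0, 10.0), (50.0, 50.0), threshold=25.0))  # False: filtered out
```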
• the second determining module 704 may include: a type determining module 7042 configured to determine the type of the at least one target object according to the information of the feature points of the at least one target object; an area determining module 7044 configured to determine the display area of the business object to be displayed according to the type of the at least one target object; and a location determining module 7046 configured to determine the display position of the business object to be displayed in the video image according to the display area.
• the area determining module 7044 may include: a first area determining module configured to, when the type of the at least one target object is a face type, determine that the display area of the business object to be displayed includes at least one or any of the following: a hair area, a forehead area, a cheek area, a chin area, and a body area other than the head of the character in the video image; and/or a second area determining module configured to, when the type of the at least one target object is a background type, determine that the display area of the business object to be displayed includes: a background area in the video image; and/or a third area determining module configured to, when the type of the at least one target object is a hand type, determine that the display area of the business object to be displayed includes: an area within a set range centered on the area where the hand is located in the video image; and/or a fourth area determining module configured to, when the type of the at least one target object is an action type, determine that the display area of the business object to be displayed includes: a preset area in the video image.
• the actions corresponding to the action type include at least one or any of the following: blinking, opening the mouth, nodding, shaking the head, kissing, smiling, waving, making a scissors hand, making a fist, holding out a hand, giving a thumbs-up, making a pistol gesture, making a V sign, and making an OK hand.
  • the second determining module 704 is specifically configured to determine, according to the feature point of the at least one target object and the type of the business object to be displayed, a display position of the business object to be displayed in the video image.
• the second determining module 704 is specifically configured to obtain, according to the feature points of the at least one target object and the type of the business object to be displayed, multiple display positions of the business object to be displayed in the video image, and to select at least one display position from the multiple display positions as the display position of the business object to be displayed in the video image.
  • the type of the business object may include, but is not limited to, at least one or any of the following: a forehead patch type, a cheek patch type, a chin patch type, a virtual hat type, a virtual clothing type, a virtual makeup type, Virtual headwear type, virtual hair accessory type, virtual jewelry type, background type, virtual pet type, virtual container type, and the like.
• the second determining module 704 is configured to obtain, from the pre-stored correspondence between the feature points of target objects and display positions, the target display position corresponding to the feature points of the at least one target object, and to determine the obtained target display position as the display position of the business object to be displayed in the video image.
  • the foregoing service object may be a special effect including semantic information;
  • the video image may be a live video type image.
  • the foregoing special effect including the semantic information may include at least one of the following special effects including the advertisement information: a two-dimensional sticker effect, a three-dimensional special effect, and a particle special effect.
  • the foregoing service object may include: multiple associated service objects.
• optionally, the second determining module 704 is specifically configured to determine, according to the feature points of the at least one target object, the corresponding display positions of the plurality of associated business objects to be displayed in the video image; the drawing module 706 is specifically configured to respectively draw the multiple associated business objects by computer graphics at the corresponding display positions.
• the plurality of associated business objects may include, but are not limited to, at least one or any of the following: multiple effects containing semantic information for displaying the same business object theme, multiple display parts of the same effect containing semantic information, and multiple effects containing semantic information provided by the same business object provider.
• the effect may include any one of a two-dimensional sticker effect, a three-dimensional effect, and a particle effect containing advertisement information.
  • other forms of business objects are also applicable to the video image processing solution provided by the embodiments of the present application.
• the plurality of corresponding display positions include at least one or any of the following: a hair area, a forehead area, a cheek area, a chin area, and a body area other than the head of a character in the video image, a background area in the video image, an area within a set range centered on the area where the hand is located in the video image, and a preset area in the video image.
  • the device for displaying the service object in the video image in the embodiments of the present application can be used to implement the foregoing method embodiments for displaying the service object in the video image, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
• the apparatus for displaying a business object in a video image in the various embodiments of the present application may be disposed in an appropriate electronic device, such as a mobile terminal, a PC, or a server.
  • FIG. 9 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
• the electronic device can include a processor 902, a communication interface 904, a memory 906, and a communication bus 908, where:
  • Processor 902, communication interface 904, and memory 906 complete communication with one another via communication bus 908.
  • the communication interface 904 is configured to communicate with network elements of other devices, such as other clients or servers.
• the processor 902 may be a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, or a graphics processing unit (GPU).
• the one or more processors included in the terminal device may be processors of the same type, such as one or more CPUs or one or more GPUs, or may be processors of different types, such as one or more CPUs and one or more GPUs.
• the memory 906 is configured to store at least one executable instruction that causes the processor 902 to perform the operations corresponding to the method for displaying a business object in a video image in any of the above embodiments of the present application.
  • the memory 906 may include a high speed random access memory (RAM), and may also include a non-volatile memory such as at least one disk memory.
  • FIG. 10 is a schematic structural diagram of another embodiment of an electronic device according to the present invention.
• the electronic device includes one or more processors, a communication unit, and the like, for example: one or more central processing units (CPUs) 1001 and/or one or more graphics processors (GPUs) 1013; the processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 1002 or executable instructions loaded from a storage portion 1008 into a random access memory (RAM) 1003.
• the communication portion 1012 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card; the processor may communicate with the read-only memory 1002 and/or the random access memory 1003 through the bus 1004 to execute executable instructions, connect to the communication portion 1012, and communicate with other target devices via the communication portion 1012, thereby completing the operations corresponding to the method for displaying a business object in a video image provided by the embodiments of the present application, for example: detecting at least one target object from a video image and determining feature points of the at least one target object; determining, according to the feature points of the at least one target object, a display position of the business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics.
• in addition, the RAM 1003 may store various programs and data required for the operation of the device.
  • the CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004.
  • ROM 1002 is an optional module.
• the RAM 1003 stores executable instructions, or writes executable instructions into the ROM 1002 at runtime; the executable instructions cause the processor 1001 to perform the operations corresponding to the above method for displaying business objects in a video image.
  • An input/output (I/O) interface 1005 is also coupled to bus 1004.
• the communication unit 1012 may be integrated, or may be provided with multiple sub-modules (for example, multiple IB network cards) linked on the bus.
  • the following components are connected to the I/O interface 1005: an input portion 1006 including a keyboard, a mouse, etc.; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a speaker; a storage portion 1008 including a hard disk or the like And a communication portion 1009 including a network interface card such as a LAN card, a modem, or the like.
  • the communication section 1009 performs communication processing via a network such as the Internet.
• a drive 1011 is also connected to the I/O interface 1005 as needed.
• a removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1011 as needed, so that a computer program read therefrom is installed into the storage portion 1008 as needed.
• it should be noted that the architecture shown in FIG. 10 is only an optional implementation manner.
• in practice, the number and types of the components in FIG. 10 may be selected, deleted, added, or replaced according to actual needs;
• different functional components may also be implemented separately or in an integrated manner; for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU; the communication portion may be provided separately, or may be integrated on the CPU or the GPU; and so on.
• an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: detecting at least one target object from a video image and determining feature points of the at least one target object; determining, according to the feature points of the at least one target object, a display position of the business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics.
• an embodiment of the present application further provides a computer program comprising computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the method for displaying a business object in a video image in any embodiment of the present application.
• an embodiment of the present application further provides a computer-readable storage medium configured to store computer-readable instructions; when executed, the instructions implement the operations of the steps of the method for displaying a business object in a video image in any embodiment of the present application.
• the various components/steps described in the embodiments of the present application may be split into more components/steps, or two or more components/steps or partial operations thereof may be combined into new components/steps, as needed by the implementation, to achieve the objectives of the embodiments of the present application.
  • One of ordinary skill in the art will recognize that the methods and apparatus of the present invention may be implemented in a number of ways.
  • the methods and apparatus of the present invention can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution.
  • the invention may also be embodied as a program recorded in a recording medium, the program comprising machine readable instructions for implementing the method according to the invention.
  • the invention also covers a recording medium storing a program for performing the method according to the invention.

Abstract

The embodiments of the present application provide a method, an apparatus, and an electronic device for displaying a business object in a video image. The method for displaying a business object in a video image includes: detecting at least one target object from a video image and determining feature points of the at least one target object; determining, according to the feature points of the at least one target object, a display position of a business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics. The embodiments of the present application help save network resources and client system resources, display the business object in a manner that does not disturb viewers, do not affect the viewers' normal video viewing experience, are unlikely to arouse the viewers' aversion, and can effectively achieve the intended effect.

Description

Method, apparatus, and electronic device for displaying a business object in a video image

This application claims priority to Chinese Patent Application No. CN201610694812.X, entitled "Method, apparatus, and terminal device for displaying a business object in a video image", filed with the Chinese Patent Office on August 19, 2016, and to Chinese Patent Application No. CN201610694625.1, entitled "Video image processing method, apparatus, and terminal device", filed with the Chinese Patent Office on August 19, 2016, the entire contents of which are incorporated into the present disclosure by reference.
Technical Field

The present application relates to artificial intelligence technologies, and in particular, to a method, an apparatus, and an electronic device for displaying a business object in a video image.

Background

With the development of Internet technologies, people increasingly use the Internet to watch videos, and Internet video provides business opportunities for many new services. Internet video has become an important entry point for business traffic and is regarded as a premium resource for advertisement placement.

Existing video advertising mainly works by implantation: inserting an advertisement of a fixed duration before the video is played or at a certain moment during playback, or placing an advertisement at a fixed position in the video playback area or its surrounding areas.
Summary

The embodiments of the present application provide a technical solution for displaying a business object in a video image.

According to one aspect of the embodiments of the present application, a method for displaying a business object in a video image is provided, including: detecting at least one target object from a video image, and determining feature points of the at least one target object; determining, according to the feature points of the at least one target object, a display position of a business object to be displayed in the video image; and drawing the business object at the display position by means of computer graphics.

According to another aspect of the embodiments of the present application, an apparatus for displaying a business object in a video image is further provided, including: a first determining module configured to detect at least one target object from a video image and determine feature points of the at least one target object; a second determining module configured to determine, according to the feature points of the at least one target object, a display position of a business object to be displayed in the video image; and a drawing module configured to draw the business object at the display position by means of computer graphics.

According to yet another aspect of the embodiments of the present application, an electronic device is further provided, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method for displaying a business object in a video image in any embodiment of the present application.

According to still another aspect of the embodiments of the present application, another electronic device is further provided, including: a processor and the apparatus for displaying a business object in a video image according to any embodiment of the present application; when the processor runs the apparatus, the units in the apparatus for displaying a business object in a video image according to any embodiment of the present application are run.

According to yet another aspect of the embodiments of the present application, a computer program is further provided, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the method for displaying a business object in a video image according to any embodiment of the present application.

According to yet another aspect of the embodiments of the present application, a computer-readable storage medium is further provided, configured to store computer-readable instructions; when executed, the instructions implement the operations of the steps of the method for displaying a business object in a video image according to any embodiment of the present application.

According to the technical solutions provided by the embodiments of the present application, a target object is detected from a video image and its feature points are determined; different target objects have different feature points, and the determined feature points of the target object serve as the basis for determining the display position of the business object to be displayed; the display position of the business object to be displayed is determined, and the business object is drawn by computer graphics at the determined display position so as to display it. For example, in a video image of a face plus a simple background, the face can be taken as the target object: the video image is detected to obtain the face as the target object and its feature points are determined, which may include, but are not limited to, feature points corresponding to some or all of the eyebrows, eyes, mouth, nose, and facial contour; with these feature points as a reference, the display position of the business object to be displayed is determined, for example displaying the business object on the forehead above the eyebrows. In the embodiments of the present application, when the business object is an advertisement to be displayed, on the one hand, the business object is drawn by computer graphics at the determined display position, so that no additional advertisement video data irrelevant to the video needs to be transmitted over the network, which helps save network resources and client system resources; on the other hand, the business object is closely integrated with the target object in the video image, so that the business object can be displayed in a manner that does not disturb viewers, does not affect their normal video viewing experience, and is unlikely to arouse their aversion, effectively achieving the intended effect.
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings and embodiments.

Brief Description of the Drawings

The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.

With reference to the accompanying drawings, the present invention can be understood more clearly from the following detailed description, in which:

FIG. 1 is a flowchart of one embodiment of a method for displaying a business object in a video image according to the present application;
FIG. 2 is a flowchart of another embodiment of a method for displaying a business object in a video image according to the present application;
FIG. 3 is a flowchart of yet another embodiment of a method for displaying a business object in a video image according to the present application;
FIG. 4 is a flowchart of still another embodiment of a method for displaying a business object in a video image according to the present application;
FIG. 5 is a flowchart of a further embodiment of a method for displaying a business object in a video image according to the present application;
FIG. 6 is a flowchart of yet a further embodiment of a method for displaying a business object in a video image according to the present application;
FIG. 7 is a structural block diagram of one embodiment of an apparatus for displaying a business object in a video image according to the present application;
FIG. 8 is a structural block diagram of another embodiment of an apparatus for displaying a business object in a video image according to the present application;
FIG. 9 is a schematic structural diagram of one embodiment of an electronic device according to the present application;
FIG. 10 is a schematic structural diagram of another embodiment of an electronic device according to the present application.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.

Meanwhile, it should be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn according to actual proportional relationships.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended as any limitation on the present invention or its application or use.

Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.

Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present application are merely used to distinguish different steps, devices, or modules, and represent neither any specific technical meaning nor a necessary logical order between them.

The embodiments of the present invention can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.

Specific implementations of the embodiments of the present application are further described in detail below with reference to the accompanying drawings (the same reference numerals in several drawings denote the same elements) and the embodiments. The following embodiments are used to illustrate the present application, but are not intended to limit its scope.
FIG. 1 is a flowchart of one embodiment of a method for displaying a business object in a video image according to the present application. Referring to FIG. 1, the method for displaying a business object in a video image in this embodiment includes:

Step S102: Detect at least one target object from a video image, and determine feature points of the at least one target object.

In the embodiments of the present application, a video image is an image corresponding to a video data frame in a video; it may be an image in a video that is being broadcast live, i.e., a live-broadcast video image, or an image in a video that has been recorded in advance for later playback, etc. Each video image may include certain target objects, such as people, gestures, and backgrounds.

In the embodiments of the present application, a target object is an object that exists in the video image and is easily viewed by the audience, and may include, but is not limited to: a human body (e.g., a face, body parts, etc.), an action (e.g., a posture, a gesture, etc.), a background, and so on. A target object generally has a certain number of feature points, for example the traditional 68 feature points of a face covering the eyes, nose, mouth, and facial contour; the feature points of a hand covering the fingertips, finger valleys, and hand contour; the feature points of a background boundary; and so on. The embodiments of the present application place no specific restriction on the target object or its feature points, and are applicable to any target object and any of its feature points.

In an optional example of this operation, a corresponding feature extraction algorithm or a neural network model (such as a convolutional network model) may be used to detect the target object from the video image and determine its feature points. By detecting the target object in the video image and determining its feature points, a basis can be provided for subsequently determining the display position of the business object to be displayed. For example, once the feature points of the background boundary are determined, the business object can be displayed at an appropriate position in the background; or, once the feature points of a face are determined, the business object can be displayed at an appropriate position on the face (such as the forehead or cheeks).

In an optional example, step S102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 run by the processor.

Step S104: Determine, according to the feature points of the at least one target object, a display position of a business object to be displayed in the video image.

In the embodiments of the present application, the business object may include, but is not limited to, an effect containing semantic information (e.g., information such as advertisements, entertainment, weather forecasts, traffic forecasts, or pets). The effect may be, for example, a three-dimensional (3D) effect, such as an advertisement displayed in 3D form; or a two-dimensional (2D) sticker, such as an advertisement sticker effect displayed in 2D sticker form; or a particle effect, etc. But it is not limited thereto: other forms of business objects are equally applicable to the technical solutions of the embodiments of the present application, such as an application (APP) or its textual description or introduction, or a certain form of object interacting with the video audience (such as an electronic pet), and so on.

The manner of determining the display position of the business object to be displayed in the video image according to the feature points of the target object will be described in detail later. For example, the display position may be determined according to set rules based on the feature points of the target object; or it may be determined using a trained neural network model (such as a convolutional network model) based on the feature points of the target object; and so on.

After the feature points of the target object are determined, one or more display positions of the business object to be displayed in the video image can be determined on that basis, where "more" includes two or more.

In an optional example, step S104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 run by the processor.

Step S106: Draw the business object to be displayed at the determined display position by means of computer graphics.

After the display position is determined, the business object can be drawn at that display position by means of computer graphics so as to display it. When the business object is a sticker containing semantic information, the sticker can be used for advertisement delivery and display; for example, the name of a product can be displayed through a virtual-bottle-cap-type sticker, attracting viewers, making advertisement delivery and display more interesting, and improving its efficiency. When the business object is a sticker, such as an advertisement sticker, relevant information of the business object, such as its identifier and size, can be obtained first when drawing it by computer graphics. After the display position is determined, the business object can be scaled, rotated, and otherwise adjusted according to the coordinates of the area where the display position is located (such as the rectangular area of the display position), and then drawn by a corresponding drawing manner. In some cases, an advertisement can also be displayed in the form of a 3D effect, for example displaying the text or logo (LOGO) of the advertisement through a particle effect, and so on.

In an optional example of the embodiments of the present application, drawing the business object by computer graphics can be implemented by appropriate computer graphics and image drawing or rendering methods, including but not limited to drawing based on the Open Graphics Language (OpenGL) graphics drawing engine. OpenGL defines a professional, cross-programming-language, cross-platform graphics programming interface specification; it is hardware-independent and can conveniently draw 2D and 3D graphics and images. Through the OpenGL graphics drawing engine, not only 2D effects such as 2D stickers can be drawn, but also 3D effects and particle effects can be drawn. However, the present application is not limited to drawing based on the OpenGL graphics drawing engine; other manners, such as drawing based on graphics engines like Unity or OpenCL, are equally applicable to the embodiments of the present application. A minimal stand-in sketch of such a drawing step follows.
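The drawing step itself is engine-specific (the text mentions OpenGL, Unity, and OpenCL); as a minimal stand-in, the following sketch alpha-blends a 2D RGBA sticker onto a frame with NumPy. It illustrates "drawing by computer graphics" only, assumes the sticker lies inside the frame, and is not the engine code of the application:

```python
import numpy as np

def blend_sticker(frame: np.ndarray, sticker_rgba: np.ndarray,
                  x: int, y: int) -> np.ndarray:
    # Blend `sticker_rgba` (H, W, 4) onto `frame` (H, W, 3) at top-left (x, y).
    h, w = sticker_rgba.shape[:2]
    roi = frame[y:y + h, x:x + w].astype(np.float32)
    rgb = sticker_rgba[..., :3].astype(np.float32)
    alpha = sticker_rgba[..., 3:4].astype(np.float32) / 255.0
    frame[y:y + h, x:x + w] = (alpha * rgb + (1 - alpha) * roi).astype(np.uint8)
    return frame

frame = np.zeros((480, 640, 3), dtype=np.uint8)
sticker = np.zeros((60, 120, 4), dtype=np.uint8)
sticker[..., 1] = 255   # a green sticker
sticker[..., 3] = 200   # mostly opaque
blend_sticker(frame, sticker, x=260, y=80)
```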
In an optional example, step S106 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a drawing module 706 run by the processor.

With the method for displaying a business object in a video image provided by this embodiment, a target object is detected from a video image and its feature points are determined; different target objects have different feature points, and the determined feature points of the target object serve as the basis for determining the display position of the business object to be displayed; the display position of the business object to be displayed is determined, and the business object is drawn by computer graphics at the determined display position so as to display it. When the business object is an advertisement to be displayed, on the one hand, the business object is drawn by computer graphics at the determined display position and combined with the video playback, so that no additional advertisement video data irrelevant to the video needs to be transmitted over the network, which helps save network resources and client system resources; on the other hand, the business object is closely integrated with the target object in the video image, so that the business object can be displayed in a manner that does not disturb viewers, does not affect their normal video viewing experience, and is unlikely to arouse their aversion, which helps improve the delivery of business objects and achieve the intended display efficiency and effect.

In an optional example of the above embodiments of the present invention, the business object may include multiple associated business objects. Correspondingly, in the above embodiments, determining the display position of the business object to be displayed in the video image may include: determining the corresponding display positions of the multiple associated business objects to be displayed in the video image; and drawing the business object at the display position by computer graphics may include: respectively drawing the multiple associated business objects by computer graphics at the determined corresponding display positions.

In an optional example of the embodiments of the present application, the multiple associated business objects may include, but are not limited to, at least one or any of the following: multiple effects containing semantic information for displaying the same business object theme, multiple display parts of the same effect containing semantic information, and multiple effects containing semantic information provided by the same business object provider. As one example, the effect may be any one of a two-dimensional sticker effect, a three-dimensional effect, and a particle effect containing advertisement information. In addition, other forms of business objects are equally applicable to the video image processing solution provided by the embodiments of the present application, for example cheek sticker effects, forehead sticker effects, and background sticker effects provided by the Coca-Cola Company; or game-themed virtual headdress sticker effects, virtual clothing sticker effects, background sticker effects of game scenes, and so on.

In an optional example of the embodiments of the present application, the multiple corresponding display positions include at least one or any of the following: a hair area, a forehead area, a cheek area, a chin area, and a body area other than the head of a character in the video image, a background area in the video image, an area within a set range centered on the area where a hand is located in the video image, and a preset area in the video image. Thus, multiple associated business objects to be displayed may be displayed at the same display position or at different display positions.

Taking a two-dimensional sticker effect as an example, when the multiple associated business objects are multiple two-dimensional sticker effects containing advertisement information for displaying the same business object theme, multiple display parts of the same two-dimensional sticker effect containing advertisement information, or multiple two-dimensional sticker effects containing advertisement information provided by the same business object provider, these effects or display parts can be used for advertisement delivery and display. For example, in a live video, the name of a product can be displayed at the anchor's mouth through a virtual-bottle-cap sticker effect, the product can be displayed at the anchor's hand through a virtual-container sticker effect, and the product and its name can be displayed in the background of the live video through a background sticker effect, greatly attracting the audience's attention, making advertisement delivery and display more interesting, and improving their efficiency.
FIG. 2 is a flowchart of another embodiment of a method for displaying a business object in a video image according to the present application. As shown in FIG. 2, the method for displaying a business object in a video image in this embodiment includes:

Step S202: Detect at least one target object from a video image, and determine feature points of the at least one target object.

A video image may be an image corresponding to a video data frame in a video, and each image contains certain target objects such as people, gestures, and backgrounds. Taking a live-broadcast video image as an example, a live video mostly consists of an anchor and the background behind the anchor (such as the anchor's home or another video recording site). Detecting a live-broadcast video image may yield one target object such as a face, or multiple target objects such as a face, a background, and an action.

In the embodiments of the present application, detecting the target object in the video image and determining its feature points can be implemented in any appropriate manner in the related art. For example, linear feature extraction methods such as principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA) may be used; nonlinear feature extraction methods such as kernel principal component analysis (Kernel PCA) and manifold learning may also be used; a trained neural network model, such as the convolutional network model in the embodiments of the present application, may also be used to extract the feature points of the target object. The embodiments of the present application do not limit this.

For example, during a video live broadcast of a live application, the target object is detected and its feature points are determined from the live video image; or, during the playback of a recorded video, the electronic device detects the target object and determines its feature points from the played video image; or, during the recording of a video, the electronic device detects the target object and determines its feature points from the recorded video image; and so on.

In an optional example, step S202 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first determining module 702 run by the processor.

Step S204: Determine, according to the feature points of the at least one target object, the corresponding display positions of the multiple associated business objects to be displayed in the video image.

In an optional example, step S204 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second determining module 704 run by the processor.

Step S206: Respectively draw the multiple associated business objects by computer graphics at the determined corresponding display positions.

After the multiple display positions are determined, the multiple associated business objects can be respectively drawn at the corresponding display positions by computer graphics so as to display them. When an associated business object is a sticker, such as an advertisement sticker, relevant information of the associated business object, such as its identifier and size, can be obtained first. After the display positions are determined, the associated business object can be scaled, rotated, and otherwise adjusted according to the coordinates of the area where the display position is located (such as the rectangular area of the display position), and then drawn by a corresponding drawing manner such as the OpenGL graphics drawing engine.

For example, suppose the target objects detected in this embodiment are a face and a background, and the three determined display positions are the mouth of the face, a hand, and the background; then multiple effects containing advertisement information of a certain business object provider (such as the Coca-Cola Company) can be drawn at these display positions, for example drawing a sticker effect of a virtual container (such as a Coca-Cola beverage bottle) at the display position of the hand, and drawing a sticker effect with, say, a Coca-Cola poster as the background at the display position of the background.

In an optional example, step S206 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a drawing module 706 run by the processor.

With the rise of Internet live broadcasting, more and more videos appear in the form of live broadcasts. Such videos have the characteristics of simple scenes and real-time delivery and, because the audience mainly watches on electronic devices such as mobile phones, relatively small video image sizes. In this case, for the delivery of certain business objects such as advertisements, on the one hand, because the screen display area of an electronic device is limited, placing an advertisement at a traditional fixed position may occupy the main user experience area, which not only easily arouses user aversion but may also cause the broadcasting anchor to lose viewers; on the other hand, for anchor-type live applications, because of the immediacy of live broadcasting, traditional insertion of advertisements of a fixed duration may interrupt the continuity of communication between users and the anchor, affecting the users' viewing experience; on yet another hand, for short-video advertisements, because the content duration of a live broadcast or short video is itself short, inserting advertisements of a fixed duration in the traditional manner is also difficult. The embodiments of the present application deliver advertisements through business objects, effectively merging advertisement delivery with the live video content; the manner is flexible and the effect is vivid, which not only helps improve users' live-viewing experience but also helps improve the effect of advertisement delivery. This is well suited to scenarios such as business object display and advertisement delivery on relatively small display screens. It can be understood that, besides advertisements, the delivery of business objects can also be widely applied to other fields, such as education, consulting, and services, where entertaining or appreciative business information can be delivered to improve interaction effects and user experience.

With the video image processing method provided by this embodiment, at least one target object is detected from a video image and its feature points are determined; different target objects have different feature points; the determined feature points of the at least one target object serve as the basis for determining the display positions of the business objects to be displayed, and the display positions of the multiple associated business objects to be displayed are determined; the associated business objects are respectively drawn by computer graphics at the multiple determined display positions so as to display them. When the associated business objects are advertisements to be displayed, on the one hand, the associated business objects and the target object in the video image set each other off and are closely integrated, so that the associated business objects can be displayed from multiple angles without affecting the audience's normal video viewing, attracting the audience's attention and increasing the influence of the business objects; on the other hand, the associated business objects are drawn by computer graphics at the determined display positions and combined with video playback, so that no additional advertisement video data irrelevant to the video needs to be transmitted over the network, which helps save network resources and client system resources.
In addition, before the embodiments shown in FIG. 1 or FIG. 2 of the present application, an operation of obtaining the video image may also be included, for example obtaining an image in the video currently being played from a live application (i.e., a live-broadcast video image), or obtaining a video image from a video being recorded; the embodiments of the present application do not limit the manner of obtaining the video image.

In the embodiments of the present application, the processing of one video image is taken as an example, but those skilled in the art should understand that video image processing can be performed with reference to the embodiments of the present application for multiple video images or a video image sequence in a video stream. In optional examples of the embodiments of the present application, when determining the display position of the business object or associated business objects to be displayed in the video image according to the feature points of the at least one target object, feasible implementations include, for example:

Way one: according to the feature points of the at least one target object, use a pre-trained convolutional network model for determining the display position of a business object in a video image to determine the display position of the business object to be displayed, or of the multiple associated business objects to be displayed, in the video image;

Way two: determine the type of the target object according to the feature points of the at least one target object; determine the display area of the business object to be displayed according to the type of the at least one target object; and determine the display position of the business object to be displayed in the video image according to the display area. If the business object includes multiple associated business objects, way two correspondingly becomes: determine the corresponding display areas of the multiple associated business objects according to the type of the at least one target object, and determine the corresponding display positions of the multiple associated business objects in the video image according to those display areas;

Way three: determine the display position of the business object to be displayed in the video image according to the feature points of the at least one target object and the type of the business object to be displayed. If the business object includes multiple associated business objects, way three correspondingly becomes: determine the display positions of the multiple associated business objects in the video image according to the feature points of the at least one target object and the types of the multiple associated business objects to be displayed;

Way four: from the pre-stored correspondence between feature points of target objects and display positions, obtain the target display position corresponding to the feature points of the at least one target object, and determine the obtained target display position as the display position of the business object to be displayed in the video image. If the business object includes multiple associated business objects, way four correspondingly becomes: from the pre-stored correspondence between feature points of target objects and display positions, obtain the target display positions corresponding to the feature points of the at least one target object, and determine the multiple obtained target display positions as the corresponding display positions of the multiple associated business objects in the video image.

The four ways above are exemplarily described below.
Way One

When using way one to determine the display position of the business object to be displayed in the video image, a convolutional network model is pre-trained so that the trained model has the function of determining the display position of a business object, or of multiple associated business objects, in a video image; alternatively, a convolutional network model that has been trained by a third party and has this function may be used directly.

It should be noted that, in this embodiment, the training of the business object is exemplarily described; the training of the target object part can be implemented with reference to the related art, and the embodiments of the present application only briefly describe it.

When training the convolutional network model, one feasible training manner includes the following process:

(1) Obtain a feature vector of the business object sample images to be trained by using the convolutional network model.

The feature vector includes: information of the target object in the business object sample image to be trained, and location information and/or confidence information of the business object. The information of the target object indicates image information of the target object. The location information of the business object indicates the position of the business object, which may be the location information of the center point of the business object or the location information of the area where the business object is located. The confidence information of the business object indicates the probability that, when the business object is displayed at the current position, the intended display effect (such as being noticed, clicked, or watched) can be achieved; this probability may be set according to statistical analysis results of historical data, according to the results of simulation experiments, or according to manual experience. In practical applications, while training the target object, only the location information of the business object may be trained, only the confidence information may be trained, or both may be trained, according to actual needs. Training both the location information and the confidence information of the business object enables the trained convolutional network model to determine them more effectively and precisely, so as to provide a basis for the display of business objects.

The convolutional network model is trained with a large number of sample images. In the embodiments of the present application, the business objects in the business object sample images may be pre-annotated with location information, with confidence information, or with both. Of course, in practical applications, the location information and confidence information may also be obtained through other channels. By annotating the business objects with location and confidence information in advance, the data volume and the number of interactions of data processing can be effectively reduced, and data processing efficiency improved.

The business object sample images with the target object information and the location information and/or confidence information of the business object are taken as training samples, and feature vector extraction is performed on them to obtain feature vectors containing the target object information and the location information and/or confidence information of the business object.

The extraction of feature vectors may be implemented in an appropriate manner in the related art, which is not repeated here in the embodiments of the present invention.

Optionally, the same convolutional network model may also be used to train the target object and the business object separately; in this case, the feature vector of the business object sample image may respectively include the information of the target object, or the location information and/or confidence information of the business object.

(2) Perform convolution processing on the feature vector by using the convolutional network model to obtain a feature vector convolution result.

The obtained feature vector convolution result contains the information of the target object and the location information and/or confidence information of the business object.

The number of convolution operations on the feature vector can be set according to actual needs; that is, in the convolutional network model, the number of convolutional layers can be set according to actual needs, such that the final feature vector convolution result satisfies a standard of error within a certain range. For example, the final feature vector convolution result may be 1/20 to 1/5 of the image length or width; in one example, it may be 1/10 of the image length or width.

The convolution result is the result of feature extraction performed on the feature vector, and it can effectively characterize the features and classification of the relevant objects (e.g., the target object and the business object) in the video image.

In the embodiments of the present application, when the feature vector contains both the location information and the confidence information of the business object, that is, when both the location information and the confidence information of the business object are trained, the feature vector convolution result is shared when the convergence conditions are subsequently judged separately, without repeated processing and calculation, which reduces resource consumption caused by data processing and improves data processing speed and efficiency.

In an optional example, operations (1) and (2) may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the convolutional network model run by the processor.

(3) Respectively judge whether the information of the corresponding target object in the feature vector convolution result, and the location information and/or confidence information of the business object, satisfy the convergence condition.

The convergence condition may be appropriately set by those skilled in the art according to actual requirements. When the above information satisfies the convergence condition, the parameter settings of the convolutional network model can be considered appropriate; when it does not, the parameter settings can be considered inappropriate and in need of adjustment. This adjustment is an iterative process: operations (1) to (3) of this training manner are performed iteratively until the information in the feature vector convolution result satisfies the convergence condition.

In one feasible manner, the convergence condition may be set according to a preset standard position and/or a preset standard confidence: for example, the distance between the position indicated by the location information of the business object in the feature vector convolution result and the preset standard position satisfying a certain threshold is taken as the convergence condition of the location information of the business object; the difference between the confidence indicated by the confidence information of the business object in the feature vector convolution result and the preset standard confidence satisfying a certain threshold is taken as the convergence condition of the confidence information of the business object; and so on.

Optionally, the preset standard position may be the average position obtained by averaging the positions of the business objects in the business object sample images to be trained; and/or the preset standard confidence may be the average confidence obtained by averaging the confidences of the business objects in the business object sample images to be trained. Setting the standard position and/or standard confidence according to the positions and/or confidences of the business objects in the sample images to be trained makes them more objective and precise, since the sample images are the samples to be trained and the data volume is large.

Exemplarily, when judging whether the location information and/or confidence information of the corresponding business object in the feature vector convolution result satisfies the convergence condition, a feasible manner includes:

obtaining the location information of the corresponding business object in the feature vector convolution result; using a first loss function to calculate a first distance between the position indicated by the location information of the corresponding business object and the preset standard position; and judging, according to the first distance, whether the location information of the corresponding business object satisfies the convergence condition;

and/or,

obtaining the confidence information of the corresponding business object in the feature vector convolution result; using a second loss function to calculate a second distance between the confidence indicated by the confidence information of the corresponding business object and the preset standard confidence; and judging, according to the second distance, whether the confidence information of the corresponding business object satisfies the convergence condition.

In an optional implementation, the first loss function may be: a function calculating the Euclidean distance between the position indicated by the location information of the corresponding business object and the preset standard position; and/or the second loss function may be: a function calculating the Euclidean distance between the confidence indicated by the confidence information of the corresponding business object and the preset standard confidence. Using Euclidean distance is simple to implement and can effectively indicate whether the convergence condition is satisfied. However, the embodiments of the present application are not limited thereto: other manners, such as the Mahalanobis distance or the Bhattacharyya distance, are equally applicable.

As for the information of the target object in the feature vector convolution result, the manner of judging whether it converges can refer to the convergence conditions used by related convolutional network models, which is not repeated here. If the information of the target object satisfies the convergence condition, the target object can be classified and its category clarified, providing a reference and basis for the subsequent determination of the display position of the business object.

(4) If the convergence condition is satisfied, complete the training of the convolutional network model; if it is not satisfied, adjust the parameters of the convolutional network model according to the feature vector convolution result, and iteratively train the convolutional network model according to the adjusted parameters until the feature vector convolution result obtained after the iterative training satisfies the convergence condition.

In an optional example, operations (3) and (4) may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a training module 708 run by the processor.

After the convolutional network model is trained in the above manner, it can perform feature extraction and classification on the display positions of business objects displayed based on the target object, and thus has the function of determining the display position of a business object in a video image. When there are multiple display positions, through the above training of the business object confidence, the convolutional network model can also determine the order of display effects among the multiple display positions, thereby determining the optimal display position. In subsequent applications, when a business object needs to be displayed, an effective display position can be determined from the current image in the video.

In addition, before performing the above training on the convolutional network model, the business object sample images may also be pre-processed in advance, which may include: obtaining multiple business object sample images, where each business object sample image contains annotation information of the business object; determining the position of the business object according to the annotation information, and judging whether the distance between the determined position of the business object and a preset position is less than or equal to a set threshold; and determining the business object sample images corresponding to the business objects whose distance from the preset position is less than or equal to the set threshold as the business object sample images to be trained, which participate in the above training process. Both the preset position and the set threshold may be appropriately set by those skilled in the art in any appropriate manner, for example according to statistical analysis results of data, relevant distance calculation formulas, or manual experience; the embodiments of the present application do not limit this.

In one feasible manner, the position of the business object determined according to the annotation information may be the center position of the business object. When determining the position of the business object according to the annotation information and judging whether the distance between the determined position and the preset position is less than or equal to the set threshold, the center position of the business object may be determined according to the annotation information, and it may then be judged whether the variance between the center position and the preset position is less than or equal to the set threshold.

By pre-processing the business object sample images in advance, sample images that do not meet the conditions can be filtered out, ensuring the accuracy of the training results.

The training of the convolutional network model is achieved through the above process, and the trained convolutional network model can be used to determine the display position of a business object in a video image. For example, during a live video broadcast, if the anchor clicks a business object to instruct its display, after the convolutional network model obtains the facial feature points of the anchor in the live video image, it can indicate the optimal positions for displaying the business object, such as the anchor's forehead position, the anchor's mouth position, or a background position in the live video, and then control the live application to display the business object or the associated business objects (such as multiple stickers of the same object theme containing semantic information) at those positions. Alternatively, during a live broadcast, if the anchor clicks a business object to instruct its display, the convolutional network model may directly determine the display position of the business object from the live video image. A condensed training-loop sketch of procedure (1)-(4) follows.
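A condensed sketch of the iterative training procedure (1)-(4), assuming PyTorch; the model interface (returning position and confidence predictions), the data loader, and the optimizer settings are illustrative assumptions, not the training code of the disclosure:

```python
import torch

def train(model, loader, std_xy, std_p, max_iters=20, tol_xy=0.1, tol_p=0.25):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for step, images in zip(range(max_iters), loader):
        pred_xy, pred_p = model(images)                      # feature vector convolution result
        loss = ((pred_xy - std_xy) ** 2).sum(dim=-1).mean() \
             + ((pred_p - std_p) ** 2).mean()                # first + second loss functions
        if ((pred_xy - std_xy).abs().max() < tol_xy
                and (pred_p - std_p).abs().max() < tol_p):
            break                                            # convergence condition met
        opt.zero_grad()
        loss.backward()                                      # backpropagate and adjust parameters
        opt.step()
    return model
```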
Way Two

In way two, the type of the at least one target object is first determined according to its feature points; then the display area of the business object to be displayed is determined according to the type of the at least one target object; and then the display position of the business object to be displayed in the video image is determined according to the display area.

The types of the target object may include, but are not limited to: a face type, a background type, a hand type, and an action type. The face type indicates that a face occupies the main part of the video image; the background type indicates that the background occupies a relatively large part of the video image; the hand type indicates that a hand occupies the main part of the video image; and the action type indicates that a character in the video image performs some action.

After the feature points of the at least one target object are obtained, relevant detection, classification, or learning methods may be used to determine the type of the target object. After the type of the at least one target object is determined, the display area of the business object to be displayed may be determined according to set rules, for example:

when the type of the target object is a face type, the display area of the business object to be displayed is determined to include at least one or any of the following: a hair area, a forehead area, a cheek area, a chin area, and a body area other than the head of the character in the video image; and/or,

when the type of the target object is a background type, the display area of the business object to be displayed is determined to include: a background area in the video image; and/or,

when the type of the target object is a hand type, the display area of the business object to be displayed is determined to include: an area within a set range centered on the area where the hand is located in the video image; and/or,

when the type of the target object is an action type, the display area of the business object to be displayed is determined to include: a preset area in the video image. The preset area may be appropriately set according to the actual situation, for example an area within a set range centered on the part generating the action, an area within a set range outside the part generating the action, or the background area, etc.; the embodiments of the present application do not limit this.

That is to say, when the business object includes multiple associated business objects, the determined display areas of the business objects to be displayed, such as the hair area, the forehead area, the background area, and the hand area, can display the multiple associated business objects in combination, i.e., display them at different display positions. Alternatively, multiple associated business objects to be displayed may also be displayed at the same display position (such as the hair area). Taking an anchor-type live video scene as an example, such a scene usually includes the common scenes of live broadcast and short-video sharing; the main body of the scene is often a main character (such as the anchor) plus a simple background (such as the anchor's home), and the character often occupies a large portion of the frame. For example, when the main body of the video is a character, the areas the audience mainly focuses on are the face area and the body movements of the subject. In order to let the audience notice the advertisement content without affecting the main body of the video, an augmented-reality effect can be used to correspondingly add multiple semantic virtual items, such as 2D sticker effects containing advertisement information (i.e., business objects), to relevant areas such as the characters and background in the picture, and commercial value is achieved through the combined display effects and information of the multiple virtual items. In this way, the main image and actions of the video subject are preserved, the augmented-reality effects add interest to the video, the audience's possible aversion to advertisement delivery is reduced, and the audience's attention is attracted, creating commercial value.

In an optional implementation, the actions corresponding to the action type include at least one or any of the following: blinking, opening the mouth, nodding, shaking the head, kissing, smiling, waving, making a scissors hand, making a fist, holding out a hand, giving a thumbs-up, making a pistol gesture, making a V sign, and making an OK hand.

After the display area is determined, the display position of the business object to be displayed in the video image can be further determined. For example, the center point of the display area is taken as the center point of the display position of the business object, and the business object is displayed; or a certain coordinate position in the display area is determined as the center point of the display position, etc.; the embodiments of the present application do not limit this.

Way Three

In another optional implementation, compared with way two above, in way three, when determining the display position of the business object to be displayed in the video image, the display position can be determined not only according to the feature points of the target object but also according to the type of the business object to be displayed. The types of the business object include at least one or any of the following: a forehead patch type, a cheek patch type, a chin patch type, a virtual hat type, a virtual clothing type, a virtual makeup type, a virtual headwear type, a virtual hair accessory type, a virtual jewelry type, a background type, a virtual pet type, and a virtual container type. But it is not limited thereto: the type of the business object may also be other appropriate types, such as a virtual bottle-cap type, a virtual cup type, a text type, and so on.

Thus, according to the type of the business object, an appropriate display position can be selected for the business object with the feature points of the target object as a reference.

In addition, when multiple display positions of the business object to be displayed in the video image are obtained according to the feature points of the target object and the type of the business object to be displayed, at least one display position may be selected from the multiple display positions as the final display position. For example, a text-type business object can be displayed in the background area, or in the forehead or body area of the character.

Way Four

In way four, from the pre-stored correspondence between the feature points of target objects and display positions, the target display position corresponding to the feature points of the at least one target object is obtained, and the obtained target display position is determined as the display position of the business object to be displayed in the video image. Here, the correspondence between the feature points of the target object and the display positions may be preset and stored in the form of a mapping table or the like; the embodiments of the present application do not limit the storage form of the correspondence.
图3是根据本申请在视频图像中展示业务对象的方法又一实施例的流程图。本实施例以业务对象为包含有语义信息的贴纸,具体为广告贴纸为例,对本申请实施例的在视频图像中展示业务对象的方案进行说明。如图3所示,
本实施例在视频图像中展示业务对象的方法包括:
步骤S302:获取业务对象样本图像并进行预处理,确定待训练的业务对象样本图像。
业务对象样本图像中可能存在一些不符合卷积网络模型的训练标准的样本图像,本实施例中,通 过对业务对象样本图像的预处理,可以将这部分不符合卷积网络模型的训练标准的样本图像过滤掉。
本实施例中,每个业务对象样本图像中包含有进行了标注的目标对象和标注的业务对象,业务对象标注有位置信息和置信度信息。一种可行的实施方案中,将业务对象的中心点的位置信息作为该业务对象的位置信息。本步骤中,根据业务对象的位置信息对样本图像进行过滤。获得位置信息指示的位置的坐标后,将该坐标与预设的该类型的业务对象的位置坐标进行比对,计算二者的位置方差。若该位置方差小于或等于设定的阈值,则该业务对象样本图像可以作为待训练的样本图像;若该位置方差大于设定的阈值,则过滤掉该业务对象样本图像。其中,预设的位置坐标和设定的阈值均可以由本领域技术人员根据实际情况适当设置,例如,由于一般用于卷积网络模型训练的样本图像具有相同的大小,设定的阈值可以为图像长或宽的1/20~1/5,例如,设定的阈值可以为图像长或宽的1/10。
此外,还可以对确定的待训练的业务对象样本图像中的业务对象的位置和置信度分别求取平均值,获取平均位置和平均置信度,该平均位置和平均置信度可以作为后续确定收敛条件的依据。
以业务对象为广告贴纸为例时,本实施例中用于训练的业务对象样本图像标注有最优广告位置的坐标和该广告位的置信度。其中,该最优广告位置的坐标可以在人脸、手势、前背景等地方标注,可以实现面部特征点、手势、前背景等处的广告位的联合训练,这相对于基于面部、手势等某一项广告位置及其置信度单独训练的方案,有利于节省计算资源。置信度的大小表示了这个广告位是最优广告位的概率,例如,如果这个广告位被遮挡较多,则置信度较低。
在一个可选示例中,该操作S302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的预处理模块7080执行。
步骤S304:使用确定的待训练的业务对象样本图像,对卷积网络模型进行训练。
在一个可选示例中,该操作S304可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的训练模块708执行。
本实施例中,一种可选的卷积网络模型结构的简要说明如下:
(1)输入层
例如,可以输入待训练的业务对象样本图像的特征向量,该特征向量中既包含目标对象的信息,也包含业务对象的信息:业务对象的位置信息和置信度信息。
(2)卷积层
//第一阶段,对待训练的业务对象样本图像的特征向量进行卷积处理,获得特征向量卷积结果并共享该特征向量卷积结果。
2.<=1卷积层1_1(3x3x64)
3.<=2非线性响应ReLU层
4.<=3卷积层1_2(3x3x64)
5.<=4非线性响应ReLU层
6.<=5池化层(3x3/2)
7.<=6卷积层2_1(3x3x128)
8.<=7非线性响应ReLU层
9.<=8卷积层2_2(3x3x128)
10.<=9非线性响应ReLU层
11.<=10池化层(3x3/2)
12.<=11卷积层3_1(3x3x256)
13.<=12非线性响应ReLU层
14.<=13卷积层3_2(3x3x256)
15.<=14非线性响应ReLU层
16.<=15卷积层3_3(3x3x256)
17.<=16非线性响应ReLU层
18.<=17池化层(3x3/2)
19.<=18卷积层4_1(3x3x512)
20.<=19非线性响应ReLU层
21.<=20卷积层4_2(3x3x512)
22.<=21非线性响应ReLU层
23.<=22卷积层4_3(3x3x512)
24.<=23非线性响应ReLU层
25.<=24池化层(3x3/2)
26.<=25卷积层5_1(3x3x512)
27.<=26非线性响应ReLU层
28.<=27卷积层5_2(3x3x512)
29.<=28非线性响应ReLU层
30.<=29卷积层5_3(3x3x512)
31.<=30非线性响应ReLU层
32.<=31池化层(3x3/2)
//第二阶段第一训练分支,对第一阶段特征向量卷积结果中业务对象即广告贴纸的位置进行回归分析,预测最优广告贴纸的位置坐标。
33.<=32卷积层6_1(1x1x2304)
34.<=33非线性响应ReLU层
35.<=34卷积层6_2(1x1x2)
36.<=35损失层,进行最优广告位坐标回归
//第二阶段第二训练分支,对第一阶段特征向量卷积结果中业务对象即广告贴纸的置信度进行回归分析,预测广告贴纸的置信度。
37.<=31池化层(3x3/2)
38.<=37卷积层cls_6_1(1x1x4096)
39.<=38非线性响应ReLU层
40.<=39卷积层cls_6_2(1x1x4096)
41.<=40非线性响应ReLU层
42.<=41卷积层cls_7_1(1x1x1)
43.<=42损失层,进行置信度回归
(3)输出层
本实施例中,输出层的输出可以为35层(即:卷积层6_2(1x1x2))和42层(即:卷积层cls_7_1(1x1x1))的预测值。
需要说明的是:
第一,上述第二阶段第一训练分支和第二阶段第二训练分支共享第一阶段的特征向量卷积结果,有利于节省计算资源;
第二,上述第二阶段第一训练分支和第二阶段第二训练分支的训练可以不分先后顺序,也可以并行执行,也可以以任意时间顺序执行;
第三,本实施例中,第一阶段的特征向量卷积结果中可以既包含有目标对象的特征提取和分类结果,也包含有业务对象的特征提取和分类结果,还包含有业务对象的位置信息和置信度信息的特征提取和分类结果;
第四,在第二阶段第一训练分支中,对最优广告贴纸的位置的预测可以是迭代多次进行的,每完成一次最优广告贴纸的位置的预测,就根据预测结果调整卷积网络模型的网络参数(如卷积核的值、层间输出线性变换的权重,等等),基于参数调整后的卷积网络模型再进行预测,迭代多次,直至满足收敛条件。具体地,在第一训练分支中,损失层36使用第一损失函数确定第一阶段训练出的广告贴纸的位置是否满足收敛条件,在不满足收敛条件的情况下,卷积网络模型将进行反向传播,调整该卷积网络模型的训练参数,实现最优广告贴纸位置的回归计算。其中,本实施例中,第一损失函数可以使用度量欧式距离的函数 min_{x,y} (x − x_gt)² + (y − y_gt)²,其中,(x, y)为待优化的广告贴纸的坐标,(x_gt, y_gt)为预设的标准位置的坐标。一种可选的实施方案中,该预设的标准位置可以为步骤S302中获得的对待训练的业务对象样本图像中的业务对象的位置进行平均后的平均位置;
其中,收敛条件例如可以是,待优化的广告贴纸的坐标和预设的标准位置的坐标相同,或者,待优化的广告贴纸的坐标和预设的标准位置的坐标的差异小于一定阈值(如图像长或宽的1/20~1/5,可选为1/10),或者,参数优化的迭代次数达到预定次数(如10~20次),等等;
第五,在第二阶段第二训练分支中,对广告贴纸的置信度的预测可以是迭代多次进行的,每完成一次广告贴纸的置信度的预测,就根据预测结果调整卷积网络模型的网络参数(如卷积核的值、层间输出线性变换的权重,等等),基于参数调整后的卷积网络模型再进行预测,迭代多次,直至满足收敛条件。示例性地,在第二训练分支中,损失层43使用第二损失函数确定第一阶段训练出的广告贴纸的置信度是否满足收敛条件,在不满足收敛条件的情况下,卷积网络模型将进行反向传播,调整该卷积网络模型的训练参数,实现广告贴纸置信度的回归计算。其中,本实施例中,第二损失函数例如可以使用度量欧式距离的函数 min_p (p − p_gt)²,其中,p为待优化的广告贴纸的置信度,p_gt为预设的标准置信度。一种可选的实施方案中,该预设的标准置信度可以为步骤S302中获得的对待训练的业务对象样本图像中的业务对象的置信度进行平均后的平均置信度;
收敛条件例如可以是,待优化的置信度和预设的标准置信度相同,或者,待优化的置信度和预设的标准置信度的差异小于一定阈值(如小于或等于25%),或者,参数优化的迭代次数达到预定次数(如10~20次)等等;
第六,上述卷积网络结构的说明中,2.<=1表明当前层为第二层,输入为第一层;卷积层后面括号为卷积层参数(3x3x64)表明卷积核大小为3x3,通道数为64;池化层后面括号(3x3/2)表明池化核大小为3x3,间隔为2。其它依此类推,不再赘述。
在上述卷积网络结构中,每个卷积层之后都有一个非线性响应单元,该非线性响应单元采用纠正线性单元(Rectified Linear Units,ReLU),通过在卷积层后增加上述纠正线性单元,可以将卷积层的映射结果尽量稀疏,以便更接近人的视觉反应,从而使图像处理效果更好。
示例性地,将卷积层的卷积核设为3x3,可以更好地综合视频图像中的局部信息。
设定池化层(Max pooling)的步长stride,可以使上层特征在不增加计算量的前提下获得更大的视野;同时,较大的步长stride还具有增强空间不变性的作用,即允许同样的输入出现在不同的图像位置上,而输出结果响应相同。
但本领域技术人员应当明了的是,上述卷积核的大小、通道数、池化核的大小、间隔以及卷积层的层数均为示例性说明,在实际应用中,本领域技术人员可以根据实际需要进行适应性调整,本申请实施例对此不作限制。此外,本实施例中的卷积网络模型中的所有层的组合及参数都是可选的,可以任意组合。
通过本实施例中的卷积网络模型,使用第一训练分支预测最优广告贴纸的位置,使用第二训练分支预测这个位置的置信度,实现了对视频图像中广告贴纸的位置的有效预测。
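为直观说明上述“共享卷积主干+双分支回归”的结构及两个损失函数,下面给出一段大幅简化的PyTorch示意代码;其中层数、通道数均经过简化,并非上文列出的完整网络结构,仅在假设条件下展示该训练思路:

```python
import torch
import torch.nn as nn

class AdPositionNet(nn.Module):
    """示意性的双分支卷积网络:共享卷积主干(对应第一阶段),
    分支一回归最优贴纸位置 (x, y),分支二回归该位置的置信度 p。"""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(            # 第一阶段:共享的特征向量卷积
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.loc_head = nn.Linear(128, 2)          # 第二阶段第一分支:位置回归
        self.conf_head = nn.Linear(128, 1)         # 第二阶段第二分支:置信度回归

    def forward(self, x):
        feat = self.backbone(x).flatten(1)         # 两个分支共享同一卷积结果
        return self.loc_head(feat), self.conf_head(feat)

def losses(pred_xy, pred_p, std_xy, std_p):
    """第一/第二损失函数:与标准位置、标准置信度之间欧式距离(平方)的均值。"""
    loss_loc = ((pred_xy - std_xy) ** 2).sum(dim=1).mean()
    loss_conf = ((pred_p.squeeze(1) - std_p) ** 2).mean()
    return loss_loc, loss_conf

# 用法示意:
# model = AdPositionNet()
# xy, p = model(torch.randn(8, 3, 224, 224))
```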
步骤S306:获取当前视频图像,将当前视频图像作为输入,使用训练后的卷积网络模型从视频图像中检测至少一个目标对象,并确定该至少一个目标对象的特征点。
在一个可选示例中,步骤S306可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一确定模块702执行。
步骤S308:使用训练后的卷积网络模型根据上述至少一个目标对象的特征点,确定待展示的业务对象在当前视频图像中的展示位置。
在一个可选示例中,步骤S308可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二确定模块704执行。
步骤S310:在当前视频图像中的展示位置采用计算机绘图方式绘制待展示的业务对象。
在一个可选示例中,步骤S310可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的绘制模块706执行。
随着互联网直播和短视频分享的兴起,越来越多的视频以直播或者短视频的方式出现。这类视频常常以人物为主角(单一人物或少量人物),以人物加简单背景为主要场景,观众主要在手机等移动终端上观看。通过本实施例提供的方案,可以实时对视频播放过程中的视频图像进行检测,给出效果较好的广告投放位置,不影响用户的观看体验,投放效果更好。当然,除上述场景之外的其它场景也同样适用本申请实施例提供的方案,如视频录制场景等等。
此外,本实施例在视频图像中展示业务对象的方法可以在任意适当的具有数据采集、处理和传输功能的终端设备如移动终端或个人电脑(PC)上实现,本申请实施例对实现设备不作限制。
图4是根据本申请在视频图像中展示业务对象的方法再一实施例的流程图。本实施例仍以业务对象为包含有语义信息的贴纸,具体为广告贴纸为例,对本申请实施例的在视频图像中展示业务对象的方案进行说明。
参照图4,本实施例在视频图像中展示业务对象的方法包括:
步骤S402:从视频图像中检测至少一个目标对象,并确定该至少一个目标对象的特征点。
在一个可选示例中,步骤S402可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一确定模块702执行。
步骤S404:根据该至少一个目标对象的特征点,确定该至少一个目标对象的类型。
每个目标对象都有一定的特征点,比如人脸或者手部的特征点,再比如背景的边界点等,本实施例中,获得目标对象的特征点之后,可以采用相关检测、分类或学习方法确定目标对象的类型。
步骤S406:根据该至少一个目标对象的类型,确定待展示的业务对象的展示区域。
示例性地,当目标对象的类型为人脸类型时,确定待展示的业务对象的展示区域包括以下至少之一或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域;和/或,
当目标对象的类型为背景类型时,确定待展示的业务对象的展示区域包括:视频图像中的背景区域;和/或,
当目标对象的类型为手部类型时,确定待展示的业务对象的展示区域包括:视频图像中以手部所在的区域为中心的、设定范围内的区域;和/或,
当目标对象的类型为动作类型时,确定待展示的业务对象的展示区域包括:视频图像中预先设定的区域。
以主播型视频场景为例,这类场景通常包括直播和短视频分享等常见场景,其主体常常为一个主要人物加简单背景,人物常常在画面中占比较大。例如,当视频主体为人物时,观众主要关注的区域为主体的脸部区域和肢体动作,为了既让观众注意到广告的内容,同时不影响视频的主体,可以通过增强现实感的效果,给画面人物相关区域加上有语义的虚拟物品如广告贴纸(即业务对象),并通过虚拟物品上的展示效果和信息达到商业价值。通过这种方式,既保留了视频主体的主要形象和动作,同时通过增强现实的特效为视频增加了趣味性,有利于减少观众对广告投放可能引起的反感,并吸引观众的注意力,形成商业价值。
例如,在视频直播场景中,一种额头贴片类型的业务对象的展示区域可以是主播额头的区域;一种脸颊贴片类型的业务对象的展示区域可以是主播两侧脸颊的区域;另一种脸颊贴片类型的业务对象的展示区域可以是主播两侧脸颊的区域和背景区域中主播额头上方的区域;一种下巴贴片类型的业务对象的展示区域可以是主播下巴的区域;一种虚拟头饰类型的业务对象的展示区域可以是主播头发及背景中的区域;一种在背景区域展示的业务对象的展示区域可以是背景区域中不遮挡主体的区域,即该业务对象在不遮盖背景主要内容的情况下进行展示;一种眨眼动作触发展示的业务对象的展示区域可以是主播眼睛处的区域;一种亲吻动作触发展示的业务对象的展示区域可以是主播嘴部处的区域;一种微笑动作触发展示的业务对象的展示区域可以是多个区域;一种挥手动作触发展示的业务对象的展示区域可以是主播手部的区域;一种托手动作触发展示的业务对象的展示区域可以是主播手部上方的区域。
步骤S408:根据展示区域,确定待展示的业务对象在视频图像中的展示位置。
确定的展示区域可能仅包括一个区域,也可能包括多个区域,可以根据业务对象的类型,从中确定出一个或多个展示区域进行业务对象绘制和展示。
例如,当目标对象的类型为人脸类型,而业务对象的类型为额头贴片类型时,则可以确定业务对象在视频图像中的展示区域为相应的额头区域,以额头区域的中心点为展示位置中心绘制并展示业务对象。再例如,当目标对象的类型为人脸类型,而业务对象的类型为文字类型时,则业务对象在视频图像中的展示区域可以包括身体区域、额头区域、脸颊区域以及背景区域等,可以从中确定一个或多个区域,以相应的区域的中心点为展示位置中心,进行业务对象的绘制和展示。
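下面以Python字典示意上述按目标对象类型与业务对象类型确定候选展示区域的规则;其中类型名称与区域名称均为便于说明而假设,实际规则可按需配置:

```python
# 示意代码:(目标对象类型, 业务对象类型) -> 候选展示区域列表,条目均为假设。
REGION_RULES = {
    ("face", "forehead_patch"): ["forehead"],
    ("face", "text"): ["body", "forehead", "cheek", "background"],
    ("background", "text"): ["background"],
}

def candidate_regions(target_type, object_type):
    # 未命中规则时退回背景区域
    return REGION_RULES.get((target_type, object_type), ["background"])
```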
在一个可选示例中,步骤S404~S408可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二确定模块704执行。
步骤S410:在展示位置采用计算机绘图方式绘制业务对象并展示。
在一个可选示例中,步骤S410可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的绘制模块706执行。
可选地,上述示例中的业务对象均可以为文字形式或图片形式或二者结合形式的贴纸。
通过本实施例的在视频图像中展示业务对象的方法,能够在主播型视频场景中,有效确定合适的广告投放和展示位置,且于视频播放有效融合,无须额外的网络资源和客户端系统资源,在不影响用户视频观看体验的同时,提高了广告投放效果和效率。
在上述图4所示实施例中,若业务对象包括多个关联业务对象,则在步骤S406中,根据至少一个目标对象的类型,确定待展示的多个关联业务对象相应的展示区域;
在步骤S408中,根据待展示的多个关联业务对象相应的展示区域,确定该待展示的多个关联业务对象在视频图像中相应的展示位置。例如,以展示区域的中心点为业务对象的展示位置中心点进行业务对象的展示;再例如,将展示区域中的某一坐标位置确定为展示位置的中心点等,本申请实施例对此不作限制;
在步骤S410,在相应的展示位置采用计算机绘图方式分别绘制上述待展示的多个关联业务对象。
通过本实施例提供的视频图像处理方法,可以有效实现关联业务对象在视频图像中的展示位置的确定,从而在确定的展示位置采用计算机绘图方式分别绘制多个关联业务对象,进而实现了关联业务对象的投放和展示。多个关联业务对象之间组合展示,以及与视频播放有效结合展示,提高了业务对象的投放和展示效率和效果,也无须额外的数据传输,节约了网络资源和客户端的系统资源。
图5是根据本申请在视频图像中展示业务对象的方法还一实施例的流程图。本实施例在视频图像中展示业务对象的方法可以由任意具有数据采集、处理和传输功能的设备执行,包括但不限于移动终端、个人电脑(PC)等电子设备。该实施例以业务对象包括多个关联业务对象为例进行说明,对于单独的业务对象同样适用。如图5所示,该实施例在视频图像中展示业务对象的方法包括:
步骤S502,从视频图像中检测至少一个目标对象,并确定至少一个目标对象的特征点。
在一个可选示例中,步骤S502可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一确定模块702执行。
步骤S504,根据至少一个目标对象的特征点,使用预先训练好的、用于确定业务对象在视频图像中的展示位置的卷积网络模型,确定待展示的多个关联业务对象在视频图像中相应的展示位置。
在一个可选示例中,步骤S504可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二确定模块704执行。
步骤S506,在确定的相应的展示位置,采用计算机绘图方式分别绘制上述多个关联业务对象进行展示。
在一个可选示例中,步骤S506可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的绘制模块706执行。
通过本实施例提供的视频图像处理方法,基于预先训练好的卷积网络模型可以有效实现关联业务对象在视频图像中的展示位置的确定,从而在确定的展示位置采用计算机绘图方式分别绘制多个关联业务对象,进而实现了关联业务对象的投放和展示。多个关联业务对象之间组合展示,以及与视频播放有效结合展示,有利于提高业务对象的投放和展示效率和效果,也无须额外的数据传输,有利于节约网络资源和客户端的系统资源。
图6是根据本申请在视频图像中展示业务对象的方法又一实施例的流程图。本实施例仍以多个关联业务对象为用于展示同一业务对象主题的、包含有语义信息的多个特效,或者属于同一业务对象提供者提供的包含有语义信息的多个特效,再或者包含有语义信息的同一特效的多个展示部分为例,其中,特效具体为包含广告信息的二维贴纸特效,对本申请实施例的视频图像处理方案进行说明,对于单独的业务对象同样适用。如图6所示,该实施例在视频图像中展示业务对象的方法包括:步骤S602,从视频图像中检测至少一个目标对象,并确定至少一个目标对象的特征点。
在一个可选示例中,步骤S602可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一确定模块702执行。
步骤S604,根据至少一个目标对象的特征点和待展示的多个关联业务对象的类型,确定该待展示的多个关联业务对象在视频图像中相应的展示位置。
与前述实施例不同的是,在确定待展示的多个关联业务对象在视频图像中的展示位置时,不仅根据至少一个目标对象的特征点,还根据待展示的关联业务对象的类型,确定待展示的多个关联业务对象在视频图像中的展示位置。由此,根据关联业务对象的类型,可以以目标对象的特征点为参考,为关联业务对象选择适当的展示位置。
此外,在根据至少一个目标对象的特征点和待展示的多个关联业务对象的类型,获得待展示的多个关联业务对象在视频图像中的相应的展示位置的情况下,可以从多个展示位置中选择至少一个展示位置。例如,对于文字类型的关联业务对象,可以展示在背景区域,也可以展示在人物的额头或身体区域等。
步骤S606,在确定的相应的展示位置采用计算机绘图方式分别绘制上述多个关联业务对象进行展示。
需要说明的是,上述示例中的关联业务对象均可以为文字形式或图片形式或二者结合形式的贴纸。
通过本实施例提供的视频图像处理方法,综合考虑目标对象的特征点和关联业务对象的类型,实现关联业务对象在视频图像中的展示位置的确定,从而在相应的展示位置采用计算机绘图方式分别绘制多个关联业务对象,进而实现了关联业务对象的投放和展示。多个关联业务对象之间组合展示,以及与视频播放有效结合展示,有利于提高业务对象的投放和展示效率和效果,也无须额外的数据传输,有利于节约网络资源和客户端的系统资源。
本发明实施例提供的任一种在视频图像中展示业务对象的方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本发明实施例提供的任一种在视频图像中展示业务对象的方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本发明实施例提及的任一种在视频图像中展示业务对象的方法。下文不再赘述。
本领域普通技术人员可以理解:上述本申请实施例的方法的全部或部分步骤可在硬件、固件中实现,或者被实现为可存储在记录介质(诸如CD ROM、RAM、软盘、硬盘或磁光盘)中的软件或计算机代码,或者被实现为通过网络下载的、原始存储在远程记录介质或非暂时机器可读介质中并将被存储在本地记录介质中的计算机代码,从而在此描述的方法可被存储在使用通用计算机、专用处理器或者可编程或专用硬件(诸如ASIC或FPGA)的记录介质上的这样的软件处理。可以理解,计算机、处理器、微处理器控制器或可编程硬件包括可存储或接收软件或计算机代码的存储组件(例如,RAM、ROM、闪存等),当所述软件或计算机代码被计算机、处理器或硬件访问且执行时,实现在此描述的处理方法。此外,当通用计算机访问用于实现在此示出的处理的代码时,代码的执行将通用计算机转换为用于执行在此示出的处理的专用计算机。
图7是根据本申请在视频图像中展示业务对象的装置一实施例的结构框图。本申请各实施例的装置可用于实现本发明上述各方法实施例。如图7所示,本实施例在视频图像中展示业务对象的装置包括:
第一确定模块702,用于从视频图像中检测至少一个目标对象,并确定该至少一个目标对象的特征点。
第二确定模块704,用于根据该至少一个目标对象的特征点,确定待展示的业务对象在视频图像中的展示位置。
绘制模块706,用于在展示位置采用计算机绘图方式绘制业务对象。
通过本实施例在视频图像中展示业务对象的装置,从视频图像中检测目标对象并确定目标对象的特征点,不同的目标对象具有不同的特征点,将确定的目标对象的特征点作为确定待展示的业务对象的展示位置的依据,确定待展示的业务对象的展示位置,在确定的展示位置采用计算机绘图方式绘制业务对象,以进行业务对象的展示。当业务对象为待展示的广告时,一方面,在确定的展示位置采用计算机绘图方式绘制业务对象,该业务对象与视频播放相结合,无须通过网络传输与视频无关的额外广告视频数据,有利于节约网络资源和客户端的系统资源;另一方面,业务对象与视频图像中的目标对象紧密结合,可以以一种不打扰观众的方式展示业务对象,不影响观众的正常视频观看体验,不易引起观众反感,有利于提高业务对象的投放效率、实现预想的展示效果。
在本申请各在视频图像中展示业务对象的装置实施例的一个可选示例中,第二确定模块704可以通过预先训练好的、用于确定业务对象在视频图像中的展示位置的卷积网络模型实现,即:第二确定模块704用于根据至少一个目标对象的特征点,使用预先训练的、用于确定业务对象在视频图像中的展示位置的卷积网络模型,确定待展示的业务对象在视频图像中的展示位置。
图8是根据本申请在视频图像中展示业务对象的装置另一实施例的结构框图。如图8所示,与图7所示的实施例相比,该实施例中还包括训练模块708,用于对上述卷积网络模型进行预先训练。在其中一个可选示例中,训练模块708包括:
第一获取模块7082,用于通过卷积网络模型获取待训练的业务对象样本图像的特征向量,其中,所述特征向量中包含有待训练的业务对象样本图像中的目标对象的信息,以及,业务对象的位置信息和/或置信度信息;
第二获取模块7084,用于通过卷积网络模型对所述特征向量进行卷积处理,获得特征向量卷积结果;
判断模块7086,用于分别判断特征向量卷积结果中对应的目标对象的信息,以及,业务对象的位置信息和/或置信度信息是否满足预先设置的收敛条件;
执行模块7088,用于若判断模块7086的判断结果为满足收敛条件,则完成对卷积网络模型的训练;若判断模块7086的判断结果为不满足收敛条件,则根据特征向量卷积结果,调整卷积网络模型的参数,以便训练模块708根据调整后的卷积网络模型的参数对卷积网络模型进行迭代训练,直至迭代训练后的特征向量卷积结果满足收敛条件。
可选地,判断模块7086可以包括:第一判断模块,用于获取特征向量卷积结果中对应的业务对象的位置信息;使用第一损失函数,计算对应的业务对象的位置信息指示的位置与预设的标准位置之间的第一距离;根据第一距离判断对应的业务对象的位置信息是否满足收敛条件;和/或,第二判断模块,用于获取特征向量卷积结果中对应的业务对象的置信度信息;使用第二损失函数,计算对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的第二距离;根据第二距离判断对应的业务对象的置信度信息是否满足收敛条件。
可选地,第一损失函数可为:计算对应的业务对象的位置信息指示的位置与预设的标准位置之间的欧式距离的函数;和/或,第二损失函数可为:计算对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的欧式距离的函数。
可选地,预设的标准位置可为:对待训练的业务对象样本图像中的业务对象的位置进行平均处理后获得的平均位置;和/或,预设的标准置信度可为:对待训练的业务对象样本图像中的业务对象的置信度进行平均处理后获取的平均置信度。
可选地,训练模块708还可以包括:预处理模块7080,用于在第一获取模块7082获取待训练的业务对象样本图像的特征向量之前,获取多个业务对象样本图像,其中,每个业务对象样本图像中包含有业务对象的标注信息;根据标注信息确定业务对象的位置,判断确定的业务对象的位置与预设位置的距离是否小于或等于设定阈值;将确定的位置与预设位置的距离小于或等于设定阈值的业务对象对应的业务对象样本图像,确定为待训练的业务对象样本图像。
可选地,预处理模块7080在根据标注信息确定业务对象的位置,判断确定的业务对象的位置与预设位置的距离是否小于或等于设定阈值时:根据标注信息确定业务对象的中心位置;判断中心位置与预设位置的方差是否小于或等于设定阈值。
可选地,第二确定模块704可以包括:类型确定模块7042,用于根据至少一个目标对象的特征点的信息,确定该至少一个目标对象的类型;区域确定模块7044,用于根据该至少一个目标对象的类型,确定待展示的业务对象的展示区域;位置确定模块7046,用于根据展示区域,确定待展示的业务对象在视频图像中的展示位置。
可选地,区域确定模块7044可以包括:第一区域确定模块,用于当上述至少一个目标对象的类型为人脸类型时,确定待展示的业务对象的展示区域包括以下至少之一或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域;和/或,第二区域确定模块,用于当上述至少一个目标对象的类型为背景类型时,确定待展示的业务对象的展示区域包括:视频图像中的背景区域;和/或,第三区域确定模块,用于当上述至少一个目标对象的类型为手部类型时,确定待展示的业务对象的展示区域包括:视频图像中以手部所在的区域为中心的、设定范围内的区域;和/或,第四区域确定模块,用于当上述至少一个目标对象的类型为动作类型时,确定待展示的业务对象的展示区域包括:视频图像中预先设定的区域。
可选地,动作类型对应的动作包括以下至少之一或任意多个:眨眼、张嘴、点头、摇头、亲吻、微笑、挥手、剪刀手、握拳、托手、竖大拇指、摆手枪姿势、摆V字手、摆OK手。
可选地,第二确定模块704具体用于根据上述至少一个目标对象的特征点和待展示的业务对象的类型,确定待展示的业务对象在视频图像中的展示位置。
可选地,第二确定模块704具体用于根据上述至少一个目标对象的特征点和待展示的业务对象的类型,获得待展示的业务对象在视频图像中的多个展示位置;从多个展示位置中选择至少一个展示位置作为待展示的业务对象在视频图像中的展示位置。
可选地,业务对象的类型例如可以包括但不限于以下至少之一或任意多个:额头贴片类型、脸颊贴片类型、下巴贴片类型、虚拟帽子类型、虚拟服装类型、虚拟妆容类型、虚拟头饰类型、虚拟发饰类型、虚拟首饰类型、背景类型、虚拟宠物类型、虚拟容器类型等。
可选地,第二确定模块704具体用于从预先存储的目标对象的特征点与展示位置的对应关系中,获取与上述至少一个目标对象的特征点相对应的目标展示位置;以及将获取的目标展示位置确定为待展示的业务对象在所述视频图像中的展示位置。
可选地,上述业务对象可为包含有语义信息的特效;视频图像可为直播类视频图像。
可选地,上述包含有语义信息的特效可以包括包含广告信息的以下至少一种形式的特效:二维贴纸特效、三维特效、粒子特效。
在本发明上述各装置实施例的一个可选示例中,上述的业务对象可以包括:多个关联业务对象。相应地,上述各装置实施例中,第二确定模块704具体用于根据至少一个目标对象的特征点,确定多个待展示的关联业务对象在视频图像中相应的展示位置;绘制模块706具体用于在相应的展示位置采用计算机绘图方式分别绘制多个关联业务对象。
在一个可选示例中,多个关联业务对象例如可以包括但不限于以下至少之一或任意多项:用于展示同一业务对象主题的、包含有语义信息的多个特效、包含有语义信息的同一特效的多个展示部分、属于同一业务对象提供者提供的包含有语义信息的多个特效。示例性地,该特效可以为包含广告信息的二维贴纸特效、三维特效、粒子特效中的任意一种。此外,其它形式的业务对象也同样适用本申请实施例提供的视频图像处理方案。
示例性地,上述多个相应的展示位置包括以下至少一个或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域、视频图像中的背景区域、视频图像中以手部所在的区域为中心的设定范围内的区域、视频图像中预先设定的区域。
本申请各实施例在视频图像中展示业务对象的装置可用于实现前述各在视频图像中展示业务对象的方法实施例,并具有相应的方法实施例的有益效果,在此不再赘述。
此外,本申请各实施例在视频图像中展示业务对象的装置可以设置于适当的电子设备中,例如移动终端、PC、服务器等。
图9是根据本申请电子设备一实施例的结构示意图。本申请实施例并不对电子设备的实现做限定。如图9所示,该电子设备可以包括:处理器(processor)902、通信接口(Communications Interface)904、存储器(memory)906、以及通信总线908。其中:
处理器902、通信接口904、以及存储器906通过通信总线908完成相互间的通信。
通信接口904,用于与其它设备比如其它客户端或服务器等的网元通信。
处理器902可能是中央处理器(CPU),或者是特定集成电路(Application Specific Integrated Circuit,ASIC),或者是被配置成实施本申请实施例的一个或多个集成电路,或者是图形处理器(Graphics Processing Unit,GPU)。终端设备包括的一个或多个处理器,可以是同一类型的处理器,如一个或多个CPU,或者,一个或多个GPU;也可以是不同类型的处理器,如一个或多个CPU以及一个或多个GPU。
存储器906,用于存放至少一可执行指令,该可执行指令使处理器902执行如本申请上述任一实施例在视频图像中展示业务对象的方法对应的操作。存储器906可能包含高速随机存取存储器(random access memory,RAM),也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。
图10为本发明电子设备另一实施例的结构示意图。下面参考图10,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备的结构示意图。如图10所示,该电子设备包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)1001,和/或一个或多个图像处理器(GPU)1013等,处理器可以根据存储在只读存储器(ROM)1002中的可执行指令或者从存储部分1008加载到随机访问存储器(RAM)1003中的可执行指令而执行各种适当的动作和处理。通信部1012可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器1002和/或随机访问存储器1003通信以执行可执行指令,通过总线1004与通信部1012相连、并经通信部1012与其他目标设备通信,从而完成本申请实施例提供的任一在视频图像中展示业务对象的方法对应的操作,例如,从视频图像中检测至少一个目标对象,并确定所述至少一个目标对象的特征点;根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置;在所述展示位置采用计算机绘图方式绘制所述业务对象。
此外,在RAM 1003中,还可存储有装置操作所需的各种程序和数据。CPU1001、ROM1002以及RAM1003通过总线1004彼此相连。在有RAM1003的情况下,ROM1002为可选模块。RAM1003存储可执行指令,或在运行时向ROM1002中写入可执行指令,可执行指令使处理器1001执行上述在视频图像中展示业务对象的方法对应的操作。输入/输出(I/O)接口1005也连接至总线1004。通信部1012可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口1005:包括键盘、鼠标等的输入部分1006;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分1007;包括硬盘等的存储部分1008;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分1009。通信部分1009经由诸如因特网的网络执行通信处理。驱动器1010也根据需要连接至I/O接口1005。可拆卸介质1011,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1010上,以便于从其上读出的计算机程序根据需要被安装入存储部分1008。
需要说明的,如图10所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图10的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本发明公开的保护范围。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,从视频图像中检测至少一个目标对象,并确定所述至少一个目标对象的特征点;根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置;在所述展示位置采用计算机绘图方式绘制所述业务对象。
另外,本申请实施例还提供了一种计算机程序,该计算机程序包括计算机可读代码,该程序代码包括计算机操作指令,当计算机可读代码在设备上运行时,设备中的处理器执行用于实现本申请任一实施例在视频图像中展示业务对象的方法中各步骤的指令。
另外,本申请实施例还提供了一种计算机可读存储介质,用于存储计算机可读取的指令,该指令被执行时实现本申请任一实施例在视频图像中展示业务对象的方法中各步骤的操作。
本申请实施例中,计算机程序、计算机可读取的指令被执行时各步骤的具体实现可以参见上述实施例中的相应步骤和模块中对应的描述,在此不赘述。所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的设备和模块的具体工作过程,可以参考前述方法实施例中的对应过程描述,在此不再赘述。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于装置、电子设备、程序、存储介质等实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
需要指出,根据实施的需要,可将本申请实施例中描述的各个部件/步骤拆分为更多部件/步骤,也可将两个或多个部件/步骤或者部件/步骤的部分操作组合成新的部件/步骤,以实现本申请实施例的目的。本领域普通技术人员可以意识到,可能以许多方式来实现本发明的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本发明的方法和装置。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。用于所述方法的步骤的上述顺序仅是为了进行说明,本发明的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本发明实施为记录在记录介质中的程序,这些程序包括用于实现根据本发明的方法的机器可读指令。因而,本发明还覆盖存储用于执行根据本发明的方法的程序的记录介质。
以上实施方式仅用于说明本申请实施例,而并非对本申请实施例的限制,有关技术领域的普通技术人员,在不脱离本申请实施例的精神和范围的情况下,还可以做出各种变化和变型,因此所有等同的技术方案也属于本申请实施例的范畴,本申请实施例的专利保护范围应由权利要求限定。

Claims (44)

  1. 一种在视频图像中展示业务对象的方法,包括:
    从视频图像中检测至少一个目标对象,并确定所述至少一个目标对象的特征点;
    根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置;
    在所述展示位置采用计算机绘图方式绘制所述业务对象。
  2. 根据权利要求1所述的方法,其特征在于,根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置,包括:
    根据所述至少一个目标对象的特征点,使用预先训练的、用于确定业务对象在视频图像中的展示位置的卷积网络模型,确定待展示的业务对象在所述视频图像中的展示位置。
  3. 根据权利要求2所述的方法,其特征在于,对所述卷积网络模型的预先训练包括:
    利用所述卷积网络模型获取待训练的业务对象样本图像的特征向量,其中,所述特征向量中包括:所述待训练的业务对象样本图像中的目标对象的信息、以及业务对象的位置信息和/或置信度信息;
    利用所述卷积网络模型对所述特征向量进行卷积处理,获得特征向量卷积结果;
    分别判断所述特征向量卷积结果中对应的目标对象的信息,以及所述业务对象的位置信息和/或置信度信息是否满足收敛条件;
    若满足收敛条件,则完成对所述卷积网络模型的训练;
    否则,若不满足收敛条件,则根据所述特征向量卷积结果,调整所述卷积网络模型的参数,并根据调整后的所述卷积网络模型的参数对所述卷积网络模型进行迭代训练,直至迭代训练后的特征向量卷积结果满足所述收敛条件。
  4. 根据权利要求3所述的方法,其特征在于,分别判断所述特征向量卷积结果中对应的业务对象的位置信息和/或置信度信息是否满足收敛条件,包括:
    获取所述特征向量卷积结果中对应的业务对象的位置信息;使用第一损失函数,计算所述对应的业务对象的位置信息指示的位置与预设的标准位置之间的第一距离;根据所述第一距离判断所述对应的业务对象的位置信息是否满足收敛条件;
    和/或,
    获取所述特征向量卷积结果中对应的业务对象的置信度信息;使用第二损失函数,计算所述对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的第二距离;根据所述第二距离判断所述对应的业务对象的置信度信息是否满足收敛条件。
  5. 根据权利要求4所述的方法,其特征在于,所述第一损失函数包括:计算所述对应的业务对象的位置信息指示的位置与预设的标准位置之间的欧式距离的函数;
    和/或,
    所述第二损失函数包括:计算所述对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的欧式距离的函数。
  6. 根据权利要求4或5所述的方法,其特征在于,所述预设的标准位置包括:对所述待训练的业务对象样本图像中的业务对象的位置进行平均处理后获得的平均位置;
    和/或,
    所述预设的标准置信度包括:对所述待训练的业务对象样本图像中的业务对象的置信度进行平均处理后获取的平均置信度。
  7. 根据权利要求3-6任一所述的方法,其特征在于,所述获取待训练的业务对象样本图像的特征向量之前,还包括:
获取多个业务对象样本图像,其中,每个所述业务对象样本图像中包含有业务对象的标注信息;
    根据所述标注信息确定业务对象的位置,判断确定的所述业务对象的位置与预设位置的距离是否小于或等于设定阈值;
    将确定的位置与预设位置的距离小于或等于所述设定阈值的业务对象对应的业务对象样本图像,确定为所述待训练的业务对象样本图像。
  8. 根据权利要求7所述的方法,其特征在于,根据所述标注信息确定业务对象的位置,判断确定的所述业务对象的位置与预设位置的距离是否小于或等于设定阈值,包括:
    根据所述标注信息确定业务对象的中心位置;
    判断所述中心位置与预设位置的方差是否小于或等于所述设定阈值。
  9. 根据权利要求1所述的方法,其特征在于,根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置,包括:
    根据所述至少一个目标对象的特征点,确定所述至少一个目标对象的类型;
    根据所述至少一个目标对象的类型,确定待展示的业务对象的展示区域;
    根据所述展示区域,确定待展示的业务对象在所述视频图像中的展示位置。
  10. 根据权利要求9所述的方法,其特征在于,根据所述至少一个目标对象的类型,确定待展示的业务对象的展示区域,包括:
    当所述目标对象的类型为人脸类型时,确定待展示的业务对象的展示区域包括以下至少之一或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域;和/或,
    当所述目标对象的类型为背景类型时,确定待展示的业务对象的展示区域包括:视频图像中的背景区域;和/或,
    当所述目标对象的类型为手部类型时,确定待展示的业务对象的展示区域包括:视频图像中以手部所在的区域为中心的、设定范围内的区域;和/或,
    当所述目标对象的类型为动作类型时,确定待展示的业务对象的展示区域包括:视频图像中预先设定的区域。
  11. 根据权利要求10所述的方法,其特征在于,所述动作类型对应的动作包括以下至少之一:眨眼、张嘴、点头、摇头、亲吻、微笑、挥手、剪刀手、握拳、托手、竖大拇指、摆手枪姿势、摆V字手、摆OK手。
  12. 根据权利要求1-11任一项所述的方法,其特征在于,根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置,包括:
    根据所述至少一个目标对象的特征点和所述待展示的业务对象的类型,确定待展示的业务对象在所述视频图像中的展示位置。
  13. 根据权利要求12所述的方法,其特征在于,根据所述至少一个目标对象的特征点和所述待展示的业务对象的类型,确定待展示的业务对象在所述视频图像中的展示位置,包括:
    根据所述至少一个目标对象的特征点和所述待展示的业务对象的类型,获得待展示的业务对象在所述视频图像中的多个展示位置;
    从所述多个展示位置中选择至少一个展示位置作为待展示的业务对象在所述视频图像中的展示位置。
  14. 根据权利要求12或13所述的方法,其特征在于,所述业务对象的类型包括以下任意一项或任意多项:额头贴片类型、脸颊贴片类型、下巴贴片类型、虚拟帽子类型、虚拟服装类型、虚拟妆容类型、虚拟头饰类型、虚拟发饰类型、虚拟首饰类型、背景类型、虚拟宠物类型、虚拟容器类型。
  15. 根据权利要求1所述的方法,其特征在于,所述根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置,包括:
    从预先存储的目标对象的特征点与展示位置的对应关系中,获取与所述至少一个目标对象的特征点相对应的目标展示位置;
    将获取的所述目标展示位置确定为所述待展示的业务对象在所述视频图像中的展示位置。
  16. 根据权利要求1-15任一项所述的方法,其特征在于,所述业务对象包括:包含有语义信息的特效;所述视频图像包括:直播类视频图像。
  17. 根据权利要求16所述的方法,其特征在于,所述包含有语义信息的特效包括包含广告信息的以下至少一种形式的特效:二维贴纸特效、三维特效、粒子特效。
  18. 根据权利要求1-17任一所述的方法,其特征在于,所述业务对象包括:多个关联业务对象;
    所述确定待展示的业务对象在所述视频图像中的展示位置,包括:确定多个待展示的关联业务对象在所述视频图像中相应的展示位置;
    在所述展示位置采用计算机绘图方式绘制所述业务对象,包括:在所述相应的展示位置采用计算机绘图方式分别绘制所述多个关联业务对象。
  19. 根据权利要求18所述的方法,其特征在于,所述多个关联业务对象包括以下至少一项或任意多项:用于展示同一业务对象主题的、包含有语义信息的多个特效,包含有语义信息的同一特效的多个展示部分,同一业务对象提供者提供的包含有语义信息的多个特效。
  20. 根据权利要求18或19所述的方法,其特征在于,所述相应的展示位置包括以下至少一个或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域、视频图像中的背景区域、视频图像中以手部所在的区域为中心的设定范围内的区 域、视频图像中预先设定的区域。
  21. 一种在视频图像中展示业务对象的装置,包括:
    第一确定模块,用于从视频图像中检测至少一个目标对象,并确定所述至少一个目标对象的特征点;
    第二确定模块,用于根据所述至少一个目标对象的特征点,确定待展示的业务对象在所述视频图像中的展示位置;
    绘制模块,用于在所述展示位置采用计算机绘图方式绘制所述业务对象。
  22. 根据权利要求21所述的装置,其特征在于,所述第二确定模块包括预先训练的卷积网络模型。
  23. 根据权利要求22所述的装置,其特征在于,所述装置还包括:训练模块,用于对所述卷积网络模型进行预先训练;
    所述训练模块包括:
    第一获取模块,用于获取待训练的业务对象样本图像的特征向量,所述特征向量中包括:所述待训练的业务对象样本图像中的目标对象的信息、以及业务对象的位置信息和/或置信度信息;
    第二获取模块,用于对所述特征向量进行卷积处理,获得特征向量卷积结果;
    判断模块,用于分别判断所述特征向量卷积结果中对应的目标对象的信息,以及所述业务对象的位置信息和/或置信度信息是否满足收敛条件;
    执行模块,用于若所述判断模块的判断结果为满足收敛条件,则完成对所述卷积网络模型的训练;若所述判断模块的判断结果为不满足收敛条件,则根据所述特征向量卷积结果,调整所述卷积网络模型的参数,以便所述训练模块根据调整后的所述卷积网络模型的参数对所述卷积网络模型进行迭代训练,直至迭代训练后的特征向量卷积结果满足所述收敛条件。
  24. 根据权利要求23所述的装置,其特征在于,所述判断模块包括:
    第一判断模块,用于获取所述特征向量卷积结果中对应的业务对象的位置信息;使用第一损失函数,计算所述对应的业务对象的位置信息指示的位置与预设的标准位置之间的第一距离;根据所述第一距离判断所述对应的业务对象的位置信息是否满足收敛条件;
    和/或,
    第二判断模块,用于获取所述特征向量卷积结果中对应的业务对象的置信度信息;使用第二损失函数,计算所述对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的第二距离;根据所述第二距离判断所述对应的业务对象的置信度信息是否满足收敛条件。
  25. 根据权利要求24所述的装置,其特征在于,所述第一损失函数包括:计算所述对应的业务对象的位置信息指示的位置与预设的标准位置之间的欧式距离的函数;
    和/或,
    所述第二损失函数包括:计算所述对应的业务对象的置信度信息指示的置信度与预设的标准置信度之间的欧式距离的函数。
  26. 根据权利要求24或25所述的装置,其特征在于,所述预设的标准位置包括:对所述待训练的业务对象样本图像中的业务对象的位置进行平均处理后获得的平均位置;
    和/或,
    所述预设的标准置信度包括:对所述待训练的业务对象样本图像中的业务对象的置信度进行平均处理后获取的平均置信度。
  27. 根据权利要求23-26任一所述的装置,其特征在于,所述训练模块还包括:
    预处理模块,用于在所述第一获取模块获取待训练的业务对象样本图像的特征向量之前,获取多个业务对象样本图像,其中,每个所述业务对象样本图像中包含有业务对象的标注信息;根据所述标注信息确定业务对象的位置,判断确定的所述业务对象的位置与预设位置的距离是否小于或等于设定阈值;将确定的位置与预设位置的距离小于或等于所述设定阈值的业务对象对应的业务对象样本图像,确定为待训练的业务对象样本图像。
  28. 根据权利要求27所述的装置,其特征在于,所述预处理模块在根据所述标注信息确定业务对象的位置,判断确定的所述业务对象的位置与预设位置的距离是否小于或等于设定阈值时:根据所述标注信息确定业务对象的中心位置;判断所述中心位置与预设位置的方差是否小于或等于所述设定阈值。
  29. 根据权利要求21所述的装置,其特征在于,所述第二确定模块包括:
    类型确定模块,用于根据所述至少一个目标对象的特征点的信息,确定所述至少一个目标对象的类型;
    区域确定模块,用于根据所述至少一个目标对象的类型,确定待展示的业务对象的展示区域;
    位置确定模块,用于根据所述展示区域,确定待展示的业务对象在所述视频图像中的展示位置。
  30. 根据权利要求29所述的装置,其特征在于,所述区域确定模块包括:
    第一区域确定模块,用于当所述目标对象的类型为人脸类型时,确定待展示的业务对象的展示区域包括以下至少之一或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域;和/或,
    第二区域确定模块,用于当所述目标对象的类型为背景类型时,确定待展示的业务对象的展示区域包括:视频图像中的背景区域;和/或,
    第三区域确定模块,用于当所述目标对象的类型为手部类型时,确定待展示的业务对象的展示区域包括:视频图像中以手部所在的区域为中心的、设定范围内的区域;和/或,
    第四区域确定模块,用于当所述目标对象的类型为动作类型时,确定待展示的业务对象的展示区域包括:视频图像中预先设定的区域。
  31. 根据权利要求30所述的装置,其特征在于,所述动作类型对应的动作包括以下至少之一:眨眼、张嘴、点头、摇头、亲吻、微笑、挥手、剪刀手、握拳、托手、竖大拇指、摆手枪姿势、摆V字手、摆OK手。
  32. 根据权利要求21所述的装置,其特征在于,所述第二确定模块,具体用于根据所述至少一个目标对象的特征点和所述待展示的业务对象的类型,确定待展示的业务对象在所述视频图像中的展示位置。
  33. 根据权利要求32所述的装置,其特征在于,所述第二确定模块,具体用于根据所述至少一个目标对象的特征点和所述待展示的业务对象的类型,获得待展示的业务对象在所述视频图像中的多个展示位置;从所述多个展示位置中选择至少一个展示位置作为待展示的业务对象在所述视频图像中的展示位置。
  34. 根据权利要求32或33所述的装置,其特征在于,所述业务对象的类型包括以下任意一项或任意多项:额头贴片类型、脸颊贴片类型、下巴贴片类型、虚拟帽子类型、虚拟服装类型、虚拟妆容类型、虚拟头饰类型、虚拟发饰类型、虚拟首饰类型、背景类型、虚拟宠物类型、虚拟容器类型。
  35. 根据权利要求21所述的装置,其特征在于,所述第二确定模块,具体用于从预先存储的目标对象的特征点与展示位置的对应关系中,获取与所述至少一个目标对象的特征点相对应的目标展示位置;以及将获取的所述目标展示位置确定为所述待展示的业务对象在所述视频图像的展示位置。
  36. 根据权利要求21-35任一所述的装置,其特征在于,所述业务对象包括:包含有语义信息的特效;所述视频图像包括:直播类视频图像。
  37. 根据权利要求36所述的装置,其特征在于,所述包含有语义信息的特效包括包含广告信息的以下至少一种形式的特效:二维贴纸特效、三维特效、粒子特效。
  38. 根据权利要求21-37任一所述的装置,其特征在于,所述业务对象包括:多个关联业务对象;
    所述第二确定模块,具体用于根据所述至少一个目标对象的特征点,确定多个待展示的关联业务对象在所述视频图像中相应的展示位置;
    所述绘制模块,具体用于在所述相应的展示位置采用计算机绘图方式分别绘制所述多个关联业务对象。
  39. 根据权利要求38所述的装置,其特征在于,所述多个关联业务对象包括以下至少一项或任意多项:用于展示同一业务对象主题的、包含有语义信息的多个特效,包含有语义信息的同一特效的多个展示部分,同一业务对象提供者提供的包含有语义信息的多个特效。
  40. 根据权利要求38或39所述的装置,其特征在于,所述相应的展示位置包括以下至少一个或任意多个:视频图像中人物的头发区域、额头区域、脸颊区域、下巴区域、头部以外的身体区域、视频图像中的背景区域、视频图像中以手部所在的区域为中心的设定范围内的区域、视频图像中预先设定的区域。
  41. 一种电子设备,包括:处理器、存储器、通信接口和通信总线,所述处理器、所述存储器和所述通信接口通过所述通信总线完成相互间的通信;
    所述存储器用于存储至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1-20任一项所述的在视频图像中展示业务对象的方法对应的操作。
  42. 一种电子设备,其特征在于,包括:
    处理器和权利要求21-40任一所述的在视频图像中展示业务对象的装置;
    在处理器运行所述在视频图像中展示业务对象的装置时,权利要求21-40任一所述的在视频图像中展示业务对象的装置中的单元被运行。
  43. 一种计算机程序,包括计算机可读代码,其特征在于,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1-20任一项所述的在视频图像中展示业务对象的方法中各步骤的指令。
  44. 一种计算机可读存储介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时实现权利要求1-20任一项所述的在视频图像中展示业务对象的方法中各步骤的操作。
PCT/CN2017/098027 2016-08-19 2017-08-18 在视频图像中展示业务对象的方法、装置和电子设备 WO2018033137A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/847,172 US11037348B2 (en) 2016-08-19 2017-12-19 Method and apparatus for displaying business object in video image and electronic device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201610694812.XA CN107343225B (zh) 2016-08-19 2016-08-19 在视频图像中展示业务对象的方法、装置和终端设备
CN201610694625.1A CN107343211B (zh) 2016-08-19 2016-08-19 视频图像处理方法、装置和终端设备
CN201610694625.1 2016-08-19
CN201610694812.X 2016-08-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/847,172 Continuation US11037348B2 (en) 2016-08-19 2017-12-19 Method and apparatus for displaying business object in video image and electronic device

Publications (1)

Publication Number Publication Date
WO2018033137A1 true WO2018033137A1 (zh) 2018-02-22

Family

ID=61196412

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/098027 WO2018033137A1 (zh) 2016-08-19 2017-08-18 在视频图像中展示业务对象的方法、装置和电子设备

Country Status (2)

Country Link
US (1) US11037348B2 (zh)
WO (1) WO2018033137A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11308655B2 (en) 2018-08-24 2022-04-19 Beijing Microlive Vision Technology Co., Ltd Image synthesis method and apparatus

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8416247B2 (en) * 2007-10-09 2013-04-09 Sony Computer Entertaiment America Inc. Increasing the number of advertising impressions in an interactive environment
CN108280883B (zh) 2018-02-07 2021-05-04 北京市商汤科技开发有限公司 变形特效程序文件包的生成及变形特效生成方法与装置
US11379716B2 (en) * 2018-02-09 2022-07-05 Htc Corporation Method and electronic apparatus for adjusting a neural network
JP7075012B2 (ja) * 2018-09-05 2022-05-25 日本電信電話株式会社 画像処理装置、画像処理方法及び画像処理プログラム
EP3836021A4 (en) * 2018-09-19 2021-08-25 Huawei Technologies Co., Ltd. AI MODEL DEVELOPMENT PROCESS AND DEVICE
US20200346114A1 (en) * 2019-04-30 2020-11-05 Microsoft Technology Licensing, Llc Contextual in-game element recognition and dynamic advertisement overlay
CN110428390B (zh) * 2019-07-18 2022-08-26 北京达佳互联信息技术有限公司 一种素材展示方法、装置、电子设备和存储介质
GB202017464D0 (en) * 2020-10-30 2020-12-16 Tractable Ltd Remote vehicle damage assessment
US11687778B2 (en) 2020-01-06 2023-06-27 The Research Foundation For The State University Of New York Fakecatcher: detection of synthetic portrait videos using biological signals
CN111339298B (zh) * 2020-02-25 2024-04-09 北京小米松果电子有限公司 一种分类预测方法、装置及存储介质
CN111640199B (zh) * 2020-06-10 2024-01-09 浙江商汤科技开发有限公司 一种ar特效数据生成的方法及装置
US11295347B1 (en) * 2021-01-30 2022-04-05 Walmart Apollo, Llc Systems and methods for forecasting campaign parameters using machine learning architectures and techniques
US11398089B1 (en) * 2021-02-17 2022-07-26 Adobe Inc. Image processing techniques to quickly find a desired object among other objects from a captured video scene
CN115720279B (zh) * 2022-11-18 2023-09-15 杭州面朝信息科技有限公司 一种在直播场景中展现任意特效的方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339549A (zh) * 2007-07-03 2009-01-07 周磊 一种广告方法和系统
CN101364368A (zh) * 2008-09-18 2009-02-11 北京聚万传媒科技有限公司 在视频广告系统中嵌入和播放电子地图的方法及实现装置
US20130136416A1 (en) * 2011-11-30 2013-05-30 Nokia Corporation Method and apparatus for enriching media with meta-information
CN103702211A (zh) * 2013-12-09 2014-04-02 Tcl集团股份有限公司 一种基于电视播放内容的广告推送方法和系统
US8904033B2 (en) * 2010-06-07 2014-12-02 Adobe Systems Incorporated Buffering media content

Family Cites Families (394)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2751145B2 (ja) * 1993-12-15 1998-05-18 株式会社三城 眼鏡形状デザイン設計システム
US8574074B2 (en) * 2005-09-30 2013-11-05 Sony Computer Entertainment America Llc Advertising impression determination
JP2001283079A (ja) * 2000-03-28 2001-10-12 Sony Corp 通信サービス方法とその装置、通信端末装置、通信システム、広告宣伝方法
US20080040227A1 (en) * 2000-11-03 2008-02-14 At&T Corp. System and method of marketing using a multi-media communication system
US8224078B2 (en) * 2000-11-06 2012-07-17 Nant Holdings Ip, Llc Image capture and identification system and process
US8751310B2 (en) * 2005-09-30 2014-06-10 Sony Computer Entertainment America Llc Monitoring advertisement impressions
EP1423825B1 (en) * 2001-08-02 2011-01-26 Intellocity USA, Inc. Post production visual alterations
US6879709B2 (en) * 2002-01-17 2005-04-12 International Business Machines Corporation System and method for automatically detecting neutral expressionless faces in digital images
US8220018B2 (en) * 2002-09-19 2012-07-10 Tvworks, Llc System and method for preferred placement programming of iTV content
US7362919B2 (en) * 2002-12-12 2008-04-22 Eastman Kodak Company Method for generating customized photo album pages and prints based on people and gender profiles
WO2004068414A1 (ja) * 2003-01-27 2004-08-12 Fujitsu Limited 注目物体の出現位置表示装置
US20040194128A1 (en) * 2003-03-28 2004-09-30 Eastman Kodak Company Method for providing digital cinema content based upon audience metrics
US7391888B2 (en) * 2003-05-30 2008-06-24 Microsoft Corporation Head pose assessment methods and systems
US20050015370A1 (en) * 2003-07-14 2005-01-20 Stavely Donald J. Information management system and method
US7979877B2 (en) * 2003-12-23 2011-07-12 Intellocity Usa Inc. Advertising methods for advertising time slots and embedded objects
US9865017B2 (en) * 2003-12-23 2018-01-09 Opentv, Inc. System and method for providing interactive advertisement
US10032192B2 (en) * 2003-12-23 2018-07-24 Roku, Inc. Automatic localization of advertisements
US10387920B2 (en) * 2003-12-23 2019-08-20 Roku, Inc. System and method for offering and billing advertisement opportunities
JP2005266984A (ja) * 2004-03-17 2005-09-29 Konica Minolta Holdings Inc 画像処理システム
JP5025893B2 (ja) * 2004-03-29 2012-09-12 ソニー株式会社 情報処理装置および方法、記録媒体、並びにプログラム
US7239277B2 (en) * 2004-04-12 2007-07-03 Time Domain Corporation Method and system for extensible position location
JP4756876B2 (ja) * 2004-06-09 2011-08-24 キヤノン株式会社 画像表示制御装置、画像表示制御方法、プログラム、及び記憶媒体
EP1792314A1 (en) * 2004-08-23 2007-06-06 Sherpa Technologies, LLC Selective displaying of item information in videos
US20120306907A1 (en) * 2011-06-03 2012-12-06 Huston Charles D System and Method for Inserting and Enhancing Messages Displayed to a User When Viewing a Venue
JP3930898B2 (ja) * 2005-08-08 2007-06-13 松下電器産業株式会社 画像合成装置および画像合成方法
US8542928B2 (en) * 2005-09-26 2013-09-24 Canon Kabushiki Kaisha Information processing apparatus and control method therefor
US8626584B2 (en) * 2005-09-30 2014-01-07 Sony Computer Entertainment America Llc Population of an advertisement reference list
US20070130004A1 (en) * 2005-12-01 2007-06-07 Microsoft Corporation AD campaign optimization
JP4991317B2 (ja) * 2006-02-06 2012-08-01 株式会社東芝 顔特徴点検出装置及びその方法
US20070183665A1 (en) * 2006-02-06 2007-08-09 Mayumi Yuasa Face feature point detecting device and method
US8566865B2 (en) * 2006-03-07 2013-10-22 Sony Computer Entertainment America Llc Dynamic insertion of cinematic stage props in program content
US8549554B2 (en) * 2006-03-07 2013-10-01 Sony Computer Entertainment America Llc Dynamic replacement of cinematic stage props in program content
JP2007300185A (ja) * 2006-04-27 2007-11-15 Toshiba Corp 画像監視装置
US7742425B2 (en) * 2006-06-26 2010-06-22 The Boeing Company Neural network-based mobility management for mobile ad hoc radio networks
US7555468B2 (en) * 2006-06-26 2009-06-30 The Boeing Company Neural network-based node mobility and network connectivty predictions for mobile ad hoc radio networks
JP4757116B2 (ja) * 2006-06-30 2011-08-24 キヤノン株式会社 パラメータ学習方法及びその装置、パターン識別方法及びその装置、プログラム
US20110044501A1 (en) * 2006-07-14 2011-02-24 Ailive, Inc. Systems and methods for personalized motion control
US9050528B2 (en) * 2006-07-14 2015-06-09 Ailive Inc. Systems and methods for utilizing personalized motion control in virtual environment
US20080033801A1 (en) * 2006-07-14 2008-02-07 Vulano Group, Inc. System for dynamic personalized object placement in a multi-media program
US8413182B2 (en) * 2006-08-04 2013-04-02 Aol Inc. Mechanism for rendering advertising objects into featured content
JP4840066B2 (ja) * 2006-10-11 2011-12-21 セイコーエプソン株式会社 回転角度検出装置、および回転角度検出装置の制御方法
US20080109305A1 (en) * 2006-11-08 2008-05-08 Ma Capital Lllp Using internet advertising as a test bed for radio advertisements
US20080109845A1 (en) * 2006-11-08 2008-05-08 Ma Capital Lllp System and method for generating advertisements for use in broadcast media
JP2008146243A (ja) * 2006-12-07 2008-06-26 Toshiba Corp 情報処理装置、情報処理方法、及びプログラム
US8572642B2 (en) * 2007-01-10 2013-10-29 Steven Schraga Customized program insertion system
US9363576B2 (en) * 2007-01-10 2016-06-07 Steven Schraga Advertisement insertion systems, methods, and media
US7796787B2 (en) * 2007-02-03 2010-09-14 Arcsoft, Inc. Face component replacement
JP4829141B2 (ja) * 2007-02-09 2011-12-07 株式会社東芝 視線検出装置及びその方法
US8965762B2 (en) * 2007-02-16 2015-02-24 Industrial Technology Research Institute Bimodal emotion recognition method and system utilizing a support vector machine
TWI365416B (en) * 2007-02-16 2012-06-01 Ind Tech Res Inst Method of emotion recognition and learning new identification information
JP4309926B2 (ja) * 2007-03-13 2009-08-05 アイシン精機株式会社 顔特徴点検出装置、顔特徴点検出方法及びプログラム
US8988609B2 (en) * 2007-03-22 2015-03-24 Sony Computer Entertainment America Llc Scheme for determining the locations and timing of advertisements and other insertions in media
JP4289414B2 (ja) * 2007-03-27 2009-07-01 セイコーエプソン株式会社 画像変形のための画像処理
JP4289420B2 (ja) * 2007-05-10 2009-07-01 セイコーエプソン株式会社 画像処理装置および画像処理方法
US20090013347A1 (en) * 2007-06-11 2009-01-08 Gulrukh Ahanger Systems and methods for reporting usage of dynamically inserted and delivered ads
US20090006208A1 (en) * 2007-06-26 2009-01-01 Ranjit Singh Grewal Display of Video with Tagged Advertising
US8726194B2 (en) * 2007-07-27 2014-05-13 Qualcomm Incorporated Item selection using enhanced control
US20090083147A1 (en) * 2007-09-21 2009-03-26 Toni Paila Separation of advertising content and control
EP2597868B1 (en) * 2007-09-24 2017-09-13 Qualcomm Incorporated Enhanced interface for voice and video communications
US20090094638A1 (en) * 2007-10-03 2009-04-09 Tinbu, Llc Presentation and Distribution of Web Content Having Intelligent Advertisement Selection System
US8416247B2 (en) * 2007-10-09 2013-04-09 Sony Computer Entertaiment America Inc. Increasing the number of advertising impressions in an interactive environment
US20090119172A1 (en) * 2007-11-02 2009-05-07 Soloff David L Advertising Futures Marketplace Methods and Systems
US20090135177A1 (en) * 2007-11-20 2009-05-28 Big Stage Entertainment, Inc. Systems and methods for voice personalization of video content
US20100272365A1 (en) * 2007-11-29 2010-10-28 Koji Yamamoto Picture processing method and picture processing apparatus
US20090157472A1 (en) * 2007-12-14 2009-06-18 Kimberly-Clark Worldwide, Inc. Personalized Retail Information Delivery Systems and Methods
US9098766B2 (en) * 2007-12-21 2015-08-04 Honda Motor Co., Ltd. Controlled human pose estimation from depth image streams
WO2009094661A1 (en) * 2008-01-24 2009-07-30 The Trustees Of Columbia University In The City Of New York Methods, systems, and media for swapping faces in images
JP2009265709A (ja) * 2008-04-22 2009-11-12 Hitachi Ltd 入力装置
US8904430B2 (en) * 2008-04-24 2014-12-02 Sony Computer Entertainment America, LLC Method and apparatus for real-time viewer interaction with a media presentation
WO2009137368A2 (en) * 2008-05-03 2009-11-12 Mobile Media Now, Inc. Method and system for generation and playback of supplemented videos
KR100986101B1 (ko) * 2008-05-30 2010-10-08 이승철 얼굴 분석 서비스 제공 방법 및 장치
US8219438B1 (en) * 2008-06-30 2012-07-10 Videomining Corporation Method and system for measuring shopper response to products based on behavior and facial expression
US20100142448A1 (en) * 2008-09-04 2010-06-10 Ludger Schlicht Devices for a mobile, broadband, routable internet
US8752087B2 (en) * 2008-11-07 2014-06-10 At&T Intellectual Property I, L.P. System and method for dynamically constructing personalized contextual video programs
US20100154007A1 (en) * 2008-12-17 2010-06-17 Jean Touboul Embedded video advertising method and system
US20170337579A1 (en) * 2009-01-23 2017-11-23 Ronald Charles Krosky Media communication
JP5106459B2 (ja) * 2009-03-26 2012-12-26 株式会社東芝 立体物判定装置、立体物判定方法及び立体物判定プログラム
US20130024211A1 (en) * 2009-04-09 2013-01-24 Access Mobility, Inc. Active learning and advanced relationship marketing and health interventions
US8379940B2 (en) * 2009-06-02 2013-02-19 George Mason Intellectual Properties, Inc. Robust human authentication using holistic anthropometric and appearance-based features and boosting
US20100312608A1 (en) * 2009-06-05 2010-12-09 Microsoft Corporation Content advertisements for video
EP2462494A4 (en) * 2009-06-05 2014-08-13 Mozaik Multimedia Inc ECOSYSTEM FOR SMART CONTENT MARKING AND INTERACTION
JP5709410B2 (ja) * 2009-06-16 2015-04-30 キヤノン株式会社 パターン処理装置及びその方法、プログラム
US10872535B2 (en) * 2009-07-24 2020-12-22 Tutor Group Limited Facilitating facial recognition, augmented reality, and virtual reality in online teaching groups
US8763090B2 (en) * 2009-08-11 2014-06-24 Sony Computer Entertainment America Llc Management of ancillary content delivery and presentation
US20140046777A1 (en) * 2009-08-14 2014-02-13 Dataxu, Inc. Methods and systems for using consumer aliases and identifiers
US9111287B2 (en) * 2009-09-30 2015-08-18 Microsoft Technology Licensing, Llc Video content-aware advertisement placement
JP2011090466A (ja) * 2009-10-21 2011-05-06 Sony Corp 情報処理装置及び方法、並びにプログラム
WO2011079458A1 (en) * 2009-12-31 2011-07-07 Nokia Corporation Method and apparatus for local binary pattern based facial feature localization
US8818175B2 (en) * 2010-03-08 2014-08-26 Vumanity Media, Inc. Generation of composited video programming
US9646340B2 (en) * 2010-04-01 2017-05-09 Microsoft Technology Licensing, Llc Avatar-based virtual dressing room
US20110263946A1 (en) * 2010-04-22 2011-10-27 Mit Media Lab Method and system for real-time and offline analysis, inference, tagging of and responding to person(s) experiences
JP5240795B2 (ja) * 2010-04-30 2013-07-17 オムロン株式会社 画像変形装置、電子機器、画像変形方法、および画像変形プログラム
TW201142465A (en) * 2010-05-17 2011-12-01 Hon Hai Prec Ind Co Ltd Front projection device and front projection controlling method
JP5772821B2 (ja) * 2010-05-26 2015-09-02 日本電気株式会社 顔特徴点位置補正装置、顔特徴点位置補正方法および顔特徴点位置補正プログラム
US10614289B2 (en) * 2010-06-07 2020-04-07 Affectiva, Inc. Facial tracking with classifiers
JP5465620B2 (ja) * 2010-06-25 2014-04-09 Kddi株式会社 映像コンテンツに重畳する付加情報の領域を決定する映像出力装置、プログラム及び方法
US8645359B2 (en) * 2010-09-30 2014-02-04 Microsoft Corporation Providing associations between objects and individuals associated with relevant media items
US20120095825A1 (en) * 2010-10-18 2012-04-19 Microsoft Corporation Incentive Selection of Region-of-Interest and Advertisements for Image Advertising
US20120113223A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation User Interaction in Augmented Reality
US9360943B2 (en) * 2010-12-27 2016-06-07 Lg Electronics Inc. Display device and method of providing feedback for gestures thereof
JP5653206B2 (ja) * 2010-12-27 2015-01-14 日立マクセル株式会社 映像処理装置
JP2014509426A (ja) * 2011-02-23 2014-04-17 アユダ メディア システムズ インコーポレイティッド ペイパールック(PayPerLook)請求方法および屋外で出している広告のためのシステム
TW201237773A (en) * 2011-03-15 2012-09-16 Wistron Corp An electronic system, image adjusting method and computer program product thereof
US9047632B2 (en) * 2011-04-21 2015-06-02 Echostar Technologies L.L.C. Apparatus, systems and methods for facilitating shopping for items shown in media content events
JP2012252447A (ja) * 2011-06-01 2012-12-20 Sony Corp 情報処理装置および方法、記録媒体、並びにプログラム
DE112011105439B4 (de) * 2011-07-11 2021-06-17 Toyota Jidosha Kabushiki Kaisha Rote-Augen-Erfassungsvorrichtung
DE112011105435B4 (de) * 2011-07-11 2022-07-07 Toyota Jidosha Kabushiki Kaisha Augenliderfassungsvorrichtung
CN102985302A (zh) * 2011-07-11 2013-03-20 丰田自动车株式会社 车辆的紧急避险装置
KR101381439B1 (ko) * 2011-09-15 2014-04-04 가부시끼가이샤 도시바 얼굴 인식 장치 및 얼굴 인식 방법
US9626798B2 (en) * 2011-12-05 2017-04-18 At&T Intellectual Property I, L.P. System and method to digitally replace objects in images or video
US20130163854A1 (en) * 2011-12-23 2013-06-27 Chia-Ming Cheng Image processing method and associated apparatus
US8908904B2 (en) * 2011-12-28 2014-12-09 Samsung Electrônica da Amazônia Ltda. Method and system for make-up simulation on portable devices having digital cameras
US8989455B2 (en) * 2012-02-05 2015-03-24 Apple Inc. Enhanced face detection using depth information
US8990849B2 (en) * 2012-02-14 2015-03-24 Verizon Patent And Licensing Inc. Advertisement insertion into media content for streaming
JP6026119B2 (ja) * 2012-03-19 2016-11-16 株式会社東芝 生体情報処理装置
CN104246682B (zh) * 2012-03-26 2017-08-25 苹果公司 增强的虚拟触摸板和触摸屏
US9380282B2 (en) * 2012-03-26 2016-06-28 Max Abecassis Providing item information during video playing
JP6287827B2 (ja) * 2012-03-27 2018-03-07 日本電気株式会社 情報処理装置、情報処理方法、及びプログラム
TWI454966B (zh) * 2012-04-24 2014-10-01 Wistron Corp 手勢控制方法及手勢控制裝置
TWI591557B (zh) * 2012-05-07 2017-07-11 財團法人工業技術研究院 配置廣告的系統與方法
US20130339857A1 (en) * 2012-06-15 2013-12-19 The Mad Video, Inc. Modular and Scalable Interactive Video Player
TWI470999B (zh) * 2012-06-19 2015-01-21 Wistron Corp 編輯與儲存串流的方法、裝置、系統
US10139985B2 (en) * 2012-06-22 2018-11-27 Matterport, Inc. Defining, displaying and interacting with tags in a three-dimensional model
CN103514432B (zh) * 2012-06-25 2017-09-01 诺基亚技术有限公司 人脸特征提取方法、设备和计算机程序产品
US8578407B1 (en) * 2012-07-10 2013-11-05 Joao Redol Real time automated unobtrusive ancilliary information insertion into a video
US8955021B1 (en) * 2012-08-31 2015-02-10 Amazon Technologies, Inc. Providing extrinsic data for video content
US20140068664A1 (en) * 2012-09-05 2014-03-06 Keith Edward Bourne Method for adding an object map to a video sequence
US20140074586A1 (en) * 2012-09-12 2014-03-13 Microsoft Corporation Equilibrium allocation for budget smoothing in search advertising
KR102035134B1 (ko) * 2012-09-24 2019-10-22 엘지전자 주식회사 영상표시장치, 및 그 동작방법
KR101984683B1 (ko) * 2012-10-10 2019-05-31 삼성전자주식회사 멀티 디스플레이 장치 및 그 제어 방법
US11539525B2 (en) * 2018-07-24 2022-12-27 Royal Bank Of Canada Systems and methods for secure tokenized credentials
US8826325B2 (en) * 2012-12-08 2014-09-02 Joao Redol Automated unobtrusive ancilliary information insertion into a video
US9794642B2 (en) * 2013-01-07 2017-10-17 Gracenote, Inc. Inserting advertisements into video content
JP6387831B2 (ja) * 2013-01-15 2018-09-12 日本電気株式会社 特徴点位置検出装置、特徴点位置検出方法および特徴点位置検出プログラム
CN103093490B (zh) * 2013-02-02 2015-08-26 浙江大学 基于单个视频摄像机的实时人脸动画方法
GB201302194D0 (en) * 2013-02-07 2013-03-27 Crisalix Sa 3D platform for aesthetic simulation
US9384242B1 (en) * 2013-03-14 2016-07-05 Google Inc. Discovery of news-related content
US9277251B2 (en) * 2013-03-15 2016-03-01 Echostar Technologies L.L.C. Geographically independent determination of segment boundaries within a video stream
CN105229673B (zh) * 2013-04-03 2021-12-03 诺基亚技术有限公司 一种装置和相关联的方法
US9913002B2 (en) * 2013-06-12 2018-03-06 Lg Electronics Inc. Image display device and method for operating same
US10546318B2 (en) * 2013-06-27 2020-01-28 Intel Corporation Adaptively embedding visual advertising content into media content
JP6375480B2 (ja) * 2013-08-30 2018-08-22 パナソニックIpマネジメント株式会社 メイクアップ支援装置、メイクアップ支援システム、メイクアップ支援方法、およびメイクアップ支援プログラム
US20160212455A1 (en) * 2013-09-25 2016-07-21 Intel Corporation Dynamic product placement in media content
US10045091B1 (en) * 2013-09-30 2018-08-07 Cox Communications, Inc. Selectable content within video stream
US9135646B2 (en) * 2013-10-09 2015-09-15 Ricoh Company, Ltd. Associating advertising content with a channel
US9635398B2 (en) * 2013-11-01 2017-04-25 Adobe Systems Incorporated Real-time tracking collection for video experiences
US10095917B2 (en) * 2013-11-04 2018-10-09 Facebook, Inc. Systems and methods for facial representation
US9489760B2 (en) * 2013-11-14 2016-11-08 Intel Corporation Mechanism for facilitating dynamic simulation of avatars corresponding to changing user performances as detected at computing devices
CN105981050B (zh) * 2013-11-30 2019-05-07 北京市商汤科技开发有限公司 用于从人脸图像的数据提取人脸特征的方法和系统
US9798959B2 (en) * 2013-11-30 2017-10-24 Beijing Sensetime Technology Development Co., Ltd Method and system for recognizing faces
WO2015078018A1 (en) * 2013-11-30 2015-06-04 Xiaoou Tang Method and system for face image recognition
US20170132659A1 (en) * 2014-01-13 2017-05-11 Google Inc. Potential Revenue of Video Views
CN105874528B (zh) * 2014-01-15 2018-07-20 麦克赛尔株式会社 信息显示终端、信息显示系统以及信息显示方法
KR20150087544A (ko) * 2014-01-22 2015-07-30 엘지이노텍 주식회사 제스처 장치, 그 동작 방법 및 이를 구비한 차량
CN105284122B (zh) * 2014-01-24 2018-12-04 Sk 普兰尼特有限公司 用于通过使用帧聚类来插入广告的装置和方法
EP3107070B1 (en) * 2014-02-14 2020-04-29 Sony Interactive Entertainment Inc. Information processing device and information processing method
US20150235277A1 (en) * 2014-02-19 2015-08-20 Kyle W. Bagley Provision of Advertising in Social Media Content In a User Compensation Based Model
RU2014109439A (ru) * 2014-03-12 2015-09-20 ЭлЭсАй Корпорейшн Процессор изображений, содержащий систему распознавания жестов с сопоставлением положения руки, основываясь на признаках контура
JP6331515B2 (ja) * 2014-03-13 2018-05-30 パナソニックIpマネジメント株式会社 メイクアップ支援装置およびメイクアップ支援方法
CN106358444B (zh) * 2014-04-11 2019-07-30 北京市商汤科技开发有限公司 用于面部验证的方法和系统
JP6244059B2 (ja) * 2014-04-11 2017-12-06 ペキン センスタイム テクノロジー ディベロップメント カンパニー リミテッド 基準画像に基づく顔画像検証方法、及び顔画像検証システム
US20150304698A1 (en) * 2014-04-21 2015-10-22 Eyesee, Lda Dynamic Interactive Advertisement Insertion
CN106415594B (zh) * 2014-06-16 2020-01-10 北京市商汤科技开发有限公司 用于面部验证的方法和系统
US20150363698A1 (en) * 2014-06-16 2015-12-17 International Business Machines Corporation Dynamic content delivery based on high-affinity viewer points
US9847012B2 (en) * 2014-07-07 2017-12-19 Google Llc Meal-based medication reminder system
US9508151B2 (en) * 2014-07-10 2016-11-29 Ditto Labs, Inc. Systems, methods, and devices for image matching and object recognition in images using image regions
US20160012594A1 (en) * 2014-07-10 2016-01-14 Ditto Labs, Inc. Systems, Methods, And Devices For Image Matching And Object Recognition In Images Using Textures
KR101573312B1 (ko) * 2014-07-24 2015-12-03 주식회사 시어스랩 클라우드 앨범을 이용하는 광고 서비스 제공 방법
US10528982B2 (en) * 2014-09-12 2020-01-07 Facebook, Inc. Determining a prompt for performing an action presented to a user in association with video data
US9872081B2 (en) * 2014-10-20 2018-01-16 Nbcuniversal Media, Llc Digital content spatial replacement system and method
US20160112761A1 (en) * 2014-10-20 2016-04-21 United Video Properties, Inc. Systems and methods for generating media asset recommendations using a neural network generated based on consumption information
US10692531B2 (en) * 2014-10-25 2020-06-23 Yieldmo, Inc. Methods for serving interactive content to a user
CN110826530B (zh) * 2014-11-15 2023-06-30 北京旷视科技有限公司 使用机器学习进行面部检测
US9501716B2 (en) * 2014-12-11 2016-11-22 Intel Corporation Labeling component parts of objects and detecting component properties in imaging data
WO2016101131A1 (en) * 2014-12-23 2016-06-30 Intel Corporation Augmented facial animation
EP3238144B1 (en) * 2014-12-24 2021-04-14 DeepMind Technologies Limited Augmenting neural networks to generate additional outputs
JP2016126510A (ja) * 2014-12-26 2016-07-11 カシオ計算機株式会社 画像生成装置、画像生成方法及びプログラム
CN107004290B (zh) * 2015-01-06 2020-12-15 索尼公司 效果生成装置、效果生成方法以及程序
US20160196584A1 (en) * 2015-01-06 2016-07-07 Facebook, Inc. Techniques for context sensitive overlays
US10839416B1 (en) * 2015-01-08 2020-11-17 The Directv Group, Inc. Systems and methods for controlling advertising, upselling, cross-selling, and purchasing of products and services via user receiving devices and mobile devices
US10356478B2 (en) * 2015-01-08 2019-07-16 The Directv Group, Inc. Systems and methods for spotted advertising and control of corresponding user interfaces and transactions via user receiving devices and mobile devices
US20180053228A1 (en) * 2015-01-23 2018-02-22 Pcms Holdings, Inc. Systems and methods for allocating mobile advertisement inventory
US20160225053A1 (en) * 2015-01-29 2016-08-04 Clear Research Corporation Mobile visual commerce system
US11275747B2 (en) * 2015-03-12 2022-03-15 Yahoo Assets Llc System and method for improved server performance for a deep feature based coarse-to-fine fast search
CN105518709B (zh) * 2015-03-26 2019-08-09 北京旷视科技有限公司 用于识别人脸的方法、系统和计算机程序产品
CN104967885B (zh) * 2015-03-27 2019-01-11 哈尔滨工业大学深圳研究生院 一种基于视频内容感知的广告推荐方法及系统
US20160294891A1 (en) * 2015-03-31 2016-10-06 Facebook, Inc. Multi-user media presentation system
US10074041B2 (en) * 2015-04-17 2018-09-11 Nec Corporation Fine-grained image classification by exploring bipartite-graph labels
CN105517680B (zh) * 2015-04-28 2020-03-10 北京旷视科技有限公司 用于识别人脸的装置、系统和方法
US10303768B2 (en) * 2015-05-04 2019-05-28 Sri International Exploiting multi-modal affect and semantics to assess the persuasiveness of a video
US20160328868A1 (en) * 2015-05-07 2016-11-10 Facebook, Inc. Systems and methods for generating and presenting publishable collections of related media content items
US10417799B2 (en) * 2015-05-07 2019-09-17 Facebook, Inc. Systems and methods for generating and presenting publishable collections of related media content items
US9697437B2 (en) * 2015-05-18 2017-07-04 Facebook, Inc. Logo detection
US9741107B2 (en) * 2015-06-05 2017-08-22 Sony Corporation Full reference image quality assessment based on convolutional neural network
US20160364419A1 (en) * 2015-06-10 2016-12-15 Blackbird Technologies, Inc. Image and text data hierarchical classifiers
US9704020B2 (en) * 2015-06-16 2017-07-11 Microsoft Technology Licensing, Llc Automatic recognition of entities in media-captured events
US9390315B1 (en) * 2015-06-25 2016-07-12 A9.Com, Inc. Image match for featureless objects
US9952676B2 (en) * 2015-06-25 2018-04-24 Intel Corporation Wearable device with gesture recognition mechanism
US9883249B2 (en) * 2015-06-26 2018-01-30 Amazon Technologies, Inc. Broadcaster tools for interactive shopping interfaces
US10129582B2 (en) * 2015-06-30 2018-11-13 Kempt, LLC Systems, methods, and computer program products for capturing spectator content displayed at live events
JP6620439B2 (ja) * 2015-07-01 2019-12-18 株式会社リコー 学習方法、プログラム及び学習装置
CN107735795B (zh) * 2015-07-02 2021-11-26 北京市商汤科技开发有限公司 用于社会关系识别的方法和系统
US9792492B2 (en) * 2015-07-07 2017-10-17 Xerox Corporation Extracting gradient features from neural networks
US10643245B2 (en) * 2016-07-15 2020-05-05 NXT-ID, Inc. Preference-driven advertising systems and methods
US10311366B2 (en) * 2015-07-29 2019-06-04 Adobe Inc. Procedurally generating sets of probabilistically distributed styling attributes for a digital design
CN108027972B (zh) * 2015-07-30 2022-03-15 北京市商汤科技开发有限公司 用于对象跟踪的系统和方法
US10922722B2 (en) * 2015-07-31 2021-02-16 Verizon Media Inc. System and method for contextual video advertisement serving in guaranteed display advertising
US11071501B2 (en) * 2015-08-14 2021-07-27 Elucid Bioiwaging Inc. Quantitative imaging for determining time to adverse event (TTE)
US10769533B2 (en) * 2015-09-04 2020-09-08 Baidu Usa Llc Systems and methods for efficient neural network deployments
CN106530194B (zh) * 2015-09-09 2020-02-07 阿里巴巴集团控股有限公司 一种疑似侵权产品图片的检测方法及装置
US20170083086A1 (en) * 2015-09-18 2017-03-23 Kai Mazur Human-Computer Interface
US20170083524A1 (en) * 2015-09-22 2017-03-23 Riffsy, Inc. Platform and dynamic interface for expression-based retrieval of expressive media content
US9788022B2 (en) * 2015-09-29 2017-10-10 Verizon Patent And Licensing Inc. Systems and methods for optimizing digital advertisement insertion
US10169684B1 (en) * 2015-10-01 2019-01-01 Intellivision Technologies Corp. Methods and systems for recognizing objects based on one or more stored training images
CN106709404B (zh) * 2015-11-16 2022-01-04 佳能株式会社 图像处理装置及图像处理方法
US9877058B2 (en) * 2015-12-02 2018-01-23 International Business Machines Corporation Presenting personalized advertisements on smart glasses in a movie theater based on emotion of a viewer
US20170161772A1 (en) * 2015-12-03 2017-06-08 Rovi Guides, Inc. Methods and Systems for Targeted Advertising Using Machine Learning Techniques
CN105512273A (zh) * 2015-12-03 2016-04-20 中山大学 一种基于可变长深度哈希学习的图像检索方法
US10970863B2 (en) * 2015-12-28 2021-04-06 Andrew John-Haidukewych Hayduke System and method of analyzing features of the human face and breasts using one or more overlay grids
US10699296B2 (en) * 2015-12-30 2020-06-30 Verizon Patent And Licensing, Inc. Native video advertising with voice-based ad management and machine-to-machine ad bidding
JP6845982B2 (ja) * 2016-01-13 2021-03-24 フォーブ インコーポレーテッド 表情認識システム、表情認識方法及び表情認識プログラム
CN106991367B (zh) * 2016-01-21 2019-03-19 腾讯科技(深圳)有限公司 确定人脸转动角度的方法和装置
CN108701210B (zh) * 2016-02-02 2021-08-17 北京市商汤科技开发有限公司 用于cnn网络适配和对象在线追踪的方法和系统
EP3203412A1 (en) * 2016-02-05 2017-08-09 Delphi Technologies, Inc. System and method for detecting hand gestures in a 3d space
CA3013948A1 (en) * 2016-02-08 2017-08-17 Nuralogix Corporation System and method for detecting invisible human emotion in a retail environment
US9870638B2 (en) * 2016-02-24 2018-01-16 Ondrej Jamri{hacek over (s)}ka Appearance transfer techniques
US9852523B2 (en) * 2016-02-24 2017-12-26 Ondrej Jamri{hacek over (s)}ka Appearance transfer techniques maintaining temporal coherence
CN107133622B (zh) * 2016-02-29 2022-08-26 阿里巴巴集团控股有限公司 一种单词的分割方法和装置
US11030604B2 (en) * 2016-02-29 2021-06-08 Signpost Corporation Information processing system
US11741639B2 (en) * 2016-03-02 2023-08-29 Holition Limited Locating and augmenting object features in images
JP2017163180A (ja) * 2016-03-07 2017-09-14 富士通株式会社 ずれ判定プログラム、ずれ判定方法、及び、情報処理装置
WO2017156084A2 (en) * 2016-03-11 2017-09-14 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for non-contact monitoring of ballistocardiogram, photoplethysmogram, blood pressure and abnormal heart rhythm
WO2017156628A1 (en) * 2016-03-17 2017-09-21 Avigilon Corporation System and method for training object classifier by machine learning
JP6122987B1 (ja) * 2016-03-18 2017-04-26 ヤフー株式会社 決定装置、決定方法、決定プログラム
US10839573B2 (en) * 2016-03-22 2020-11-17 Adobe Inc. Apparatus, systems, and methods for integrating digital media content into other digital media content
US10074161B2 (en) * 2016-04-08 2018-09-11 Adobe Systems Incorporated Sky editing based on image composition
EP3232368A1 (en) * 2016-04-14 2017-10-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Determining facial parameters
US11055537B2 (en) * 2016-04-26 2021-07-06 Disney Enterprises, Inc. Systems and methods for determining actions depicted in media contents based on attention weights of media content frames
US10157477B2 (en) * 2016-04-27 2018-12-18 Bellus 3D, Inc. Robust head pose estimation with a depth camera
US10755438B2 (en) * 2016-04-27 2020-08-25 Bellus 3D, Inc. Robust head pose estimation with a depth camera
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
WO2017191525A2 (en) * 2016-05-03 2017-11-09 Yembo, Inc. Systems and methods for providing ai-based cost estimates for services
WO2017192183A1 (en) * 2016-05-04 2017-11-09 Google Llc Augmenting neural networks with external memory using reinforcement learning
US20170323374A1 (en) * 2016-05-06 2017-11-09 Seok Hyun Park Augmented reality image analysis methods for the virtual fashion items worn
KR102465227B1 (ko) * 2016-05-30 2022-11-10 소니그룹주식회사 영상 음향 처리 장치 및 방법, 및 프로그램이 저장된 컴퓨터 판독 가능한 기록 매체
US11314967B2 (en) * 2016-06-01 2022-04-26 Ohio State Innovation Foundation System and method for recognition and annotation of facial expressions
US11409791B2 (en) * 2016-06-10 2022-08-09 Disney Enterprises, Inc. Joint heterogeneous language-vision embeddings for video tagging and search
CN106127120B (zh) * 2016-06-16 2018-03-13 北京市商汤科技开发有限公司 姿势估计方法和装置、计算机系统
US10282595B2 (en) * 2016-06-24 2019-05-07 International Business Machines Corporation Facial recognition encode analysis
WO2018003421A1 (ja) * 2016-06-30 2018-01-04 パナソニックIpマネジメント株式会社 画像処理装置および画像処理方法
WO2018009666A1 (en) * 2016-07-06 2018-01-11 Facebook, Inc. Combining faces from source images with target images based on search queries
US20180012253A1 (en) * 2016-07-07 2018-01-11 Facebook, Inc. Content data model for optimized content creation
US10726443B2 (en) * 2016-07-11 2020-07-28 Samsung Electronics Co., Ltd. Deep product placement
WO2018012136A1 (ja) * 2016-07-14 2018-01-18 パナソニックIpマネジメント株式会社 メイクアップ支援装置およびメイクアップ支援方法
US10573048B2 (en) * 2016-07-25 2020-02-25 Oath Inc. Emotional reaction sharing
US10600220B2 (en) * 2016-08-01 2020-03-24 Facebook, Inc. Systems and methods for content interaction
US10972495B2 (en) * 2016-08-02 2021-04-06 Invincea, Inc. Methods and apparatus for detecting and identifying malware by mapping feature data into a semantic space
CN106295567B (zh) * 2016-08-10 2019-04-12 腾讯科技(深圳)有限公司 一种关键点的定位方法及终端
EP3501014A1 (en) * 2016-08-17 2019-06-26 VID SCALE, Inc. Secondary content insertion in 360-degree video
CN107343220B (zh) * 2016-08-19 2019-12-31 北京市商汤科技开发有限公司 数据处理方法、装置和终端设备
WO2018033156A1 (zh) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 视频图像的处理方法、装置和电子设备
US10121513B2 (en) * 2016-08-30 2018-11-06 International Business Machines Corporation Dynamic image content overlaying
US9824692B1 (en) * 2016-09-12 2017-11-21 Pindrop Security, Inc. End-to-end speaker recognition using deep neural network
EP3516583B1 (en) * 2016-09-21 2023-03-01 Gumgum, Inc. Machine learning models for identifying objects depicted in image or video data
CN107920248B (zh) * 2016-10-11 2020-10-30 京东方科技集团股份有限公司 图像编解码装置、图像处理系统、训练方法和显示装置
CN109890245A (zh) * 2016-10-24 2019-06-14 松下知识产权经营株式会社 图像处理装置、图像处理方法以及图像处理程序
KR102415506B1 (ko) * 2016-10-26 2022-07-01 삼성전자주식회사 뉴럴 네트워크 간소화 방법 및 장치
WO2018082084A1 (zh) * 2016-11-07 2018-05-11 中国科学院自动化研究所 融合全卷积神经网络和条件随机场的脑肿瘤自动分割方法
CN108074215B (zh) * 2016-11-09 2020-04-14 京东方科技集团股份有限公司 图像升频系统及其训练方法、以及图像升频方法
US10496885B2 (en) * 2016-11-11 2019-12-03 Qualcomm Incorporated Unified embedding with metric learning for zero-exemplar event detection
JP6854344B2 (ja) * 2016-11-15 2021-04-07 Magic Leap, Inc. Deep machine learning system for cuboid detection
US10303979B2 (en) * 2016-11-16 2019-05-28 Phenomic Ai Inc. System and method for classifying and segmenting microscopy images with deep multiple instance learning
JP6874772B2 (ja) * 2016-11-25 2021-05-19 NEC Corporation Image generation device, image generation method, and program
KR20180062647A (ko) * 2016-12-01 2018-06-11 Samsung Electronics Co., Ltd. Eye detection method and apparatus
US20180160158A1 (en) * 2016-12-06 2018-06-07 Bing Liu Method and system for live stream broadcast and content monetization
CN108229509B (zh) * 2016-12-16 2021-02-26 Beijing SenseTime Technology Development Co., Ltd. Method and apparatus for recognizing object categories, and electronic device
US11832969B2 (en) * 2016-12-22 2023-12-05 The Johns Hopkins University Machine learning approach to beamforming
US20180181864A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Sparsified Training of Convolutional Neural Networks
US10515108B2 (en) * 2016-12-30 2019-12-24 Facebook, Inc. Dynamically ranking media effects based on user and device characteristics
CN108229269A (zh) * 2016-12-31 2018-06-29 Shenzhen SenseTime Technology Co., Ltd. Face detection method, apparatus, and electronic device
CN108268885B (zh) * 2017-01-03 2020-06-30 BOE Technology Group Co., Ltd. Feature point detection method, device, and computer-readable storage medium
US10575067B2 (en) * 2017-01-04 2020-02-25 Samsung Electronics Co., Ltd. Context based augmented advertisement
US10530991B2 (en) * 2017-01-28 2020-01-07 Microsoft Technology Licensing, Llc Real-time semantic-aware camera exposure control
US10482639B2 (en) * 2017-02-21 2019-11-19 Adobe Inc. Deep high-resolution style synthesis
US10430978B2 (en) * 2017-03-02 2019-10-01 Adobe Inc. Editing digital images utilizing a neural network with an in-network rendering layer
US10187689B2 (en) * 2017-03-16 2019-01-22 The Directv Group, Inc. Dynamic advertisement insertion
US10872272B2 (en) * 2017-04-13 2020-12-22 L'oreal System and method using machine learning for iris tracking, measurement, and simulation
US11030732B2 (en) * 2017-04-14 2021-06-08 Sony Interactive Entertainment Inc. Information processing device, information processing system, and image processing method for generating a sum picture by adding pixel values of multiple pictures
CN108229278B (zh) * 2017-04-14 2020-11-17 Shenzhen SenseTime Technology Co., Ltd. Face image processing method, apparatus, and electronic device
CN108229279B (zh) * 2017-04-14 2020-06-02 Shenzhen SenseTime Technology Co., Ltd. Face image processing method, apparatus, and electronic device
US10740613B1 (en) * 2017-04-20 2020-08-11 Digimarc Corporation Hybrid feature point/watermark-based augmented reality
CA3002470A1 (en) * 2017-04-24 2018-10-24 Evertz Microsystems Ltd. Systems and methods for media production and editing
CN107203897A (zh) * 2017-04-24 2017-09-26 Guangdong Shuxiang Intelligent Technology Co., Ltd. Product recommendation degree evaluation method, apparatus, and system
US10078909B1 (en) * 2017-05-16 2018-09-18 Facebook, Inc. Video stream customization using graphics
US10096169B1 (en) * 2017-05-17 2018-10-09 Samuel Chenillo System for the augmented assessment of virtual insertion opportunities
US11095942B2 (en) * 2017-05-25 2021-08-17 Turner Broadcasting System, Inc. Rules-based delivery and presentation of non-programming media items at client device
JP6726641B2 (ja) * 2017-05-26 2020-07-22 Nitto Denko Corporation Image classification program, classification data creation program, and classification data creation method
US10331942B2 (en) * 2017-05-31 2019-06-25 Facebook, Inc. Face liveness detection
US20180357819A1 (en) * 2017-06-13 2018-12-13 Fotonation Limited Method for generating a set of annotated images
US20180374138A1 (en) * 2017-06-23 2018-12-27 Vufind Inc. Leveraging delayed and partial reward in deep reinforcement learning artificial intelligence systems to provide purchase recommendations
US11210498B2 (en) * 2017-06-26 2021-12-28 Nec Corporation Facial authentication device, facial authentication method, and program recording medium
US10438350B2 (en) * 2017-06-27 2019-10-08 General Electric Company Material segmentation in image volumes
CN108229468B (zh) * 2017-06-28 2020-02-21 Beijing SenseTime Technology Development Co., Ltd. Vehicle appearance feature recognition and vehicle retrieval method, apparatus, storage medium, and electronic device
US20190005149A1 (en) * 2017-07-03 2019-01-03 Nokia Solutions And Networks Oy Graph diffusion similarity measure for structured and unstructured data sets
US10474908B2 (en) * 2017-07-06 2019-11-12 GM Global Technology Operations LLC Unified deep convolutional neural net for free-space estimation, object detection and object pose estimation
US11630040B2 (en) * 2017-07-11 2023-04-18 Qatar University Real-time structural damage detection by convolutional neural networks
US10733755B2 (en) * 2017-07-18 2020-08-04 Qualcomm Incorporated Learning geometric differentials for matching 3D models to objects in a 2D image
CN109359499A (zh) * 2017-07-26 2019-02-19 ArcSoft Corporation Limited Method and apparatus for face classification
CN109299636A (zh) * 2017-07-25 2019-02-01 Cal-Comp Big Data, Inc. Body information analysis apparatus capable of indicating blush areas
CN109299639B (zh) * 2017-07-25 2021-03-16 ArcSoft Corporation Limited Method and apparatus for expression recognition
US20190035113A1 (en) * 2017-07-27 2019-01-31 Nvidia Corporation Temporally stable data reconstruction with an external recurrent neural network
US10327026B1 (en) * 2017-08-03 2019-06-18 Amazon Technologies, Inc. Presenting content-specific video advertisements upon request
CN108229293A (zh) * 2017-08-09 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Face image processing method, apparatus, and electronic device
WO2019028798A1 (zh) * 2017-08-10 2019-02-14 Beijing SenseTime Technology Development Co., Ltd. Driving state monitoring method, apparatus, and electronic device
US10929987B2 (en) * 2017-08-16 2021-02-23 Nvidia Corporation Learning rigidity of dynamic scenes for three-dimensional scene flow estimation
CN108229647A (zh) * 2017-08-18 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Neural network structure generation method and apparatus, electronic device, and storage medium
CN109426767A (zh) * 2017-08-24 2019-03-05 Cal-Comp Big Data, Inc. Eyeliner drawing guidance apparatus and method
US10839257B2 (en) * 2017-08-30 2020-11-17 Qualcomm Incorporated Prioritizing objects for object recognition
GB201714000D0 (en) * 2017-08-31 2017-10-18 Mirriad Advertising Ltd Machine learning for identification of candidate video insertion object types
US20190073589A1 (en) * 2017-09-01 2019-03-07 Pointr Data Inc. Multiplicity of intersecting neural networks overlay workloads
US10248971B2 (en) * 2017-09-07 2019-04-02 Customer Focus Software Limited Methods, systems, and devices for dynamically generating a personalized advertisement on a website for manufacturing customizable products
CN107578017B (zh) * 2017-09-08 2020-11-17 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating images
CN107516090B (zh) * 2017-09-11 2021-09-17 Beijing Baidu Netcom Science and Technology Co., Ltd. Integrated face recognition method and system
US10810657B2 (en) * 2017-09-15 2020-10-20 Waldo Photos, Inc. System and method adapted to facilitate sale of digital images while preventing theft thereof
US20190087712A1 (en) * 2017-09-18 2019-03-21 Qualcomm Incorporated Neural Network Co-Processing
CN107644209A (zh) * 2017-09-21 2018-01-30 Baidu Online Network Technology (Beijing) Co., Ltd. Face detection method and apparatus
CN107491771A (zh) * 2017-09-21 2017-12-19 Baidu Online Network Technology (Beijing) Co., Ltd. Face detection method and apparatus
EP3467707B1 (en) * 2017-10-07 2024-03-13 Tata Consultancy Services Limited System and method for deep learning based hand gesture recognition in first person view
US20190122082A1 (en) * 2017-10-23 2019-04-25 Motionloft, Inc. Intelligent content displays
US10713489B2 (en) * 2017-10-24 2020-07-14 Microsoft Technology Licensing, Llc Augmented reality for identification and grouping of entities in social networks
US11263525B2 (en) * 2017-10-26 2022-03-01 Nvidia Corporation Progressive modification of neural networks
US11004209B2 (en) * 2017-10-26 2021-05-11 Qualcomm Incorporated Methods and systems for applying complex object detection in a video analytics system
US11016729B2 (en) * 2017-11-08 2021-05-25 International Business Machines Corporation Sensor fusion service to enhance human computer interactions
US10515296B2 (en) * 2017-11-14 2019-12-24 Adobe Inc. Font recognition by dynamically weighting multiple deep learning neural networks
US10769411B2 (en) * 2017-11-15 2020-09-08 Qualcomm Technologies, Inc. Pose estimation and model retrieval for objects in images
CN108229305B (zh) * 2017-11-21 2021-06-04 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus, and electronic device for determining the bounding box of a target object
CN108229307B (zh) * 2017-11-22 2022-01-04 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus, and device for object detection
CN108229308A (zh) * 2017-11-23 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Target object recognition method, apparatus, storage medium, and electronic device
US10643383B2 (en) * 2017-11-27 2020-05-05 Fotonation Limited Systems and methods for 3D facial modeling
US20190025773A1 (en) * 2017-11-28 2019-01-24 Intel Corporation Deep learning-based real-time detection and correction of compromised sensors in autonomous machines
US20180359516A1 (en) * 2017-12-04 2018-12-13 Konstantin Kevin Gorinshteyn Flexible Video Platform with Optional Advertising
US10839151B2 (en) * 2017-12-05 2020-11-17 myFavorEats Ltd. Systems and methods for automatic analysis of text-based food-recipes
CN109905590B (zh) * 2017-12-08 2021-04-27 Tencent Technology (Shenzhen) Co., Ltd. Video image processing method and apparatus
US10503970B1 (en) * 2017-12-11 2019-12-10 State Farm Mutual Automobile Insurance Company Method and system for identifying biometric characteristics using machine learning techniques
US10712811B2 (en) * 2017-12-12 2020-07-14 Facebook, Inc. Providing a digital model of a corresponding product in a camera feed
CN109960986A (zh) * 2017-12-25 2019-07-02 Beijing SenseTime Technology Development Co., Ltd. Face pose analysis method, apparatus, device, storage medium, and program
US11068741B2 (en) * 2017-12-28 2021-07-20 Qualcomm Incorporated Multi-resolution feature description for object recognition
US11010944B2 (en) * 2017-12-28 2021-05-18 Facebook, Inc. Systems and methods for swapping faces and face components based on facial recognition
US10482321B2 (en) * 2017-12-29 2019-11-19 Cerner Innovation, Inc. Methods and systems for identifying the crossing of a virtual barrier
US10375354B2 (en) * 2018-01-05 2019-08-06 Facebook, Inc. Video communication using subtractive filtering
US11429807B2 (en) * 2018-01-12 2022-08-30 Microsoft Technology Licensing, Llc Automated collection of machine learning training data
CN110059522B (zh) * 2018-01-19 2021-06-25 Beijing SenseTime Technology Development Co., Ltd. Human body contour key point detection method, image processing method, apparatus, and device
US10706577B2 (en) * 2018-03-06 2020-07-07 Fotonation Limited Facial features tracker with advanced training for natural rendering of human faces in real-time
JP6859970B2 (ja) * 2018-03-09 2021-04-14 Kyocera Document Solutions Inc. Login support system
US10603593B2 (en) * 2018-03-21 2020-03-31 Valve Corporation Automatically reducing use of cheat software in an online game environment
CN108416321A (zh) * 2018-03-23 2018-08-17 Beijing SenseTime Technology Development Co., Ltd. Method for predicting the movement direction of a target object, vehicle control method, and apparatus
US11206375B2 (en) * 2018-03-28 2021-12-21 Gal Zuckerman Analyzing past events by utilizing imagery data captured by a plurality of on-road vehicles
WO2019190142A1 (en) * 2018-03-29 2019-10-03 Samsung Electronics Co., Ltd. Method and device for processing image
US20190329790A1 (en) * 2018-04-25 2019-10-31 Uber Technologies, Inc. Systems and Methods for Using Machine Learning to Determine Passenger Ride Experience
CN108830288A (zh) * 2018-04-25 2018-11-16 Beijing SenseTime Technology Development Co., Ltd. Image processing method, neural network training method, apparatus, device, and medium
US10956714B2 (en) * 2018-05-18 2021-03-23 Beijing SenseTime Technology Development Co., Ltd. Method and apparatus for detecting living body, electronic device, and storage medium
US10521963B1 (en) * 2018-06-08 2019-12-31 Verizon Patent And Licensing Inc. Methods and systems for representing a pre-modeled object within virtual reality data
US11040227B2 (en) * 2018-06-28 2021-06-22 The Gmn Group Llc Respirator fitting device and method
US10733292B2 (en) * 2018-07-10 2020-08-04 International Business Machines Corporation Defending against model inversion attacks on neural networks
CN108921117A (zh) * 2018-07-11 2018-11-30 Beijing SenseTime Technology Development Co., Ltd. Image processing method and apparatus, electronic device, and storage medium
KR102511292B1 (ko) * 2018-07-11 2023-03-17 Samsung Electronics Co., Ltd. Apparatus and method for object authentication in an electronic device
US10282720B1 (en) * 2018-07-16 2019-05-07 Accel Robotics Corporation Camera-based authorization extension system
US10373322B1 (en) * 2018-07-16 2019-08-06 Accel Robotics Corporation Autonomous store system that analyzes camera images to track people and their interactions with items
US10210860B1 (en) * 2018-07-27 2019-02-19 Deepgram, Inc. Augmented generalized deep learning with special vocabulary
CN110852134A (zh) * 2018-07-27 2020-02-28 Beijing SenseTime Technology Development Co., Ltd. Liveness detection method, apparatus, and system, electronic device, and storage medium
US11036807B2 (en) * 2018-07-31 2021-06-15 Marvell Asia Pte Ltd Metadata generation at the storage edge
CN110795976B (zh) * 2018-08-03 2023-05-05 Huawei Cloud Computing Technologies Co., Ltd. Method, apparatus, and device for training an object detection model
US11138418B2 (en) * 2018-08-06 2021-10-05 Gal Zuckerman Systems and methods for tracking persons by utilizing imagery data captured by on-road vehicles
CN109308679B (zh) * 2018-08-13 2022-08-30 Shenzhen SenseTime Technology Co., Ltd. Image style transfer method and apparatus, device, and storage medium
US20200066046A1 (en) * 2018-08-24 2020-02-27 Facebook, Inc. Sharing and Presentation of Content Within Augmented-Reality Environments
CN109409204B (zh) * 2018-09-07 2021-08-06 Beijing SenseTime Technology Development Co., Ltd. Anti-spoofing detection method and apparatus, electronic device, and storage medium
US11325252B2 (en) * 2018-09-15 2022-05-10 X Development Llc Action prediction networks for robotic grasping
CN109389069B (zh) * 2018-09-28 2021-01-05 Beijing SenseTime Technology Development Co., Ltd. Gaze point determination method and apparatus, electronic device, and computer storage medium
EP3640951A1 (en) * 2018-10-15 2020-04-22 Siemens Healthcare GmbH Evaluating a condition of a person
CN111079476B (zh) * 2018-10-19 2024-03-26 Shanghai SenseTime Intelligent Technology Co., Ltd. Driving state analysis method and apparatus, driver monitoring system, and vehicle
CN111079475A (zh) * 2018-10-19 2020-04-28 Shanghai SenseTime Intelligent Technology Co., Ltd. Driving state detection method and apparatus, driver monitoring system, and vehicle
US10896320B2 (en) * 2018-11-14 2021-01-19 Baidu Usa Llc Child face distance alert system
US10977767B2 (en) * 2018-11-28 2021-04-13 Adobe Inc. Propagation of spot healing edits from one image to multiple images
US11010896B2 (en) * 2018-12-17 2021-05-18 Bodygram, Inc. Methods and systems for generating 3D datasets to train deep learning networks for measurements estimation
CN109697734B (zh) * 2018-12-25 2021-03-09 Zhejiang SenseTime Technology Development Co., Ltd. Pose estimation method and apparatus, electronic device, and storage medium
CN109522910B (zh) * 2018-12-25 2020-12-11 Zhejiang SenseTime Technology Development Co., Ltd. Key point detection method and apparatus, electronic device, and storage medium
US11301718B2 (en) * 2018-12-28 2022-04-12 Vizit Labs, Inc. Systems, methods, and storage media for training a machine learning model
US11114086B2 (en) * 2019-01-18 2021-09-07 Snap Inc. Text and audio-based real-time face reenactment
CN111857111A (zh) * 2019-04-09 2020-10-30 SenseTime Group Limited Three-dimensional object detection and intelligent driving control method, apparatus, medium, and device
US11010872B2 (en) * 2019-04-29 2021-05-18 Intel Corporation Method and apparatus for person super resolution from low resolution image
CN112101066B (zh) * 2019-06-17 2024-03-08 SenseTime Group Limited Target detection method and apparatus, intelligent driving method, device, and storage medium
US11479148B2 (en) * 2019-08-08 2022-10-25 GM Global Technology Operations LLC Personalization settings based on body measurements
CN115605918A (zh) * 2019-10-04 2023-01-13 Waymo LLC (US) Spatio-temporal embeddings
US11250572B2 (en) * 2019-10-21 2022-02-15 Salesforce.Com, Inc. Systems and methods of generating photorealistic garment transference in images

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339549A (zh) * 2007-07-03 2009-01-07 Zhou Lei Advertising method and system
CN101364368A (zh) * 2008-09-18 2009-02-11 Beijing Juwan Media Technology Co., Ltd. Method and implementation apparatus for embedding and playing an electronic map in a video advertising system
US8904033B2 (en) * 2010-06-07 2014-12-02 Adobe Systems Incorporated Buffering media content
US20130136416A1 (en) * 2011-11-30 2013-05-30 Nokia Corporation Method and apparatus for enriching media with meta-information
CN103702211A (zh) * 2013-12-09 2014-04-02 TCL Corporation Advertisement push method and system based on television broadcast content

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11308655B2 (en) 2018-08-24 2022-04-19 Beijing Microlive Vision Technology Co., Ltd. Image synthesis method and apparatus

Also Published As

Publication number Publication date
US20180108165A1 (en) 2018-04-19
US11037348B2 (en) 2021-06-15

Similar Documents

Publication Publication Date Title
WO2018033137A1 (zh) Method, apparatus, and electronic device for displaying a business object in a video image
WO2018033155A1 (zh) Video image processing method, apparatus, and electronic device
WO2018033154A1 (zh) Gesture control method, apparatus, and electronic device
US10776970B2 (en) Method and apparatus for processing video image and computer readable medium
WO2018033143A1 (zh) Video image processing method, apparatus, and electronic device
US11182591B2 (en) Methods and apparatuses for detecting face, and electronic devices
US20220004765A1 (en) Image processing method and apparatus, and storage medium
US11044295B2 (en) Data processing method, apparatus and electronic device
CN107343211B (zh) Video image processing method, apparatus, and terminal device
US10575067B2 (en) Context based augmented advertisement
US20200410770A1 (en) Augmented reality (ar) providing apparatus and method for recognizing context using neural network, and non-transitory computer-readable record medium for executing the method
US11182963B2 (en) Computerized system and method for providing a mobile augmented reality item display and selection experience
WO2018228384A1 (zh) Image processing method and apparatus, electronic device, and storage medium
US11657575B2 (en) Generating augmented reality content based on third-party content
US11816926B2 (en) Interactive augmented reality content including facial synthesis
WO2020024692A1 (zh) Human-computer interaction method and apparatus
US20220319060A1 (en) Facial synthesis in augmented reality content for advertisements
US20240104954A1 (en) Facial synthesis in augmented reality content for online communities
CN107770602B (zh) Video image processing method, apparatus, and terminal device
WO2022146890A1 (en) Detection and obfuscation of display screens in augmented reality content
CN107770603B (zh) Video image processing method, apparatus, and terminal device
US20240062500A1 (en) Generating ground truths for machine learning
CN108074127B (zh) Business object data analysis method, apparatus, and electronic device
US20230388109A1 (en) Generating a secure random number by determining a change in parameters of digital content in subsequent frames via graphics processing circuitry

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17841109

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17841109

Country of ref document: EP

Kind code of ref document: A1