CN112714337A - Video processing method and device, electronic equipment and storage medium - Google Patents

Video processing method and device, electronic equipment and storage medium

Info

Publication number
CN112714337A
Authority
CN
China
Prior art keywords
real object
virtual object
key points
coordinates
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011532518.1A
Other languages
Chinese (zh)
Inventor
刘巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011532518.1A priority Critical patent/CN112714337A/en
Publication of CN112714337A publication Critical patent/CN112714337A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Abstract

The disclosure provides a video processing method, relating to the field of artificial intelligence and, in particular, to image processing. The specific implementation scheme is as follows: acquiring a plurality of key points of a real object in a video; generating a virtual object of the real object according to the plurality of key points of the real object; determining a mapping relationship between the plurality of key points in the real object and a plurality of key points in the virtual object; and displaying the virtual object in the area of the real object in the video based on the mapping relationship. The present disclosure also discloses a video processing apparatus, an electronic device, and a storage medium.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology and, more particularly, to image processing techniques. Specifically, the present disclosure provides a video processing method, apparatus, electronic device, and storage medium.
Background
With the rise of the mobile internet, sharing daily life through video has become an important creative direction for many video creators. At present, most creators appear on camera with their real faces, but many people regard their portrait as highly private and do not want to show their faces. In addition, some people do not want to expose information such as the brands or logos of their personal items.
In the related art, the creator may use a sticker or mosaic for occlusion, but these are often incongruous with the video content, resulting in low video quality.
Disclosure of Invention
The disclosure provides a video processing method, apparatus, device and storage medium.
According to an aspect of the present disclosure, there is provided a video processing method including: acquiring a plurality of key points of a real object in a video; generating a virtual object of the real object according to a plurality of key points of the real object; determining a mapping relation between a plurality of key points in the real object and a plurality of key points in the virtual object; and displaying the virtual object in the area of the real object in the video based on the mapping relation.
According to another aspect of the present disclosure, there is provided a video processing apparatus including: the acquisition module is used for acquiring a plurality of key points of a real object in a video; the generating module is used for generating a virtual object of the real object according to the plurality of key points of the real object; the determining module is used for determining the mapping relation between a plurality of key points in the real object and a plurality of key points in the virtual object; and the display module is used for displaying the virtual object in the area of the real object in the video based on the mapping relation.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1A is a schematic diagram of an exemplary system architecture to which the video processing method and apparatus may be applied, according to one embodiment of the present disclosure;
FIG. 1B is a schematic diagram of an exemplary scene to which the video processing method and apparatus may be applied, according to one embodiment of the present disclosure;
FIG. 2 is a flow diagram of a video processing method according to one embodiment of the present disclosure;
FIG. 3 is a flow diagram of a video processing method according to another embodiment of the present disclosure;
FIG. 4 is a flow diagram of a video processing method according to another embodiment of the present disclosure;
FIG. 5 is a block diagram of a video processing device according to one embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for the video processing method according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, a video creator's use of his or her real image leads to exposure of portrait privacy, and sensitive information about some items in the video (e.g., luxury brands) may have adverse effects on the creator. If the video is occluded with a sticker or mosaic, video quality is degraded, and the sticker's position must be constantly adjusted to follow the motion and position of the person in every frame, which raises the threshold and difficulty of video creation, deters many would-be creators, and greatly reduces the number of creators.
Some related technologies use a neural network model to generate a virtual character image or virtual article and let it appear on camera in place of the real character or article, which can meet creators' needs. However, when a virtual object directly replaces a real object in the video, it is difficult to make their positions correspond exactly, so the virtual object cannot fit the real object perfectly, resulting in blurring, jagged edges, black borders, or ghosting at the edges of the virtual object.
Fig. 1A is a schematic diagram of an exemplary system architecture to which the video processing method and apparatus may be applied, according to one embodiment of the present disclosure. It should be noted that fig. 1A is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1A, the system architecture 100 according to this embodiment may include a terminal device 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired and/or wireless communication links, and so forth.
Various video processing software can be installed on the terminal device 101, and the user can use it to perform editing functions such as cropping, splitting, and merging videos. The terminal device 101 includes, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, and the like.
The server 103 may be an electronic device providing a virtual object generation service. For example, the server 103 may obtain a video from the terminal device 101 through the network 102, and then capture a human object in the video together with the key points of that object. The key points of the human object may include key points on the eyes, nose, mouth, eyebrows, etc. of a human face, and each key point consists of a group of pixel points; for example, a group of pixel points at the corner of an eye constitutes one key point on the eye. The position and color of each pixel in the group characterize the features of the key point formed by that group. The face key points can represent the features of the face, including the size and shape of each part and the facial expression, such as the size and shape of the nose, the angle at which the corners of the mouth are raised, frowning, and so on. The key points of the human object may also include key points on the limbs, which can characterize bending, twisting, and other limb motions. The server 103 trains a neural network model using the acquired face key points and limb key points, so that the trained model can output a virtual character object. Similarly, key points of an article can be acquired to represent the article's size and shape, and used to train a neural network model that outputs virtual articles.
The virtual object generation model trained by the server 103 can be stored on the server 103 for use. Illustratively, when a user edits a video on the terminal device 101 and needs to replace a real character image in the video with a virtual image, the video, or several frames of it containing the character image, may be transmitted to the server 103. The server 103 may capture a plurality of key points of the real character object in the video and input them into the virtual object generation model, so that the model outputs a virtual character object. The model may output several virtual character objects in different styles, which are then transmitted to the terminal device 101 for the user to choose from. If the user is not satisfied with the generated virtual character objects and chooses to regenerate, the server 103 may collect more key points, generate a new virtual character object, and send it to the terminal device 101 for the user to select.
The virtual object generation model trained by the server 103 may also be stored on the terminal device 101 for use. Illustratively, the terminal device 101 may capture a plurality of key points of a real character object in a video, input them into the virtual object generation model so that the model outputs a virtual character object, and then present the virtual character object for the user to select.
It should be understood that the number of terminal devices, networks, and servers in FIG. 1A are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 1B is an exemplary scene schematic diagram to which the video processing method and apparatus may be applied, according to one embodiment of the present disclosure.
As shown in fig. 1B, the character in the left image is a real character object in the video, and the character in the right image is the generated virtual character object. The real character object in the left figure includes a plurality of key points, which may be represented as A = (A1, A2, A3, …, An). The key points of the virtual character object in the right figure may be represented as A' = (A1', A2', A3', …, An'), and each key point of the real object A = (A1, A2, A3, …, An) corresponds one-to-one to each key point of the virtual object A' = (A1', A2', A3', …, An'). It should be noted that the number n of key points is not limited, and the key points shown in fig. 1B are only schematic.
According to an embodiment of the present disclosure, each key point of the real object A = (A1, A2, A3, …, An) may be replaced, one-to-one, with the corresponding key point of the virtual object A' = (A1', A2', A3', …, An'), or each key point of the virtual object may be used to cover, one-to-one, the corresponding key point of the real object. The virtual object is thereby displayed in the region of the real object in the video and appears on camera in place of the real object, protecting the user's portrait privacy.
Fig. 2 is a flow diagram of a video processing method according to one embodiment of the present disclosure.
As shown in fig. 2, the video processing method 200 may include operations S210 to S240.
In operation S210, a plurality of key points of a real object in a video are acquired.
According to an embodiment of the present disclosure, the real objects in the video may include real character objects and real article objects. The key points of a real character object may include, for example, key points of the face, such as key points on the eyes, nose, mouth, and eyebrows, which can represent features of the face, including the size and shape of each part and the facial expression, such as the size and shape of the eyes, the angle at which the corners of the mouth are raised, and frowning. The key points of an article can characterize the article's size and shape.
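As a concrete illustration of operation S210, the sketch below collects face and limb key points frame by frame. It is a minimal example under the assumption that an off-the-shelf detector such as MediaPipe is used; the disclosure does not prescribe any particular key point detector, and the function name extract_keypoints is hypothetical.

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
mp_pose = mp.solutions.pose

def extract_keypoints(video_path):
    """Collect face and limb key points for each frame of a video."""
    face_mesh = mp_face_mesh.FaceMesh(static_image_mode=False)
    pose = mp_pose.Pose(static_image_mode=False)
    keypoints_per_frame = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        face = face_mesh.process(rgb)
        body = pose.process(rgb)
        frame_points = {"face": [], "limbs": []}
        if face.multi_face_landmarks:
            # Landmarks on the eyes, nose, mouth, eyebrows, etc.,
            # in normalized image coordinates.
            frame_points["face"] = [
                (lm.x, lm.y) for lm in face.multi_face_landmarks[0].landmark
            ]
        if body.pose_landmarks:
            # Limb landmarks characterizing bending, twisting, and other motions.
            frame_points["limbs"] = [
                (lm.x, lm.y) for lm in body.pose_landmarks.landmark
            ]
        keypoints_per_frame.append(frame_points)
    cap.release()
    return keypoints_per_frame
```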
In operation S220, a virtual object of the real object is generated according to a plurality of key points of the real object.
According to the embodiment of the disclosure, the collected key points are input into a trained virtual object generation model; the model simulates the key points to obtain virtual key points, and the virtual key points form the virtual object.
Illustratively, the face key points and the limb key points of the real character object are input to a virtual object generation model, which may generate a virtual face and virtual limbs, which may then be synthesized into a virtual character object.
Illustratively, key points of a real article object are input into the virtual object generation model, which may generate a virtual article.
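The disclosure does not specify the architecture of the virtual object generation model, only that it takes the real object's key points as input and outputs virtual key points. The sketch below is therefore an assumed, minimal PyTorch interface; the class name VirtualObjectGenerator and the layer sizes are illustrative, not the disclosure's own.

```python
import torch
import torch.nn as nn

class VirtualObjectGenerator(nn.Module):
    """Maps n real key points (x, y) to n virtual key points (x, y)."""
    def __init__(self, num_keypoints):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, num_keypoints * 2),
        )

    def forward(self, real_keypoints):            # shape: (batch, n, 2)
        b, n, _ = real_keypoints.shape
        out = self.net(real_keypoints.reshape(b, n * 2))
        return out.reshape(b, n, 2)               # virtual key points, same order
```

Because the output preserves the input ordering, virtual key point i corresponds to real key point i, which is exactly the one-to-one mapping used in operation S230.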
In operation S230, a mapping relationship between a plurality of key points in the real object and a plurality of key points in the virtual object is determined.
According to the embodiment of the present disclosure, each key point of the real object has a corresponding virtual key point in the virtual object. A real key point in the real object can be denoted A, and a virtual key point in the virtual object can be denoted A': key point A1 in the real object corresponds to virtual key point A1' in the virtual object, key point A2 in the real object corresponds to virtual key point A2' in the virtual object, and so on.
In operation S240, a virtual object is displayed in a region of the real object in the video based on the mapping relationship.
According to an embodiment of the present disclosure, based on the mapping relationship between the key points of the real object and the key points of the virtual object, the real object in the video may be replaced or covered with the virtual object, so that the virtual object is displayed in the region of the real object. Illustratively, virtual key point A1' is used to cover key point A1, virtual key point A2' to cover key point A2, and so on. For example, the key points in the region of the real object may be removed, virtual key point A1' added at the position of key point A1, virtual key point A2' at the position of key point A2, and so on.
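A minimal sketch of operations S230 and S240 under the assumption that each key point is rendered as a small square patch of pixels: the mapping is simply index i → i, and display means writing the virtual patch Ai' over the position of the real key point Ai. The helper name overlay_virtual_object is hypothetical.

```python
import numpy as np

def overlay_virtual_object(frame, real_points, virtual_patches, radius=4):
    """Cover each real key point Ai with the corresponding virtual patch Ai'.

    frame:           H x W x 3 NumPy array (e.g., a frame decoded by OpenCV),
                     modified in place.
    real_points:     list of (x, y) pixel coordinates, A = (A1, ..., An).
    virtual_patches: list of (2*radius) x (2*radius) x 3 patches,
                     A' = (A1', ..., An'), in the same order as real_points.
    """
    h, w = frame.shape[:2]
    for (x, y), patch in zip(real_points, virtual_patches):  # one-to-one mapping
        x, y = int(x), int(y)
        if radius <= x < w - radius and radius <= y < h - radius:
            frame[y - radius:y + radius, x - radius:x + radius] = patch
    return frame
```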
According to the embodiment of the disclosure, a plurality of key points of a real object in a video are acquired, a virtual object of the real object is generated according to the plurality of key points, a mapping relationship between the key points of the real object and the key points of the virtual object is determined, and the virtual object is displayed in the region of the real object in the video based on the mapping relationship. Because the virtual object is displayed in the region of the real object, the low video quality caused by a sticker or mosaic that is incongruous with the video can be avoided, and video quality is improved.
Further, compared with occlusion by a sticker, whose position must be constantly adjusted to follow the motion and position of the person in every frame of the video, the virtual object is generated automatically and displayed in the region of the real object, which lowers the threshold for video creation, makes the video more engaging, and protects the user's portrait privacy.
Further, because the virtual object is displayed in the region of the real object according to the mapping relationship between the key points of the real object and the key points of the virtual object, the method avoids the blurring, jagged edges, black borders, or ghosting at edge positions that occur in the related art when a virtual object directly replaces a real object and their positions cannot correspond exactly, thereby improving video quality.
According to the embodiment of the disclosure, virtual objects of various styles, such as cartoon-style virtual objects, can be generated, and the user can choose according to personal preference. If the user is not satisfied with a generated virtual object, regeneration can be selected; in that case the sampling rate can be increased so that more key points are collected and a more lifelike virtual object is generated.
According to the embodiment of the disclosure, the processing on the video can be performed after the video recording is completed or during the video recording process.
After video recording is completed, key points of the real object can be collected from the first frame image or any other frame image in the video; the virtual object generation model then simulates the features of these key points to generate virtual key points, which carry the simulated virtual features, and the virtual key points form the virtual object. After the user confirms use of the virtual object, the positions of the virtual key points in the frame from which the key points were collected can be determined from the mapping relationship between the key points of the real object and the virtual key points of the virtual object; the key points of the virtual object are then used to replace or cover the key points of the real object across the multiple frames, following the motion trajectory of each key point.
During video recording, key points of the real object can be collected in each frame, a virtual object generated for each frame, the mapping relationship between the key points of the real object and those of the virtual object determined per frame, and the key points of the real object replaced or covered frame by frame with the key points of the virtual object directly according to the mapping relationship.
The following description is directed to a case where the key points of the virtual object are used to replace or cover the key points of the real object in the multi-frame images according to the mapping relationship and the motion trajectories of the key points in the multi-frame images.
According to the embodiment of the disclosure, the motion trajectory of each key point of the real object across the multiple frames is determined first. Specifically, starting from the first frame image in the video, the coordinates of each key point in the first frame are recorded, then the coordinates of each key point in the second frame, and so on up to the last frame. The recorded coordinates of a key point from the first frame to the last frame form that key point's motion trajectory.
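The trajectory-recording step described above can be sketched as follows. Here keypoints_per_frame is assumed to be a flat per-frame list of (x, y) key points (e.g., the face and limb lists from the earlier illustrative detector, concatenated), with key point i kept at a stable index across frames; the function name build_trajectories is hypothetical.

```python
def build_trajectories(keypoints_per_frame):
    """keypoints_per_frame[t][i] is the (x, y) of key point i in frame t."""
    num_points = len(keypoints_per_frame[0])
    # trajectories[i] = [(x, y) in frame 1, (x, y) in frame 2, ...]
    trajectories = [[] for _ in range(num_points)]
    for frame_points in keypoints_per_frame:
        for i, xy in enumerate(frame_points):
            trajectories[i].append(xy)
    return trajectories
```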
Fig. 3 is a flow diagram of a video processing method according to another embodiment of the present disclosure.
As shown in fig. 3, the video processing method may include operations S341 to S343.
In operation S341, coordinates of respective key points of the real object in any one frame image of the multi-frame images are determined.
According to an embodiment of the present disclosure, the key points of the real object may be represented as, for example, A = (A1, A2, A3, …, An). Any frame image X in the video is selected, and the coordinates of each key point A in image X are determined. Image X may be the first frame image in the video, the last frame image, or any intermediate frame image.
In operation S342, coordinates of each key point of the virtual object in any one of the frame images are determined according to the coordinates and mapping relationship of each key point of the real object in any one of the frame images.
According to an embodiment of the present disclosure, the key points of the virtual object may be represented as, for example, A' = (A1', A2', A3', …, An'), where each key point of the real object A = (A1, A2, A3, …, An) corresponds one-to-one to each key point of the virtual object A' = (A1', A2', A3', …, An'). From the coordinates of A = (A1, A2, A3, …, An) in image X and the mapping relationship, the coordinates of each key point A' = (A1', A2', A3', …, An') of the virtual object can be determined.
In operation S343, the virtual object is displayed in the region of the real object in the multi-frame images according to the motion trajectory based on the coordinates of the respective key points of the virtual object in any one of the frame images.
According to an embodiment of the present disclosure, along the motion trajectory of each key point A = (A1, A2, A3, …, An) of the real object, A' = (A1', A2', A3', …, An') is used in sequence to replace or cover A = (A1, A2, A3, …, An).
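Putting operations S341 to S343 together: the virtual key points are anchored by their coordinates in one reference frame X and then placed in every frame by following the recorded trajectories. This sketch reuses overlay_virtual_object and build_trajectories from the earlier illustrative snippets, and the function name is again hypothetical.

```python
def display_along_trajectories(frames, trajectories, virtual_patches):
    """Replace/cover A with A' in every frame by following each trajectory.

    frames:          list of H x W x 3 images.
    trajectories:    trajectories[i][t] is the (x, y) of key point Ai in frame t.
    virtual_patches: virtual key points A' = (A1', ..., An') as pixel patches.
    """
    for t, frame in enumerate(frames):
        # Coordinates of A = (A1, ..., An) in frame t, read off the trajectories.
        points_in_frame_t = [traj[t] for traj in trajectories]
        overlay_virtual_object(frame, points_in_frame_t, virtual_patches)
    return frames
```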
The following description is directed to replacing or overlaying the key points of the real object on each frame image frame by using the key points of the virtual object according to the mapping relationship.
Fig. 4 is a flow diagram of a video processing method according to another embodiment of the present disclosure.
As shown in fig. 4, the video processing method may include operations S441 to S443.
In operation S441, coordinates of respective key points of the real object in the respective frame images are determined.
According to an embodiment of the present disclosure, the coordinates of each key point A = (A1, A2, A3, …, An) of the real object in each frame image are determined.
In operation S442, coordinates of each key point of the virtual object in each frame image are determined according to the coordinates and the mapping relationship of each key point of the real object in each frame image.
According to an embodiment of the present disclosure, from the coordinates of each key point A = (A1, A2, A3, …, An) of the real object in each frame image and the mapping relationship, the coordinates of each key point A' = (A1', A2', A3', …, An') of the virtual object in each frame image are determined.
In operation S443, the virtual object is displayed in the region of the real object in each frame image according to the coordinates of each key point of the virtual object in each frame image.
According to an embodiment of the present disclosure, on each frame image, the key points A' = (A1', A2', A3', …, An') of the virtual object are used to replace or cover A = (A1, A2, A3, …, An).
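The per-frame variant of operations S441 to S443 needs no trajectories: key points are detected, mapped, and covered independently in each frame. In this sketch, detect_keypoints and generate_virtual_patches stand in for the detector and generation model sketched earlier, and both names are hypothetical.

```python
def display_per_frame(frames, detect_keypoints, generate_virtual_patches):
    """S441-S443: detect A, derive A' via the mapping, cover A frame by frame."""
    for frame in frames:
        real_points = detect_keypoints(frame)            # A  = (A1, ..., An)
        patches = generate_virtual_patches(real_points)  # A' = (A1', ..., An')
        overlay_virtual_object(frame, real_points, patches)
    return frames
```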
According to an embodiment of the present disclosure, since the key points of the virtual object are generated from, and mapped one-to-one to, the key points of the real object, using the virtual object's key points A' = (A1', A2', A3', …, An') to replace or cover A = (A1, A2, A3, …, An) avoids the blurring, jagged edges, black borders, or ghosting at edge positions that arise when a virtual object directly replaces a real object and their positions fail to correspond exactly, thereby improving video quality.
According to an embodiment of the present disclosure, the virtual object generation model may be a model that generates a virtual character, and may include a sub-model that generates a virtual face and a sub-model that generates virtual limbs.
Face key points can be collected for the real character image in the video, including key points at the eyes, nose, mouth, eyebrows, and other positions, where each key point comprises a group of pixel points. The face key points can represent the features of the face, including the size, shape, and expression of each part, such as the size and shape of the eyes, the angle at which the corners of the mouth are raised, and frowning. The face key points are input into the sub-model for generating the virtual face; the sub-model simulates them to generate virtual face key points, and the virtual face is generated from those virtual key points.
For the real character image in the video, key points on the limbs can also be collected; these key points can characterize bending, twisting, and other limb motions. The limb key points are input into the sub-model for generating the virtual limbs; the sub-model simulates them to generate virtual limb key points, and the virtual limbs are generated from those virtual key points.
According to the embodiment of the disclosure, the generated virtual face and virtual limbs can be synthesized into a virtual character object, and the virtual character object appears on camera in place of the real character object, protecting the user's portrait privacy.
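The two-submodel design can be sketched by composing two instances of the generator interface assumed earlier: a first sub-model for the face and a second for the limbs, whose outputs are concatenated into one virtual character. As before, the class and parameter names are illustrative, not the disclosure's own.

```python
import torch
import torch.nn as nn

class VirtualCharacterGenerator(nn.Module):
    """Combines a face sub-model and a limb sub-model into one virtual character."""
    def __init__(self, n_face_points, n_limb_points):
        super().__init__()
        self.face_model = VirtualObjectGenerator(n_face_points)  # first sub-model
        self.limb_model = VirtualObjectGenerator(n_limb_points)  # second sub-model

    def forward(self, face_points, limb_points):
        virtual_face = self.face_model(face_points)    # virtual face key points
        virtual_limbs = self.limb_model(limb_points)   # virtual limb key points
        # Synthesize the virtual character from the virtual face and limbs.
        return torch.cat([virtual_face, virtual_limbs], dim=1)
```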
Fig. 5 is a block diagram of a video processing device according to one embodiment of the present disclosure.
As shown in fig. 5, the video processing apparatus 500 may include an obtaining module 501, a generating module 502, a determining module 503, and a display module 504.
The obtaining module 501 is configured to obtain a plurality of key points of a real object in a video.
The generating module 502 is configured to generate a virtual object of the real object according to a plurality of key points of the real object.
The determining module 503 is configured to determine a mapping relationship between a plurality of key points in the real object and a plurality of key points in the virtual object.
The display module 504 is configured to display the virtual object in the area of the real object in the video based on the mapping relationship.
According to an embodiment of the present disclosure, the video includes a plurality of frames of images, and the display module 504 includes a generating unit and a first display unit.
The generating unit is used for generating the motion trail of each key point of the real object according to the position of each key point of the real object in the multi-frame images.
The first display unit is used for displaying the virtual object in the area of the real object in the multi-frame images according to the mapping relation and the motion trail.
According to an embodiment of the present disclosure, the first display unit includes a first determination subunit, a second determination subunit, and a display subunit.
The first determining subunit is configured to determine coordinates of each of the key points of the real object in any one of the plurality of frame images.
The second determining subunit is configured to determine, according to the coordinates and the mapping relationship of each key point of the real object in any frame of image, the coordinates of each key point of the virtual object in any frame of image.
And the display subunit is used for displaying the virtual object in the area of the real object in the multi-frame images according to the motion trail according to the coordinates of each key point of the virtual object in any frame image.
According to the embodiment of the disclosure, the display subunit is specifically configured to replace or overlay the key points of the real object in the multi-frame images with the key points of the virtual object according to the motion trajectory according to the coordinates of the key points of the virtual object in any one frame image.
According to an embodiment of the present disclosure, the video includes a plurality of frames of images, and the display module 504 includes a first determining unit, a second determining unit, and a second display unit.
The first determination unit is used for determining the coordinates of each key point of the real object in each frame image.
The second determining unit is used for determining the coordinates of each key point of the virtual object in each frame image according to the coordinates and the mapping relation of each key point of the real object in each frame image.
The second display unit is used for displaying the virtual object in the area of the real object in each frame image according to the coordinates of each key point of the virtual object in each frame image.
According to an embodiment of the present disclosure, the second display unit is specifically configured to replace or overlay the key points of the real object in each frame image with the respective key points of the virtual object according to the coordinates of the respective key points of the virtual object in each frame image.
According to an embodiment of the present disclosure, the generating module 502 is configured to simulate each key point of the real object by using a preset network model, and generate a virtual object of the real object.
According to an embodiment of the present disclosure, the real object includes a human object, and the obtaining module 501 is configured to obtain key points of a face of the human object and key points of limbs of the human object.
According to the embodiment of the present disclosure, the preset network model includes a first sub-model and a second sub-model, and the generating module 502 is configured to simulate the key points of the face using the first sub-model to obtain the key points of the face of the virtual object; and simulating the key points of the limbs by using the second submodel to obtain the key points of the limbs of the virtual object.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the video processing method. For example, in some embodiments, the video processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the video processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the video processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (16)

1. A video processing method, comprising:
acquiring a plurality of key points of a real object in a video;
generating a virtual object of the real object according to the plurality of key points of the real object;
determining a mapping relationship between a plurality of key points in the real object and a plurality of key points in the virtual object;
displaying the virtual object in the area of the real object in the video based on the mapping relationship.
2. The method of claim 1, wherein the video comprises a plurality of frames of images, and the displaying the virtual object in the video in the region of the real object based on the mapping relationship comprises:
generating a motion track of each key point of the real object according to the position of each key point of the real object in the multi-frame image;
and displaying the virtual object in the area of the real object in the multi-frame image according to the mapping relation and the motion trail.
3. The method according to claim 2, wherein the position comprises coordinates, and the displaying the virtual object in the area of the real object in the multi-frame image according to the mapping relationship and the motion trajectory comprises:
determining coordinates of each key point of the real object in any frame image of the multi-frame images;
determining the coordinates of each key point of the virtual object in any frame image according to the coordinates of each key point of the real object in any frame image and the mapping relation;
and displaying the virtual object in the area of the real object in the multi-frame image according to the motion trail according to the coordinates of each key point of the virtual object in any frame image.
4. The method according to claim 3, wherein the displaying the virtual object in the area of the real object in the multi-frame image according to the motion trail according to the coordinates of the key points of the virtual object in any one frame image comprises:
and replacing or covering the key points of the real object in the multi-frame images by using the key points of the virtual object according to the motion trail according to the coordinates of the key points of the virtual object in any frame image.
5. The method of claim 1, wherein the video comprises a plurality of frames of images, and the displaying the virtual object in the video in the region of the real object based on the mapping relationship comprises:
determining coordinates of each key point of the real object in each frame image;
determining the coordinates of each key point of the virtual object in each frame image according to the coordinates of each key point of the real object in each frame image and the mapping relation;
and displaying the virtual object in the area of the real object in each frame image according to the coordinates of each key point of the virtual object in each frame image.
6. The method according to claim 5, wherein the displaying the virtual object in the area of the real object in the frame images according to the coordinates of the key points of the virtual object in the frame images comprises:
and replacing or covering the key points of the real object in each frame image by using the key points of the virtual object according to the coordinates of the key points of the virtual object in each frame image.
7. The method of claim 1, wherein the generating a virtual object of the real object according to the plurality of key points of the real object comprises:
and simulating each key point of the real object by using a preset network model to generate a virtual object of the real object.
8. The method of claim 7, wherein the real object comprises a human object, and the acquiring a plurality of key points of the real object in the video comprises:
acquiring key points of the face of the person object; and
and acquiring key points of the limbs of the human object.
9. The method of claim 8, wherein the preset network model comprises a first submodel and a second submodel; the simulating each key point of the real object by using a preset network model, and the generating of the virtual object of the real object comprises:
simulating the key points of the face by using the first sub-model to obtain the key points of the face of the virtual object;
and simulating the key points of the limbs by using the second submodel to obtain the key points of the limbs of the virtual object.
10. A video processing apparatus comprising:
the acquisition module is used for acquiring a plurality of key points of a real object in a video;
a generating module, configured to generate a virtual object of the real object according to a plurality of key points of the real object;
a determining module, configured to determine a mapping relationship between a plurality of key points in the real object and a plurality of key points in the virtual object;
a display module, configured to display the virtual object in the area of the real object in the video based on the mapping relationship.
11. The apparatus of claim 10, wherein the video comprises a plurality of frames of images, the display module comprising:
the generating unit is used for generating motion tracks of all key points of the real object according to the positions of all key points of the real object in the multi-frame images;
and the first display unit is used for displaying the virtual object in the area of the real object in the multi-frame images according to the mapping relation and the motion trail.
12. The apparatus of claim 11, wherein the first display unit comprises:
a first determining subunit, configured to determine coordinates of each keypoint of the real object in any one of the multiple frame images;
a second determining subunit, configured to determine, according to the coordinates of each key point of the real object in any one of the frame images and the mapping relationship, coordinates of each key point of the virtual object in any one of the frame images;
and the display subunit is configured to display the virtual object in the area of the real object in the multiple frames of images according to the motion trail according to the coordinates of each key point of the virtual object in any frame of image.
13. The apparatus of claim 10, wherein the video comprises a plurality of frames of images, the display module comprising:
a first determining unit configured to determine coordinates of each key point of the real object in each frame image;
a second determining unit, configured to determine, according to the coordinates of each key point of the real object in each frame image and the mapping relationship, coordinates of each key point of the virtual object in each frame image;
and the second display unit is used for displaying the virtual object in the area of the real object in each frame image according to the coordinates of each key point of the virtual object in each frame image.
14. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
15. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 9.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
CN202011532518.1A 2020-12-22 2020-12-22 Video processing method and device, electronic equipment and storage medium Pending CN112714337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011532518.1A CN112714337A (en) 2020-12-22 2020-12-22 Video processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011532518.1A CN112714337A (en) 2020-12-22 2020-12-22 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112714337A 2021-04-27

Family

ID=75545344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011532518.1A Pending CN112714337A (en) 2020-12-22 2020-12-22 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112714337A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070018975A1 (en) * 2005-07-20 2007-01-25 Bracco Imaging, S.P.A. Methods and systems for mapping a virtual model of an object to the object
US20140176530A1 (en) * 2012-12-21 2014-06-26 Dassault Systèmes Delmia Corp. Location correction of virtual objects
WO2018237172A1 (en) * 2017-06-21 2018-12-27 Quantum Interface, Llc Systems, apparatuses, interfaces, and methods for virtual control constructs, eye movement object controllers, and virtual training
CN108227931A (en) * 2018-01-23 2018-06-29 北京市商汤科技开发有限公司 For controlling the method for virtual portrait, equipment, system, program and storage medium
CN108537867A (en) * 2018-04-12 2018-09-14 北京微播视界科技有限公司 According to the Video Rendering method and apparatus of user's limb motion
CN110058685A (en) * 2019-03-20 2019-07-26 北京字节跳动网络技术有限公司 Display methods, device, electronic equipment and the computer readable storage medium of virtual objects
CN110245638A (en) * 2019-06-20 2019-09-17 北京百度网讯科技有限公司 Video generation method and device
CN110298326A (en) * 2019-07-03 2019-10-01 北京字节跳动网络技术有限公司 A kind of image processing method and device, storage medium and terminal
CN111695471A (en) * 2020-06-02 2020-09-22 北京百度网讯科技有限公司 Virtual image generation method, device, equipment and storage medium
CN111694429A (en) * 2020-06-08 2020-09-22 北京百度网讯科技有限公司 Virtual object driving method and device, electronic equipment and readable storage
CN111935491A (en) * 2020-06-28 2020-11-13 百度在线网络技术(北京)有限公司 Live broadcast special effect processing method and device and server
CN111880657A (en) * 2020-07-30 2020-11-03 北京市商汤科技开发有限公司 Virtual object control method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114245013A (en) * 2021-12-16 2022-03-25 上海索凯丽思多媒体科技有限公司 Virtual shooting system and shooting method
CN114401368A (en) * 2022-01-24 2022-04-26 北京卡路里信息技术有限公司 Method and device for processing co-shooting video
CN114401368B (en) * 2022-01-24 2024-05-03 杭州卡路里体育有限公司 Processing method and device for simultaneous video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210427)