CN112053315A - Method and apparatus for processing character image data - Google Patents

Method and apparatus for processing character image data

Info

Publication number
CN112053315A
CN112053315A
Authority
CN
China
Prior art keywords
avatar
data
person
character
head portrait
Prior art date
Legal status
Pending
Application number
CN202010962811.5A
Other languages
Chinese (zh)
Inventor
胡天舒
马明明
何声一
郭汉奇
李彤辉
洪智滨
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010962811.5A
Publication of CN112053315A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a method and an apparatus for processing character image data, and relates to the technical fields of computer vision and deep learning. A specific implementation comprises: acquiring a character image video of a first person as first data and a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters; extracting the avatar of the first person and the avatar of the second person; and replacing the avatar of the first person with the avatar of the second person using an avatar replacement model to obtain a replaced avatar. Through the head-swapping operation, the replaced avatar combines characteristics of both source avatars, realizing an organic combination of the two. Because the first person contributes rich dynamic characters, the scheme compensates for the second person's lack of dynamic-character richness.

Description

Method and apparatus for processing character image data
Technical Field
The application relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and specifically to a method and an apparatus for processing character image data.
Background
In the era of digital informatization, replacing real entities with virtual ones has become a popular trend. Digital virtual humans have great application potential in industries such as entertainment, media, customer service, and finance. Driving a digital virtual human is a multi-modal character video generation task in which the virtual human can be driven by voice or text.
Modeling a digital virtual human often requires recording a large amount of data in a specific environment. Each time a new character is modeled, a model must be invited to a professional studio and guided by a professional director through specific performances, which undoubtedly increases the cost of producing the virtual human character.
Disclosure of Invention
A method, an apparatus, an electronic device, and a storage medium for processing character image data are provided.
According to a first aspect, there is provided a method for processing character image data, comprising: acquiring a character image video of a first person as first data, and acquiring a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters; extracting the avatar of the first person from the first data and the avatar of the second person from the second data; and replacing the avatar of the first person with the avatar of the second person using an avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
According to a second aspect, there is provided an apparatus for processing character image data, comprising: an acquisition unit configured to acquire a character image video of a first person as first data and a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters; an extraction unit configured to extract the avatar of the first person from the first data and the avatar of the second person from the second data; and a replacement unit configured to replace the avatar of the first person with the avatar of the second person using an avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
According to a third aspect, there is provided an electronic device comprising: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the embodiments of the method for processing character image data.
According to a fourth aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any one of the embodiments of the method for processing character image data.
According to the scheme of the application, an avatar comprising the dynamic character of the first person and the attribute character of the second person can be obtained through the head-swapping operation, so that the replaced avatar carries characteristics of both source avatars, realizing an organic combination of the two. Because the first person contributes rich dynamic characters, the scheme compensates for the second person's lack of dynamic-character richness: after head swapping, the avatar has the second person's attributes together with rich avatar dynamic characters.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for processing character image data according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for processing character image data according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing character image data according to the present application;
FIG. 5 is a schematic diagram illustrating the structure of one embodiment of an apparatus for processing character image data according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for processing character image data according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for processing character image data or the apparatus for processing character image data of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited herein.
The server 105 may be a server providing various services, for example a background server providing support for the terminal devices 101, 102, and 103. The background server can analyze and process received data such as character image videos and feed the processing result (e.g., the replaced avatar) back to the terminal device.
It should be noted that the method for processing character image data provided in the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for processing character image data may be disposed in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for processing character image data according to the present application is shown. The method for processing character image data comprises the following steps:
Step 201, acquiring a character image video of a first person as first data, and acquiring a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters.
In the present embodiment, an execution subject of the method for processing character image data (e.g., the server or a terminal device shown in fig. 1) may acquire a character image video of a first person as the first data, and a character image video of a second person as the second data. In practice, a person may be a real person or a virtual character such as a cartoon figure. At least one character image video may be acquired for each of the first person and the second person. The kinds of dynamic characters included in the second data may be fewer than those included in the first data, while the character image video of the first person includes a plurality of dynamic characters, such as expressions and lip shapes. Plural here means at least two.
Each character image video includes dynamic characters: the character image video of the first person includes dynamic characters of the first person, and the character image video of the second person includes dynamic characters of the second person. These dynamic characters may each include an avatar dynamic character.
A dynamic character refers to a character representation showing the person in motion rather than in a relaxed, natural state, i.e., a representation in which some part of the person is moving; for example, a dynamic character may show a person holding both hands high. The character may be a whole-body character or a specific local character, such as the head or the part below the head.
In step 202, the avatar of the first person in the first data and the avatar of the second person in the second data are extracted.
In this embodiment, the execution subject may extract the avatar of the first person from the first data and the avatar of the second person from the second data. The extracted avatar refers to the avatar region in the video. The execution subject may extract the avatar of a person in various ways; for example, it may directly obtain, locally or from another electronic device, an avatar region already extracted from the character image video.
Step 203, replacing the avatar of the first person with the avatar of the second person using the avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
In this embodiment, the execution subject may replace the acquired avatar of the first person with the extracted avatar of the second person using the avatar replacement model, the result of the replacement being the replaced avatar. An attribute character refers to a character that reflects a person's appearance attributes, i.e., the look that does not vary with the movement of any part of the person, such as the facial features, face shape, hair, and wear. The avatar attribute character may include the facial features, face shape, hair (hair color, hairstyle, etc.), headwear, and the like.
The avatar replacement model may be any model that realizes avatar replacement, such as a generative adversarial network (GAN). Specifically, the avatar replacement model is used to replace the avatar of the first person with the avatar of the second person so that the replaced avatar includes the dynamic character of the first person and the attribute character of the second person; for example, the replaced avatar may include the expression of the first person together with the facial features, face shape, hair, and headwear of the second person. In general, the avatar replacement model replaces an avatar to be replaced with an adopted avatar, so that the resulting avatar has the avatar dynamic character of the avatar to be replaced and the avatar attribute character of the adopted avatar. A minimal sketch of applying such a model follows.
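As an illustration only, applying a trained replacement model to a batch of aligned avatar crops might look like the sketch below; the PyTorch interface, tensor shapes, and function name are assumptions, not the patent's actual implementation.

```python
import torch

def replace_avatars(replacement_model: torch.nn.Module,
                    first_avatars: torch.Tensor) -> torch.Tensor:
    """Run aligned avatar crops of the first person through the trained
    avatar replacement model. The output is assumed to keep the first
    person's dynamic character (expression, lip shape, pose) while carrying
    the second person's attribute character (facial features, hair, etc.).

    first_avatars: (N, 3, H, W) batch of aligned avatar crops in [0, 1].
    """
    replacement_model.eval()
    with torch.no_grad():
        return replacement_model(first_avatars)
```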
The method provided by the above embodiment obtains, through the head-swapping operation, an avatar that includes the dynamic character of the first person and the attribute character of the second person, so that the replaced avatar carries characteristics of both source avatars, realizing an organic combination of the two. Combining with a first person whose dynamic characters are rich compensates for the second person's lack of dynamic-character richness: after head swapping, the avatar has the second person's attributes together with rich avatar dynamic characters.
In some optional implementations of this embodiment, the character image video of the first person has a duration greater than that of the character image video of the second person, the plurality of dynamic characters in the first data include a plurality of necessary dynamic characters required to create a virtual character, and the second data lacks at least one of the necessary dynamic characters, or lacks, for at least one necessary dynamic character that itself comprises a plurality of dynamic characters, at least one of the dynamic characters it comprises.
In these optional implementations, the necessary dynamic characters may include a plurality of avatar dynamic characters, including avatars captured at a plurality of angles among which a frontal angle exists. For example, the necessary dynamic characters may include at least two of: avatar dynamic characters at the frontal angle and at various side angles, a plurality of preset expressions, and a plurality of preset lip shapes. In addition, the necessary dynamic characters may also include various body dynamic characters, such as various body actions. The plurality of dynamic characters of the first data may further include the expressionless avatar required to create a virtual character, i.e., a necessary attribute character. The second data lacks at least one of the necessary dynamic characters, or lacks part of what a necessary dynamic character comprises; for example, a necessary dynamic character may comprise 60 preset expressions while the second data includes only 50 of them.
The avatar in the second data includes an avatar captured at the frontal angle, and the necessary dynamic characters include avatars captured at a plurality of angles among which the frontal angle exists. Specifically, the second data may include only characters captured at the frontal angle. A frontal angle refers to the exact front view, or any angle whose difference from the front view is smaller than a preset angle threshold; for example, the preset angle threshold may be 10 degrees. The small check sketched below makes this test concrete.
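A minimal sketch of the frontal-angle test, assuming head pose is available as yaw and pitch angles in degrees from some upstream pose estimator; the function name and the pose convention are assumptions.

```python
def is_frontal(yaw_deg: float, pitch_deg: float,
               threshold_deg: float = 10.0) -> bool:
    """True when the head's deviation from the exact front view is below
    the preset threshold (10 degrees is the example value in the text)."""
    return abs(yaw_deg) < threshold_deg and abs(pitch_deg) < threshold_deg
```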
These implementations can use the first data, which comprises the necessary dynamic characters, together with a small amount of less-rich material of the second person to perform the head-swapping operation. This saves the time of capturing the necessary dynamic characters of the second person and reduces the time and labor cost of producing the second person's virtual character.
In some optional application scenarios of these implementations, the method may further include: creating a virtual character based on the replaced avatar, wherein the virtual character has the avatar dynamic character of the first person and the avatar attribute character of the second person. Here, the avatar dynamic characters in the first data include lip shapes and expressions, and the avatar dynamic characters in the second data include lip shapes and/or expressions.
In these optional application scenarios, the execution subject may create a virtual character based on the obtained replaced avatar in various ways; for example, it may create the virtual character using the avatars of the plurality of necessary dynamic characters.
Specifically, the avatar of the first person in the first data may include dynamic lip shapes and expressions; for example, the lip shapes may include an open mouth and a closed mouth. The second data may include only lip shapes, only expressions, or both, and the number of avatar dynamic characters included in the second data may be less than that included in the first data.
These application scenarios can use the first data, which comprises the necessary dynamic characters, together with a small amount of less-rich material of the second person, to create a virtual character that has the avatar of the second person. This saves the time of capturing the necessary dynamic characters of the second person and reduces the time and labor cost of producing the second person's virtual character.
Optionally, the plurality of dynamic characters in the first data further include a body dynamic character comprising body actions. Creating a virtual character based on the replaced avatar then comprises: combining the replaced avatar with the first data so that the avatar in the combined first data is the replaced avatar; and creating the virtual character using the combined first data, wherein the virtual character has the dynamic character, body dynamic character, and body attribute character of the first person, and the avatar attribute character of the second person.
Specifically, the execution subject may combine the replaced avatar with the first data so that the avatar in the first data becomes the replaced avatar. The combination may be performed in various ways; for example, the execution subject may fuse the replaced avatar into the first data, or may replace the avatar in the first data with the replaced avatar using the avatar replacement model. In practice, before the fusion or the model-based replacement, the execution subject may align the replaced avatar with the avatar in the first data. The fusion variant is sketched below.
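A minimal sketch of the fusion variant using OpenCV alpha blending; the mask source (e.g., the segmentation step), the box convention, and the function name are assumptions, since the patent leaves the concrete fusion method open.

```python
import cv2
import numpy as np

def paste_avatar(frame: np.ndarray, replaced_avatar: np.ndarray,
                 mask: np.ndarray, box: tuple) -> np.ndarray:
    """Blend the replaced avatar crop back into a frame of the first data.

    box: (x, y, w, h) avatar region in `frame`; mask: float values in
    [0, 1] with the same height/width as the crop.
    """
    x, y, w, h = box
    head = cv2.resize(replaced_avatar, (w, h)).astype(np.float32)
    m = cv2.resize(mask.astype(np.float32), (w, h))[..., None]
    roi = frame[y:y + h, x:x + w].astype(np.float32)
    # Per-pixel alpha blend: mask selects the new head, keeps old background.
    frame[y:y + h, x:x + w] = (m * head + (1.0 - m) * roi).astype(np.uint8)
    return frame
```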
The created virtual character thus has the avatar attribute character of the second person, i.e., the avatar's appearance, while the other characters, such as the body appearance, expression, lip shape, and body action, come from the first data.
The execution subject may splice the replaced avatar onto the body of the first person, so that the created virtual character has not only the avatar but also the body part of the first person.
In some optional implementations of this embodiment, the avatar replacement model may be trained as follows: training an initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data to obtain the trained avatar replacement model.
In these optional implementations, the execution subject or another electronic device may train the initial avatar replacement model based on the avatar of the first person and the avatar of the second person, resulting in an avatar replacement model that can be applied.
In practice, the execution subject or another electronic device may perform this training in various ways. For example, using a generative adversarial network, a generator may generate avatars intended to carry the avatar attribute character of the second person, while a discriminator judges whether a generated avatar indeed has that attribute character. When the generated avatars pass the discriminator, training of the generative adversarial network, and thus of the avatar replacement model, is complete. One adversarial training step is sketched below.
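For illustration, a single adversarial step might look like the following generic GAN sketch in PyTorch; the model objects, optimizers, and loss choice are assumptions, not the patent's specific training procedure.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, first_avatars, second_avatars):
    """One adversarial step: D learns to tell real second-person avatars
    from generated ones; G learns to fool D."""
    fake = G(first_avatars)

    # Discriminator update: real second-person avatars -> 1, generated -> 0.
    real_logits = D(second_avatars)
    fake_logits = D(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update: make D score the generated avatars as real.
    g_logits = D(fake)
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```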
These implementations train on the avatar of the first person and the avatar of the second person, so that the trained model generates an avatar having the avatar dynamic character of the first person and the avatar attribute character of the second person.
Optionally, the initial avatar replacement model includes an encoder and two decoders. Training the initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data may then include: training the encoder and the two decoders based on those avatars, such that the avatar generated by one of the decoders has the avatar attribute character of the second person.
Specifically, the execution subject or another electronic device may train the encoder and the decoders based on the avatar of the first person in the first data and the avatar of the second person in the second data, so that the avatar generated by one of the decoders (e.g., the first decoder) has the avatar attribute character of the second person.
In practice, as an example, the execution subject may train the encoder and the first decoder based on the second data, so that the first decoder generates avatars having the avatar attribute character of the second person; train the encoder and the second decoder based on the first data, so that the features extracted by the encoder include the features of the avatars in the first data; and then train the encoder, the first decoder, and the second decoder based on the second data, so that the avatar generated by the first decoder has the avatar attribute character of the second person, obtaining the trained avatar replacement model. The sketch below shows one possible shape of such a model.
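The shared-encoder, two-decoder layout can be sketched as follows; this is a common face-swap arrangement, and the layer sizes are arbitrary assumptions rather than the patent's design.

```python
import torch
import torch.nn as nn

class SwapAutoencoder(nn.Module):
    """One shared encoder, one decoder per person."""

    def __init__(self):
        super().__init__()
        # The shared encoder learns identity-independent features
        # (pose, expression, lip shape).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )

        def make_decoder() -> nn.Sequential:
            return nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        self.decoder_first = make_decoder()   # reconstructs the first person
        self.decoder_second = make_decoder()  # reconstructs the second person

    def forward(self, x: torch.Tensor, use_second_decoder: bool) -> torch.Tensor:
        z = self.encoder(x)
        return self.decoder_second(z) if use_second_decoder else self.decoder_first(z)
```

During training, each person's avatars are reconstructed through that person's own decoder; at inference, feeding a first-person avatar and decoding with decoder_second yields the replaced avatar.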
These optional implementations use an encoder and decoders as the model performing the head-swapping operation, so that an accurate avatar replacement model can be obtained after training, and the avatar it generates has accurate avatar attribute characters of the second person.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing character image data according to the present embodiment. In the application scenario of fig. 3, the execution subject 301 acquires, as the first data 302, a character image video of a first person, Zhang San, which has a plurality of dynamic characters including an avatar dynamic character, and acquires, as the second data 303, a character image video of a second person, Li Si. The execution subject 301 extracts Zhang San's avatar 304 from the first data 302 and Li Si's avatar 305 from the second data 303. Using the avatar replacement model, the execution subject 301 replaces Zhang San's avatar with Li Si's avatar to obtain a replaced avatar 306, which has Zhang San's avatar dynamic character and Li Si's avatar attribute character.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for processing character image data is shown. The process 400 includes the following steps:
Step 401, acquiring a character image video of a first person as first data, and acquiring a character image video of a second person as second data, wherein the character image video of the first person has a plurality of dynamic characters including an avatar dynamic character, and the character image video of the second person has dynamic characters including an avatar dynamic character.
In the present embodiment, an execution subject (e.g., a server or a terminal device shown in fig. 1) on which the method for processing character image data is executed may acquire a character image video of a first character as first data. Also, the execution subject may acquire a character image video of a second character as the second data. In practice, the character may refer to a real character, or may refer to a virtual character such as a cartoon character.
Step 402, locating key points of the avatar for the first data and the second data, and aligning the avatar in the first data and the second data based on the key points.
In this embodiment, the execution subject may detect avatar key points in the first data and the second data to locate them, and may then align the avatars in the first data and the second data based on the key points. For example, the execution subject may use a key point template: the key points detected in the first data are aligned with the template, and the key points detected in the second data are aligned with the same template, thereby aligning the avatars in the first data and the second data. One possible form of this alignment is sketched below.
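A minimal sketch of template-based alignment with OpenCV; the key point detector is assumed to run upstream, and the convention that template coordinates live in the output crop's pixel space is an assumption.

```python
import cv2
import numpy as np

def align_avatar(frame: np.ndarray, keypoints: np.ndarray,
                 template: np.ndarray, size: int = 256) -> np.ndarray:
    """Warp a frame so its detected avatar key points land on the shared
    key point template.

    keypoints, template: (K, 2) float arrays of matching points.
    """
    # Estimate a similarity transform (rotation, uniform scale, translation).
    M, _ = cv2.estimateAffinePartial2D(keypoints.astype(np.float32),
                                       template.astype(np.float32))
    return cv2.warpAffine(frame, M, (size, size))
```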
In step 403, image segmentation is performed on the alignment result of the first data to extract the avatar of the first person in the first data, and image segmentation is performed on the alignment result of the second data to extract the avatar of the second person in the second data.
In this embodiment, the execution subject may perform image segmentation on the aligned first data to extract the avatar region of the first person, and perform image segmentation on the aligned second data to extract the avatar region of the second person. In practice, image segmentation may refer to foreground-background separation or to avatar detection. A stand-in for this step is sketched below.
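Since the patent does not fix a particular segmentation method, OpenCV's GrabCut can serve as a stand-in for the foreground-background separation in a sketch; the rectangle prior and function name are assumptions.

```python
import cv2
import numpy as np

def extract_avatar(aligned: np.ndarray, rect: tuple):
    """Separate the avatar (foreground) from the background in an aligned
    crop. rect: (x, y, w, h) rough avatar region used as the GrabCut prior."""
    mask = np.zeros(aligned.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(aligned, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    fg = ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)
    return aligned * fg[..., None], fg  # masked avatar and its binary mask
```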
Step 404, replacing the avatar of the first person with the avatar of the second person using the avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
In this embodiment, the execution subject may replace the extracted avatar of the first person with the extracted avatar of the second person using the avatar replacement model, the result of the replacement being the replaced avatar. An attribute character refers to a character that reflects a person's intrinsic appearance attributes and does not change with the movement of any part of the person, such as the facial features.
Through the alignment and image segmentation, this embodiment can accurately extract the avatar of a person.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing character image data. The apparatus embodiment corresponds to the method embodiment shown in fig. 2 and, in addition to the features described below, may include the same or corresponding features and effects as that method embodiment. The apparatus can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for processing character image data of the present embodiment includes: an acquisition unit 501, an extraction unit 502, and a replacement unit 503. The acquisition unit 501 is configured to acquire a character image video of a first person as first data and a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters; the extraction unit 502 is configured to extract the avatar of the first person from the first data and the avatar of the second person from the second data; and the replacement unit 503 is configured to replace the avatar of the first person with the avatar of the second person using the avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
In this embodiment, for the specific processing of the acquisition unit 501, the extraction unit 502, and the replacement unit 503 of the apparatus 500 for processing character image data and the technical effects thereof, reference may be made to the descriptions of steps 201, 202, and 203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the character image video of the first person has a duration greater than that of the character image video of the second person, the plurality of dynamic characters in the first data include a plurality of necessary dynamic characters required to create a virtual character, and the second data lacks at least one of the necessary dynamic characters, or lacks, for at least one necessary dynamic character that itself comprises a plurality of dynamic characters, at least one of the dynamic characters it comprises.
In some optional implementations of this embodiment, the avatar dynamic characters in the first data include lip shapes and expressions, and the avatar dynamic characters in the second data include lip shapes and/or expressions; the apparatus further includes: a creation unit configured to create a virtual character based on the replaced avatar, wherein the virtual character has the avatar dynamic character of the first person and the avatar attribute character of the second person.
In some optional implementations of this embodiment, the plurality of dynamic characters in the first data further include a body dynamic character comprising body actions;
the creation unit is further configured to create the virtual character based on the replaced avatar as follows: combining the replaced avatar with the first data so that the avatar in the combined first data is the replaced avatar; and creating the virtual character using the combined first data, wherein the virtual character has the dynamic character, body dynamic character, and body attribute character of the first person, and the avatar attribute character of the second person.
In some optional implementations of this embodiment, the avatar replacement model is trained by the following steps: training the initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data to obtain the trained avatar replacement model.
In some optional implementations of this embodiment, the initial avatar replacement model includes an encoder and two decoders; training the initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data includes: training the encoder and the two decoders based on those avatars, so that the avatar generated by one of the decoders has the avatar attribute character of the second person.
In some optional implementations of this embodiment, the extraction unit is further configured to extract the avatar of the first person from the first data and the avatar of the second person from the second data as follows: locating avatar key points in the first data and the second data, and aligning the avatars in the first data and the second data based on the key points; and performing image segmentation on the alignment result of the first data to extract the avatar of the first person, and on the alignment result of the second data to extract the avatar of the second person.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 6, is a block diagram of an electronic device for a method of processing character image data according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for processing character image data provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method for processing character image data provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for processing character image data in the embodiments of the present application (for example, the acquisition unit 501, the extraction unit 502, and the replacement unit 503 shown in fig. 5). The processor 601 executes various functional applications of the server and performs data processing, i.e., implements the method for processing character image data in the above method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of an electronic device for processing character image data, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 602 optionally includes memory remotely located from processor 601, which may be connected via a network to an electronic device for processing character image data. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for processing character image data may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for processing character image data; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, an extraction unit, and a replacement unit. Where the names of these units do not constitute a limitation on the unit itself in some cases, for example, the extraction unit may also be described as a "unit that extracts the avatar of the first person in the first data and the avatar of the second person in the second data".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a character image video of a first person as first data, and acquire a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters; extract the avatar of the first person from the first data and the avatar of the second person from the second data; and replace the avatar of the first person with the avatar of the second person using the avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
The above description covers only preferred embodiments of the application and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of the above features, and also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, arrangements in which the above features are replaced with (but not limited to) features having similar functions disclosed in this application.

Claims (16)

1. A method for processing character image data, the method comprising:
acquiring a character image video of a first person as first data, and acquiring a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters;
extracting the avatar of the first person from the first data and the avatar of the second person from the second data;
and replacing the avatar of the first person with the avatar of the second person using an avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
2. The method of claim 1, wherein the character image video of the first person has a duration greater than a duration of the character image video of the second person, the plurality of dynamic characters in the first data comprise a plurality of necessary dynamic characters required to create a virtual character, and the second data lacks at least one of the necessary dynamic characters, or lacks, for at least one necessary dynamic character comprising a plurality of dynamic characters, at least one of the dynamic characters it comprises.
3. The method of claim 2, wherein the avatar dynamic characters in the first data include lip shapes and expressions, and the avatar dynamic characters in the second data include lip shapes and/or expressions;
the method further comprises:
creating a virtual character based on the replaced avatar, wherein the virtual character has the avatar dynamic character of the first person and the avatar attribute character of the second person.
4. The method of claim 3, wherein the plurality of dynamic characters in the first data further comprise a body dynamic character, the body dynamic character comprising body actions;
creating a virtual character based on the replaced avatar comprises:
combining the replaced avatar with the first data so that the avatar in the combined first data is the replaced avatar;
creating the virtual character using the combined first data, wherein the virtual character has the dynamic character, body dynamic character, and body attribute character of the first person, and the avatar attribute character of the second person.
5. The method of claim 1, wherein the avatar replacement model is trained by:
and training an initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data to obtain a trained avatar replacement model.
6. The method of claim 5, wherein the initial avatar replacement model includes an encoder and two decoders;
the training of the initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data includes:
training the encoder and the two decoders based on the avatar of the first person in the first data and the avatar of the second person in the second data, so that the avatar generated by one of the decoders has the avatar attribute character of the second person.
7. The method of claim 1, wherein the extracting the avatar of the first person in the first data and the avatar of the second person in the second data comprises:
locating avatar key points in the first data and the second data, and aligning the avatars in the first data and the second data based on the key points;
and performing image segmentation on the alignment result of the first data to extract the avatar of the first person from the first data, and performing image segmentation on the alignment result of the second data to extract the avatar of the second person from the second data.
8. An apparatus for processing character image data, the apparatus comprising:
an acquisition unit configured to acquire a character image video of a first person as first data and a character image video of a second person as second data, wherein each character image video has dynamic characters including an avatar dynamic character, and the character image video of the first person has a plurality of dynamic characters;
an extraction unit configured to extract the avatar of the first person from the first data and the avatar of the second person from the second data;
a replacement unit configured to replace the avatar of the first person with the avatar of the second person using an avatar replacement model to obtain a replaced avatar, wherein the replaced avatar has the avatar dynamic character of the first person and the avatar attribute character of the second person.
9. The apparatus of claim 8, wherein the character image video of the first person has a duration greater than a duration of the character image video of the second person, the plurality of dynamic characters in the first data comprise a plurality of necessary dynamic characters required to create a virtual character, and the second data lacks at least one of the necessary dynamic characters, or lacks, for at least one necessary dynamic character comprising a plurality of dynamic characters, at least one of the dynamic characters it comprises.
10. The apparatus of claim 9, wherein the avatar dynamic characters in the first data include lip shapes and expressions, and the avatar dynamic characters in the second data include lip shapes and/or expressions;
the apparatus further comprises:
a creation unit configured to create a virtual character based on the replaced avatar, wherein the virtual character has the avatar dynamic character of the first person and the avatar attribute character of the second person.
11. The apparatus of claim 10, wherein the plurality of dynamic characters in the first data further comprise a body dynamic character, the body dynamic character comprising body actions;
the creation unit is further configured to create the virtual character based on the replaced avatar as follows:
combining the replaced avatar with the first data so that the avatar in the combined first data is the replaced avatar;
creating the virtual character using the combined first data, wherein the virtual character has the dynamic character, body dynamic character, and body attribute character of the first person, and the avatar attribute character of the second person.
12. The apparatus of claim 8, wherein the avatar replacement model is trained by:
and training an initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data to obtain a trained avatar replacement model.
13. The apparatus of claim 12, wherein the initial avatar replacement model includes an encoder and two decoders;
the training of the initial avatar replacement model based on the avatar of the first person in the first data and the avatar of the second person in the second data includes:
training the encoder and the two decoders based on the avatar of the first person in the first data and the avatar of the second person in the second data, so that the avatar generated by one of the decoders has the avatar attribute character of the second person.
14. The apparatus of claim 8, wherein the extraction unit is further configured to perform the extracting of the avatar of the first person in the first data and the avatar of the second person in the second data as follows:
locating avatar key points in the first data and the second data, and aligning the avatars in the first data and the second data based on the key points;
and performing image segmentation on the alignment result of the first data to extract the avatar of the first person from the first data, and performing image segmentation on the alignment result of the second data to extract the avatar of the second person from the second data.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010962811.5A 2020-09-14 2020-09-14 Method and apparatus for processing character image data Pending CN112053315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010962811.5A CN112053315A (en) 2020-09-14 2020-09-14 Method and apparatus for processing character image data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010962811.5A CN112053315A (en) 2020-09-14 2020-09-14 Method and apparatus for processing character image data

Publications (1)

Publication Number Publication Date
CN112053315A true CN112053315A (en) 2020-12-08

Family

ID=73610196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010962811.5A Pending CN112053315A (en) 2020-09-14 2020-09-14 Method and apparatus for processing character image data

Country Status (1)

Country Link
CN (1) CN112053315A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100839536B1 (en) * 2006-12-15 2008-06-19 주식회사 케이티 System and method for facial region/hair information extraction, character generation
CN108491881A (en) * 2018-03-23 2018-09-04 百度在线网络技术(北京)有限公司 Method and apparatus for generating detection model
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change
CN111260756A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Method and apparatus for transmitting information
CN111383307A (en) * 2018-12-29 2020-07-07 上海智臻智能网络科技股份有限公司 Video generation method and device based on portrait and storage medium
CN110400251A (en) * 2019-06-13 2019-11-01 深圳追一科技有限公司 Method for processing video frequency, device, terminal device and storage medium
CN110349081A (en) * 2019-06-17 2019-10-18 达闼科技(北京)有限公司 Generation method, device, storage medium and the electronic equipment of image
CN110457994A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Face image synthesis method and device, storage medium, computer equipment
CN110390704A (en) * 2019-07-11 2019-10-29 深圳追一科技有限公司 Image processing method, device, terminal device and storage medium
CN111626218A (en) * 2020-05-28 2020-09-04 腾讯科技(深圳)有限公司 Image generation method, device and equipment based on artificial intelligence and storage medium
CN111652121A (en) * 2020-06-01 2020-09-11 腾讯科技(深圳)有限公司 Training method of expression migration model, and expression migration method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
肖昊晨: "Analyzing the shaping of character images in the communication process with communication theory" (用传播学理论分析传播过程中的人物形象塑造), 传播力研究 (Communication Power Research), no. 11, 10 April 2018 *
高翔; 黄法秀; 刘春平; 陈虎: "A real-time facial expression transfer method combining 3DMM and GAN" (3DMM与GAN结合的实时人脸表情迁移方法), 计算机应用与软件 (Computer Applications and Software), no. 04, 12 April 2020 *

Similar Documents

Publication Publication Date Title
CN111652828B (en) Face image generation method, device, equipment and medium
KR102503413B1 (en) Animation interaction method, device, equipment and storage medium
CN111860167B (en) Face fusion model acquisition method, face fusion model acquisition device and storage medium
CN111862277A (en) Method, apparatus, device and storage medium for generating animation
CN111968203B (en) Animation driving method, device, electronic equipment and storage medium
CN112527115B (en) User image generation method, related device and computer program product
CN111861955A (en) Method and device for constructing image editing model
CN111738910A (en) Image processing method and device, electronic equipment and storage medium
CN111709875B (en) Image processing method, device, electronic equipment and storage medium
US20200074736A1 (en) Transmutation of virtual entity sketch using extracted features and relationships of real and virtual objects in mixed reality scene
CN111523467B (en) Face tracking method and device
CN112116525A (en) Face-changing identification method, device, equipment and computer-readable storage medium
JP7393388B2 (en) Face editing method, device, electronic device and readable storage medium
CN114187392B Virtual idol image generation method and apparatus, and electronic device
CN112562045B (en) Method, apparatus, device and storage medium for generating model and generating 3D animation
CN112988100A (en) Video playing method and device
CN112328088A (en) Image presenting method and device
CN112116548A (en) Method and device for synthesizing face image
CN112464009A (en) Method and device for generating pairing image, electronic equipment and storage medium
CN112053315A (en) Method and apparatus for processing character image data
CN112017140B (en) Method and apparatus for processing character image data
US11120599B2 (en) Deriving avatar expressions in virtual reality environments
CN113542802A (en) Video transition method and device
CN112102447A (en) Image processing method, device, equipment and storage medium
CN112017141B (en) Video data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination