CN112188234B - Image processing and live broadcasting method and related devices - Google Patents


Info

Publication number
CN112188234B
CN112188234B (application CN201910593748.XA)
Authority
CN
China
Prior art keywords
attribute
image data
face
original
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910593748.XA
Other languages
Chinese (zh)
Other versions
CN112188234A (en)
Inventor
王文斓
刘炉
任高生
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN201910593748.XA priority Critical patent/CN112188234B/en
Publication of CN112188234A publication Critical patent/CN112188234A/en
Application granted granted Critical
Publication of CN112188234B publication Critical patent/CN112188234B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Abstract

The embodiment of the invention discloses an image processing method, a live broadcasting method, and related devices. The image processing method comprises: acquiring original image data containing a character image; extracting original attribute features from the original image data, the original attribute features characterizing the face attributes of the character image; adjusting the original attribute features to obtain target attribute features; and performing image reconstruction with the target attribute features to obtain target image data with adjusted face attributes. The method solves the problem that, when face attributes are adjusted with separate image processing tools, the results of different adjustment orders affect one another. It changes multiple face attributes of the original image data simultaneously, thereby adjusting the character image in the original image data, and reduces development difficulty and development time for developers.

Description

Image processing and live broadcasting method and related devices
Technical Field
Embodiments of the invention relate to live broadcast technology, and in particular to an image processing method, a live broadcasting method, and related devices.
Background
On a live broadcast platform, an anchor user can use a face processing tool provided by the platform to perform image processing on the face images in the live video stream of a program while it is broadcast, so as to adjust the anchor user's character image in the live video stream. Specifically, the character image may be represented by face attributes such as gender, age, identity, nose shape, eye shape, mouth shape, hair style, hair color, skin color, nationality, and the like. The character image is then adjusted by changing these face attributes, for example changing gender, appearing older or younger, enlarging or reducing eye bags, enlarging or shrinking the eyes, appearing fatter or thinner, adding or removing makeup, and the like.
Generally, a separate image processing tool must be developed for each kind of face attribute change. When several face attributes of a face image need to be changed at once, the tools corresponding to those attributes must be applied in sequence. On the one hand, the image processing operations are not independent, so the resulting effect easily varies with the order in which they are applied; on the other hand, each operation must be developed separately, which increases development difficulty and development time.
Disclosure of Invention
The invention provides an image processing method, a live broadcasting method, and related devices, which change multiple face attributes of original image data simultaneously and reduce development difficulty and development time for developers.
In a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
acquiring original image data, wherein the original image data contains a character image;
extracting original attribute features from the original image data, wherein the original attribute features characterize the face attributes of the character image;
adjusting the original attribute features to obtain target attribute features;
and performing image reconstruction processing with the target attribute features to obtain target image data with adjusted face attributes.
Further, the method is based on a face generator and a face encoder, the face generator comprising a decoding network;
the extracting of the original attribute features from the original image data includes:
inputting the original image data into the face encoder for encoding processing, and outputting an original face code;
and inputting the original face code into the decoding network for decoding processing, and outputting original attribute characteristics including at least one face attribute.
Further, the face generator further comprises an image reconstruction network connected with the decoding network, wherein the image reconstruction network takes as input a plurality of groups of features characterizing face attributes in order to reconstruct image data;
the adjusting the original attribute characteristics to obtain target attribute characteristics includes:
receiving a first user operation;
determining reference image data including a face image selected by the first user operation;
extracting features representing the attributes of the human face from the reference image data to serve as reference attribute features;
taking a preset number of groups of the original attribute features as the target attribute features for the image reconstruction network;
and replacing, among the target attribute features, the original attribute features belonging to the same group with the reference attribute features.
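The group-wise replacement described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the group count, feature width, and the `replace_groups` helper are assumptions for the sketch.

```python
import numpy as np

def replace_groups(original, reference, groups):
    """Build target attribute features: start from the original features
    (one row per group) and overwrite the selected groups with the
    corresponding rows of the reference attribute features."""
    target = original.copy()
    for g in groups:
        target[g] = reference[g]
    return target

# 18 groups of 512-dimensional attribute features (illustrative sizes).
rng = np.random.default_rng(0)
w_orig = rng.normal(size=(18, 512))   # original attribute features
w_ref = rng.normal(size=(18, 512))    # reference attribute features

# Replace, say, the first three groups with the reference's groups.
w_target = replace_groups(w_orig, w_ref, groups=[0, 1, 2])
```

The untouched groups keep the character image's own attributes, while the replaced groups carry the attributes taken from the reference image.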
Further, the adjusting the original attribute characteristics to obtain the target attribute characteristics includes:
receiving a second user operation;
determining an adjustment amplitude selected by the second user operation and related to a target face attribute, wherein the target face attribute is a face attribute to be adjusted in the original image data;
determining an adjustment direction associated with the target face attribute;
and moving the original attribute feature by the adjustment amplitude along the adjustment direction to obtain the target attribute feature of the adjusted face attribute.
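The move along the adjustment direction can be sketched as a single vector operation. This is an assumed NumPy illustration; the feature dimension and the normalization of the direction are choices made for the sketch, not stated in the patent.

```python
import numpy as np

def adjust_feature(w, direction, amplitude):
    """Move the attribute features along the (unit-normalized) adjustment
    direction by the user-selected adjustment amplitude."""
    n = direction / np.linalg.norm(direction)
    return w + amplitude * n

rng = np.random.default_rng(1)
w = rng.normal(size=512)    # original attribute features
d = rng.normal(size=512)    # adjustment direction for the target attribute
w_t = adjust_feature(w, d, amplitude=3.0)
```

With a unit direction, the amplitude directly equals the distance moved in the attribute feature space.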
Further, the determining an adjustment direction associated with the target face attribute includes:
acquiring sample image data, wherein the sample image data is marked with a first attribute vector and a second attribute vector related to the attributes of the target face;
extracting sample attribute features related to the target human face attribute from the sample image data;
determining a hyperplane in a feature space formed by the attribute features, wherein the hyperplane divides the feature space into a first feature space and a second feature space, the first feature space comprises the attribute features of the image data marked as a first attribute vector, and the second feature space comprises the attribute features of the image data marked as a second attribute vector;
and taking the normal vector of the hyperplane as an adjusting direction associated with the target face attribute.
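The hyperplane step above can be sketched with a hand-rolled linear classifier: fit a separating hyperplane between the two labelled classes of sample attribute features and take its weight vector as the normal, i.e. the adjustment direction. The logistic-regression fit below is one assumed way to find such a hyperplane; the patent does not prescribe a specific classifier.

```python
import numpy as np

def hyperplane_normal(features, labels, lr=0.1, steps=500):
    """Fit a linear classifier (logistic regression via gradient descent)
    separating the two attribute classes; its unit weight vector is the
    normal of the separating hyperplane."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        w -= lr * features.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    return w / np.linalg.norm(w)

# Synthetic sample attribute features: the second class is shifted along
# a known axis, standing in for a real attribute difference.
rng = np.random.default_rng(2)
neg = rng.normal(size=(200, 8))
pos = rng.normal(size=(200, 8)) + np.array([3.0] + [0.0] * 7)
X = np.vstack([neg, pos])
y = np.array([0] * 200 + [1] * 200)
direction = hyperplane_normal(X, y)
```

Here the recovered normal points mostly along the axis that actually separates the two classes, which is exactly the property the adjustment direction needs.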
Further, the acquiring sample image data includes:
randomly generating at least one face code;
respectively inputting the face codes into the face generator to carry out image reconstruction processing to obtain at least one sample image data;
determining a preset classification model for identifying the target face attribute, wherein the target face attribute is characterized by using a first attribute vector or a second attribute vector;
inputting the sample image data into the classification model for identification processing, and determining the probability that the sample image data belongs to a first attribute vector or a second attribute vector;
and selecting the n sample image data with the highest probabilities.
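The screening step can be sketched as a confidence-based top-n selection. The classifier below is a toy stand-in for the preset classification model (its behavior is an assumption for the sketch); only the selection logic mirrors the step above.

```python
import numpy as np

def screen_top_n(samples, classifier, n):
    """Score each sample image with the attribute classifier and keep the
    n samples with the highest predicted probability."""
    probs = np.array([classifier(s) for s in samples])
    top = np.argsort(probs)[::-1][:n]
    return [samples[i] for i in top], probs[top]

# Stand-in classifier: probability grows with the mean pixel value.
def toy_classifier(img):
    return 1.0 / (1.0 + np.exp(-img.mean()))

rng = np.random.default_rng(3)
samples = [rng.normal(loc=m, size=(4, 4)) for m in np.linspace(-2, 2, 10)]
kept, kept_probs = screen_top_n(samples, toy_classifier, n=3)
```

Keeping only high-confidence samples gives cleaner positive/negative sets for fitting the separating hyperplane.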
In another embodiment, the determining the adjustment direction associated with the target face attribute includes:
acquiring first reference image data and second reference image data, wherein the second reference image data is image data obtained by changing the attribute of a target face of the first reference image data;
extracting a first reference attribute feature from the first reference image data;
extracting a second reference attribute feature from the second reference image data;
and taking the difference value of the second reference attribute characteristic and the first reference attribute characteristic as an adjusting direction.
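This pairwise variant reduces to a feature difference. The sketch below (dimensions and the injected change are assumptions) shows that the difference recovers exactly the attribute change between the two references.

```python
import numpy as np

def direction_from_pair(w_ref1, w_ref2):
    """Adjustment direction as the difference between the attribute
    features of the changed (second) and original (first) reference."""
    return w_ref2 - w_ref1

rng = np.random.default_rng(4)
w1 = rng.normal(size=512)                # first reference attribute features
delta = np.zeros(512)
delta[10] = 2.0                          # the simulated attribute change
w2 = w1 + delta                          # second reference attribute features
d = direction_from_pair(w1, w2)
```

Everything the two references share cancels out, leaving only the direction of the changed target face attribute.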
Further, the image reconstruction network comprises a plurality of feature reconstruction layers connected in sequence;
the performing image reconstruction processing using the target attribute features to obtain target image data with adjusted face attributes includes:
for the current feature reconstruction layer, receiving the candidate image output by the previous feature reconstruction layer;
receiving the target attribute features to be input into the current feature reconstruction layer;
in the current feature reconstruction layer, reconstructing the candidate image using the target attribute features to obtain a feature image;
if the current feature reconstruction layer is the last layer, outputting the feature image as the target image data;
and if the current feature reconstruction layer is not the last layer, outputting the feature image as a new candidate image.
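The layer-by-layer loop above can be sketched as follows. The toy layers (each just shifts the candidate by its feature group's mean) are stand-ins for the real feature reconstruction layers; only the chaining structure mirrors the steps above.

```python
import numpy as np

def reconstruct(layers, features, seed_image):
    """Pass the candidate image through the chain of feature
    reconstruction layers; each layer consumes the previous candidate
    plus its own group of target attribute features, and the last
    layer's output is the target image data."""
    candidate = seed_image
    for layer, w in zip(layers, features):
        candidate = layer(candidate, w)
    return candidate

# Toy 'layers': each shifts the candidate by the mean of its feature group.
def make_layer():
    return lambda img, w: img + w.mean()

rng = np.random.default_rng(5)
layers = [make_layer() for _ in range(4)]
features = [rng.normal(size=16) for _ in range(4)]   # one group per layer
seed = np.zeros((8, 8))
out = reconstruct(layers, features, seed)
```

Each intermediate output plays the role of the "new candidate image"; only the final layer's output is treated as the target image.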
In a second aspect, an embodiment of the present invention further provides a live broadcasting method, where the live broadcasting method includes:
acquiring original video data, wherein the original video data comprises original image data, and the original image data contains an anchor user;
extracting original attribute features from each frame of original image data, wherein the original attribute features are features representing the face attributes of the anchor users;
adjusting the original attribute characteristics to obtain target attribute characteristics;
performing image reconstruction processing using the target attribute features of each frame to obtain target image data with adjusted face attributes;
and playing the target video data with the target image data.
Further, the playing of the target video data having the target image data includes:
processing the target image data of two adjacent frames to obtain new target image data;
and playing the target video data with the new target image data.
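One plausible form of the adjacent-frame processing is a temporal blend that suppresses frame-to-frame flicker. The patent does not specify the operation, so the weighted average below (and the `alpha` parameter) is an assumption for illustration.

```python
import numpy as np

def smooth_frames(prev_frame, cur_frame, alpha=0.5):
    """Blend two adjacent target frames into new target image data;
    alpha weights the current frame against the previous one."""
    return alpha * cur_frame + (1.0 - alpha) * prev_frame

rng = np.random.default_rng(6)
f0 = rng.uniform(0, 255, size=(4, 4))   # previous target frame
f1 = rng.uniform(0, 255, size=(4, 4))   # current target frame
new = smooth_frames(f0, f1, alpha=0.5)
```

A larger `alpha` tracks the current frame more closely; a smaller one smooths more aggressively at the cost of motion lag.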
In a third aspect, an embodiment of the present invention further provides an image processing apparatus, including:
an original image data acquisition module, configured to acquire original image data, wherein the original image data contains a character image;
the original attribute feature extraction module is used for extracting original attribute features from the original image data, wherein the original attribute features are features representing the human face attributes of the human figure image;
the adjusting module is used for adjusting the original attribute characteristics to obtain target attribute characteristics;
and the reconstruction module is used for carrying out image reconstruction processing by using the target attribute characteristics to obtain target image data with the adjusted face attribute.
In a fourth aspect, an embodiment of the present invention further provides a live broadcast apparatus, where the apparatus includes:
an acquisition module, configured to acquire original video data, wherein the original video data comprises original image data, and the original image data contains an anchor user;
the characteristic extraction module is used for extracting original attribute characteristics from each frame of original image data, wherein the original attribute characteristics are characteristics representing the face attributes of the anchor user;
the attribute adjusting module is used for adjusting the original attribute characteristics to obtain target attribute characteristics;
the image reconstruction module is used for performing image reconstruction processing using the target attribute features of each frame to obtain target image data with adjusted face attributes;
and the playing module is used for playing the target video data with the target image data.
In a fifth aspect, an embodiment of the present invention further provides an image processing apparatus, including: a memory and one or more processors;
the memory being configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more programs cause the one or more processors to implement the image processing method according to any one of the first aspect.
In a sixth aspect, an embodiment of the present invention further provides a live broadcast device, where the live broadcast device includes: a memory and one or more processors;
the memory being configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more programs cause the one or more processors to implement the live broadcasting method according to any one of the second aspect.
In a seventh aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are used to execute the image processing method according to any one of the first aspect when executed by a computer processor.
In an eighth aspect, the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, are used to perform the live broadcast method according to any one of the second aspects.
The method comprises: acquiring original image data containing a character image; extracting original attribute features from the original image data, the original attribute features characterizing the face attributes of the character image; adjusting the original attribute features to obtain target attribute features; and performing image reconstruction with the target attribute features to obtain target image data with adjusted face attributes. This solves the problem that, when face attributes are adjusted with different image processing tools, the results of different adjustment orders affect one another; it changes multiple face attributes of the original image data simultaneously, thereby adjusting the character image in the original image data, and reduces development difficulty and development time for developers.
Drawings
Fig. 1A is a flowchart of an image processing method according to an embodiment of the present invention;
fig. 1B is a schematic structural diagram of a face encoder according to an embodiment of the present invention;
fig. 1C is a schematic structural diagram of a face generator according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for adjusting original attribute characteristics according to a second embodiment of the present invention;
fig. 3 is a flowchart of a sub-method for adjusting original attribute characteristics according to a third embodiment of the present invention;
fig. 4 is a flowchart of a live broadcasting method according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image processing apparatus according to a fifth embodiment of the present invention;
fig. 6 is a schematic structural diagram of a live broadcasting device according to a sixth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an image processing apparatus according to a seventh embodiment of the present invention;
fig. 8 is a schematic structural diagram of a live broadcast device according to an eighth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1A is a flowchart of an image processing method according to an embodiment of the present invention, fig. 1B is a schematic structural diagram of a face encoder according to an embodiment of the present invention, and fig. 1C is a schematic structural diagram of a face generator according to an embodiment of the present invention. This embodiment is applicable to the case where a character image in image data is modified using image processing techniques, specifically to changing the face attributes of the character image. Referring to fig. 1A, the method may be performed by an image processing device, which may be a computer, a server, a mobile terminal, or the like.
In this embodiment, in an application scenario in which the face attributes of the character image are changed, the region containing face data may be located in the input initial image data, face key points may be detected for the face data in that region, and the face data aligned according to the key points. The aligned face data may then be used as the original image data and adjusted with the image processing method provided in this embodiment to obtain target image data with modified face attributes. Finally, the target image data may be aligned with the face data in the initial image data according to the face key points, and the face data in the initial image data replaced with the target image data.
Referring to fig. 1B-1C, the method may be based on a face encoder and a face generator. The face encoder is used for encoding input image data to obtain image codes. The face generator is used for processing the input image code and regenerating image data.
The face encoder 10 includes an image decomposition network 11 and an encoding network 12 connected to each other. The image decomposition network 11 performs feature extraction on the input image data x to obtain the attribute features w, where the attribute features w characterize the face attributes of the character image. The encoding network 12 encodes the input attribute features w to obtain the face code z. In the face encoder 10 shown in fig. 1B, the image decomposition network 11 may be implemented with a Convolutional Neural Network (CNN), and the encoding network 12 with a Fully Connected layer (FC).
The face generator 20 comprises a decoding network 21 and an image reconstruction network 22 connected to each other. The decoding network 21 decodes the input face code z to obtain the attribute features w'. The image reconstruction network 22 takes as input a plurality of groups of features characterizing face attributes, namely the attribute features w', in order to reconstruct the image data x'. The image reconstruction network 22 includes a plurality of feature reconstruction layers connected in sequence; each feature reconstruction layer is connected to a neuron node and receives its corresponding group of attribute features w', and the last feature reconstruction layer outputs image data x' carrying the plural groups of face attributes. In the face generator 20 shown in fig. 1C, the decoding network 21 can be implemented with a fully connected layer, and the image reconstruction network 22 with a convolutional neural network.
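The data flow x → w → z → w' → x' through the four networks can be sketched at the shape level. The linear maps below are stand-ins for the real CNN and FC networks, and every dimension (4096-pixel image, 512-dimensional features, 128-dimensional code) is an assumption chosen only to make the shapes concrete.

```python
import numpy as np

rng = np.random.default_rng(7)

# Shape-level stand-ins for the four networks (all reduced to linear maps;
# the real networks are a CNN, an FC layer, an FC layer, and a CNN).
decompose = rng.normal(size=(512, 64 * 64))      # image decomposition: x -> w
encode = rng.normal(size=(128, 512))             # encoding network:    w -> z
decode = rng.normal(size=(512, 128))             # decoding network:    z -> w'
rebuild = rng.normal(size=(64 * 64, 512))        # image reconstruction: w' -> x'

x = rng.normal(size=64 * 64)   # flattened input image data
w = decompose @ x              # attribute features
z = encode @ w                 # face code
w_prime = decode @ z           # attribute features recovered in the generator
x_prime = rebuild @ w_prime    # reconstructed image data
```

The point of the sketch is only the interface: attribute adjustment operates on w', between the decoding network and the image reconstruction network.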
It should be noted that when the face encoder 10 is used together with the face generator 20, x is the image data of the input character image, z is the output face code, and the attribute feature w is an attribute feature located in the middle of the face encoder 10. The attribute features w in the face encoder 10 and the attribute features w' in the face generator 20 have a certain degree of decoupling.
It should be noted that the encoding network 12 and the decoding network 21 are optional modules, and when the face encoder includes only the image decomposition network 11 and the face generator 20 includes only the image reconstruction network 22, z = w and w' = z.
The method specifically comprises the following steps:
S110, acquiring original image data, wherein the original image data contains a character image.
In this embodiment, the original image data contains a character image. The character image may be represented by face attributes such as gender, age, identity, nose shape, eye shape, mouth shape, hair style, hair color, skin color, nationality, and the like.
In this embodiment, the original image data includes the face data corresponding to the character image.
In one embodiment, an area of face data may be determined from initial image data by inputting the initial image data, performing face keypoint detection on the face data in the area, and aligning the face data according to the face keypoint. Further, the aligned face data may be used as the original face data.
And S120, extracting original attribute features from the original image data, wherein the original attribute features characterize the face attributes of the character image.
In this embodiment, the face attributes of the character image may be characterized with attribute features. For example, when attribute features are represented as vectors, a vector may represent a plurality of face attributes: each element of the vector may represent a face attribute separately, or the vector as a whole may determine one face attribute.
In this embodiment, the attribute features of the character image extracted from the original image data are the original attribute features.
In one embodiment, the original image data may be input into the face encoder for encoding to output the original face code; the original face code is then input into the decoding network for decoding to output original attribute features comprising at least one face attribute.
And S130, adjusting the original attribute characteristics to obtain target attribute characteristics.
In this embodiment, the character image may be adjusted by adjusting its face attributes, for example changing gender, appearing older or younger, enlarging or reducing eye bags, enlarging or shrinking the eyes, appearing fatter or thinner, adding or removing makeup, and the like.
Conventionally, the face attributes of the character image are adjusted by adjusting the face code. However, if the sampling space of the face code z is a fixed-shape distribution (for example, a normal distribution with a mean of 0 and a standard deviation of 1), the attribute features represented by z are easily coupled together, which makes the effect of adjusting the face code uncontrollable. The encoding network 12 and the decoding network 21 can instead be used to obtain the attribute features w and w', respectively.
In contrast, the attribute features w in the face encoder 10 and the attribute features w' in the face generator 20 dissociate, to some extent, the coupling present in the original code space z. That is, the original attribute features differ from the original face code in that they are decoupled to some degree. Because of this decoupling, a direction vector can be determined for each face attribute in the attribute feature space constructed from the attribute features w. This direction vector serves as the adjustment direction for the corresponding face attribute: when the position of the original attribute features in the attribute feature space is moved along the adjustment direction, the corresponding face attribute of the character image changes accordingly. The effect of adjusting the attribute features w' (i.e., the original attribute features) is therefore controllable.
And S140, carrying out image reconstruction processing by using the target attribute characteristics to obtain target image data with the adjusted face attribute.
In this embodiment, the image reconstruction processing may be performed on the target attribute features using the image reconstruction network described above. Specifically, referring to the structure of the image reconstruction network shown in fig. 1C, the current feature reconstruction layer receives the candidate image output by the previous feature reconstruction layer, together with the target attribute features to be input into the current layer; in the current layer, the candidate image is reconstructed using the target attribute features to obtain a feature image; if the current layer is the last layer, the feature image is output as the target image data; if it is not the last layer, the feature image is output as a new candidate image.
In an embodiment, corresponding to the case where the original image data is the aligned face data extracted from the initial image data, the target image data may be aligned with the face data in the initial image data according to the face key points, and the face data in the initial image data replaced with the target image data.
According to the technical scheme of this embodiment, original image data containing a character image is acquired; original attribute features characterizing the face attributes of the character image are extracted from the original image data; the original attribute features are adjusted to obtain target attribute features; and image reconstruction is performed with the target attribute features to obtain target image data with adjusted face attributes. Because the adjustment operates on original attribute features that are decoupled to some degree, the scheme solves the problem that, when face attributes are adjusted with different image processing tools, the results of different adjustment orders affect one another. It changes multiple face attributes of the original image data simultaneously, thereby adjusting the character image in the original image data, reduces development difficulty and development time for developers, and makes the adjustment effect predictable.
Example two
Fig. 2 is a flowchart of a method for adjusting original attribute characteristics according to a second embodiment of the present invention. The present embodiment further refines the above embodiments, and describes in detail the adjustment sub-method of the original attribute feature in the adjustment mode of the reference image. In this embodiment, the target attribute feature may be from reference image data in addition to the original attribute feature.
In the adjustment manner of the reference image, referring to fig. 2, step S130 may be further refined into steps S210-S250:
s210, receiving a first user operation.
In this embodiment, a selection page may be provided, through which reference image data can be imported or selected. The first user operation is a user operation on the selection page. The reference image data may be image data containing a face image imported by the user, or randomly generated image data containing a face image.
S220, determining the reference image data, including a face image, selected by the first user operation.

In this embodiment, the reference image data acted on by the first user operation is determined as the selected reference image data.
In general, the user can select the reference image data according to personal preference, so that the finally reconstructed target image data has the face attributes of the reference image data. For example, the user may select reference image data with face attributes such as large eyes, an aquiline nose, and a pointed chin, so that the character image in the finally reconstructed target image data also has these attributes. Alternatively, one or more face attributes of the user's own face (e.g., age, gender, attractiveness, weight) may be adjusted according to the user's preference, with the degree and direction of adjustment (e.g., for age, the two directions of aging and becoming younger) controlled by the user. These two ways of adjusting the face attributes can also be applied simultaneously.
And S230, extracting the characteristic of the attribute of the face from the reference image data as a reference attribute characteristic.
In this embodiment, reference image data may be input to a face encoder to perform encoding processing, and a reference face code may be output; and inputting the reference face code into a decoding network for decoding processing, and outputting a reference attribute feature comprising at least one face attribute.
S240, taking a preset number of groups of the original attribute features as the target attribute features of the image reconstruction network.
In this embodiment, the target attribute feature is used to perform image reconstruction processing to obtain target image data.
In this embodiment, a plurality of sets of target attribute features may be input to an image reconstruction network to perform image reconstruction processing, so as to output target image data.
Specifically, referring to fig. 1C, the image reconstruction network 22 includes a plurality of feature reconstruction layers connected in sequence; each feature reconstruction layer is connected to a neuron node and receives one group of target attribute features related to the face features corresponding to that layer, and the last feature reconstruction layer outputs target image data having the face attributes represented by the target attribute features.
In this embodiment, each set of original attribute features is input into the corresponding feature reconstruction layer.
And S250, replacing the original attribute features belonging to the same group as the reference attribute features in the target attribute features with the reference attribute features.
In this embodiment, the target attribute feature may be from the original attribute feature or from the reference image data, so that the target image data obtained by image reconstruction processing has the face attribute in the reference image data.
Furthermore, during training of the face generator, the face attribute targeted by each feature reconstruction layer can be determined: for example, one feature reconstruction layer is mainly used for reconstructing the face contour, another mainly for reconstructing gender, another mainly for reconstructing face details, and yet another mainly for reconstructing the facial features (e.g., eyes, nose, and mouth), and so on.
Because the reference attribute features can likewise represent various face attributes, the face attribute to be adjusted can be taken as the target attribute. The reference attribute feature is then input to the feature reconstruction layer corresponding to the target attribute, which determines the group to which the reference attribute feature belongs; for example, if it is input to the first feature reconstruction layer, the reference attribute feature belongs to the first group. Finally, in the target attribute features constructed from the original attribute features, the original attribute features belonging to the same group as the reference attribute features are replaced with the reference attribute features.
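The group replacement of step S250 can be sketched as follows. The helper name `mix_attribute_groups` and the group layout are assumptions; the toy vectors stand in for per-layer attribute-feature groups.

```python
import numpy as np

def mix_attribute_groups(original_groups, reference_groups, target_group_ids):
    # Build the target attribute features: start from the per-layer original
    # groups, then replace each group listed in target_group_ids with the
    # corresponding reference group (the face attributes to transfer).
    target = [g.copy() for g in original_groups]
    for i in target_group_ids:
        target[i] = reference_groups[i].copy()
    return target

orig = [np.zeros(4) for _ in range(4)]  # original attribute features, 4 groups
ref = [np.ones(4) for _ in range(4)]    # reference attribute features, 4 groups
# e.g. transfer only group 0 (hypothetically, the face-contour layer's group)
mixed = mix_attribute_groups(orig, ref, target_group_ids=[0])
```

Only the groups explicitly listed are swapped; all remaining groups keep the original image's attributes, so the reconstruction inherits just the selected face attributes from the reference image.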
EXAMPLE III
Fig. 3 is a flowchart of a method for adjusting original attribute characteristics according to a third embodiment of the present invention. The present embodiment is further refined on the basis of the above-mentioned embodiments, and a detailed description is given to an adjustment sub-method of the original attribute feature in a manner of determining the adjustment direction.
In the way of determining the adjustment direction, referring to fig. 3, step S130 may be further refined to steps S310-S340:
and S310, receiving a second user operation.
In this embodiment, an attribute selection page may be provided, through which the face attribute to be adjusted is determined. The second user operation is a user operation on the attribute selection page.
And S320, determining the adjustment amplitude of the target face attribute selected by the second user operation, wherein the target face attribute is the face attribute to be adjusted in the original image data.
In this embodiment, the face attributes of the character image may be adjusted by adjusting its attribute features, for example changing gender, making the face older or younger, adding or removing eye bags, enlarging or shrinking the eyes, making the face fuller or thinner, and adding or removing makeup. The adjustment amplitude in this embodiment indicates the degree to which the target face attribute is adjusted. For instance, if the target face attribute is adjusted to make the character image older or younger, then for the adjustment direction toward aging, the larger the adjustment amplitude, the older the adjusted character image appears.
And S330, determining an adjusting direction associated with the target face attribute.
In this embodiment, owing to the attribute feature w in the face encoder 10 and the attribute feature w' in the face generator 20, the coupling of attribute features in the original space z can be dissociated to a certain extent. That is, unlike the original face code, the original attribute features are decoupled to some degree.
In one implementation, sample image data may be obtained, where each sample is labeled with either a first attribute vector or a second attribute vector relating to the target face attribute. If the target face attribute is gender, the first attribute vector may represent male and the second female; if the target face attribute is eye type, the first attribute vector may represent large eyes and the second small eyes, with the distinction between them defined against a preset standard eye. Sample attribute features related to the target face attribute are extracted from the sample image data; a hyperplane is determined in the feature space formed by these attribute features; and the normal vector of the hyperplane is taken as the adjustment direction associated with the target face attribute. The hyperplane divides the feature space into a first feature space and a second feature space, where the first feature space contains the attribute features of the image data labeled with the first attribute vector, and the second feature space contains those labeled with the second attribute vector. For example, a support vector machine may be trained on the sample image data; the normal vector of its separating hyperplane is the adjustment direction.
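The hyperplane-normal idea can be sketched as below. As an assumption for the sketch, a plain subgradient descent on the hinge loss stands in for the support vector machine mentioned in the text (for separable data both yield a separating hyperplane); the toy Gaussian samples stand in for extracted attribute features.

```python
import numpy as np

def adjustment_direction(features, labels, steps=200, lr=0.1):
    # Fit a linear separator between the two attribute classes via hinge-loss
    # subgradient descent (an SVM-like stand-in) and return the unit normal
    # of its hyperplane: the adjustment direction for the target attribute.
    w = np.zeros(features.shape[1])
    b = 0.0
    y = np.where(labels == 1, 1.0, -1.0)  # first vs. second attribute vector
    for _ in range(steps):
        violating = y * (features @ w + b) < 1.0  # margin violators
        w += lr * (y[violating][:, None] * features[violating]).sum(axis=0) / len(y)
        b += lr * y[violating].sum() / len(y)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
first_class = rng.normal(loc=+2.0, size=(50, 8))   # e.g. labeled "male"
second_class = rng.normal(loc=-2.0, size=(50, 8))  # e.g. labeled "female"
X = np.vstack([first_class, second_class])
labels = np.array([1] * 50 + [0] * 50)
direction = adjustment_direction(X, labels)
```

Moving an attribute feature along `direction` pushes it toward the first class's side of the hyperplane, which is exactly how the adjustment direction is used in step S340.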
Further, at least one face code may be randomly generated; each face code is input into the face generator for image reconstruction processing, yielding at least one item of sample image data. A preset classification model for identifying the target face attribute (characterized by the first or second attribute vector) is determined; the sample image data is input into the classification model for identification processing, and the probability that each sample belongs to the first or second attribute vector is determined. The top n sample image data by probability are then retained, so that their attribute features separate cleanly in the attribute feature space: the first feature space contains no attribute features of image data labeled with the second attribute vector, and the second feature space contains none labeled with the first attribute vector. This improves the accuracy of the determined adjustment direction.
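The confidence filtering step can be sketched as follows; the function name and the toy probabilities are assumptions for illustration.

```python
import numpy as np

def filter_confident_samples(samples, probabilities, n):
    # Keep the n samples the classifier is most confident about, so the two
    # attribute classes are cleanly separated in attribute-feature space.
    order = np.argsort(probabilities)[::-1]  # highest probability first
    keep = order[:n]
    return [samples[i] for i in keep], probabilities[keep]

samples = ["face_a", "face_b", "face_c", "face_d"]
probs = np.array([0.55, 0.99, 0.72, 0.93])  # classifier confidences
top, top_p = filter_confident_samples(samples, probs, n=2)
```

Dropping low-confidence samples removes points that would sit near (or on the wrong side of) the hyperplane, sharpening the estimated normal vector.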
In yet another embodiment, first reference image data and second reference image data may be acquired, where the second reference image data is obtained by modifying the target face attribute of the first reference image data; a first reference attribute feature is extracted from the first reference image data; a second reference attribute feature is extracted from the second reference image data; and the difference between the second and first reference attribute features is taken as the adjustment direction. For example, two pictures may be taken that differ only in a certain face attribute; the two pictures are input into the encoder to compute their attribute features, and the difference between the two attribute features is the adjustment direction for that face attribute (for instance, given an original face and the same face after a face-thinning edit, the difference between the attribute features of the two pictures is the adjustment direction for face thinning).
And S340, moving the original attribute features along the adjusting direction by the adjusting amplitude to obtain target attribute features of the adjusted face attributes.
For example, the target attribute feature may be expressed as: w_new' = w' + α·Δw, where w_new' is the target attribute feature, w' is the original attribute feature, α is the adjustment amplitude, and Δw is the adjustment direction.
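Both the difference-based direction and the move formula above can be sketched together. The concrete feature vectors are hypothetical; only the arithmetic mirrors the text.

```python
import numpy as np

# Hypothetical attribute features: w1 from an original photo, w2 from the
# same photo after a "thin face" edit; their difference is the adjustment
# direction Δw, and alpha scales how far to move (the adjustment amplitude).
w1 = np.array([0.2, 0.5, 0.1, 0.8])  # first reference attribute feature
w2 = np.array([0.2, 0.9, 0.1, 0.6])  # second reference attribute feature
delta_w = w2 - w1                    # adjustment direction

def adjust(w_orig, delta_w, alpha):
    # w_new' = w' + alpha * delta_w, the formula in the text.
    return w_orig + alpha * delta_w

w_orig = np.array([0.3, 0.4, 0.2, 0.7])  # original attribute feature
w_target = adjust(w_orig, delta_w, alpha=0.5)
```

With alpha = 0 the feature is unchanged; larger alpha applies the edit more strongly, and a negative alpha moves in the opposite direction (e.g. widening instead of thinning).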
Example four
Fig. 4 is a flowchart of a live broadcasting method according to a fourth embodiment of the present invention. The embodiment is applicable to a case where the character image of the anchor user in the live video is changed by using an image processing technology, and in particular, is used for changing the character attribute of the anchor user. The method can be executed by a live device, and the live device can be a computer, a server, a mobile terminal and the like.
Referring to fig. 4, the method specifically includes the following steps:
s410, collecting original video data, wherein the original video data comprise original image data, and anchor users are arranged in the original image data.
In this embodiment, initial image data may be extracted from the original video data; further, the area of the anchor user's face data is determined from the initial image data, face key points are detected for the face data in that area, and the face data is aligned according to the face key points. The aligned face data may then be used as the original image data.
And S420, extracting original attribute features from each frame of original image data, wherein the original attribute features are features representing the face attributes of the anchor users.
In this embodiment, the face attributes of the anchor user may be characterized using attribute features. For example, when an attribute feature is represented as a vector, the vector may encode multiple face attributes: each element of the vector may represent one face attribute separately, or, of course, the vector as a whole may determine a face attribute.
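A toy illustration of the per-element reading of such a vector follows. The attribute names and positions are purely illustrative assumptions; the patent does not fix any layout.

```python
import numpy as np

# Hypothetical layout: each element of the attribute-feature vector encodes
# one face attribute of the anchor user.
ATTRS = ["age", "gender", "eye_size", "face_width"]
w = np.array([0.35, -1.2, 0.8, 0.1])

def read_attribute(w, name):
    # Look up one face attribute's value by its assumed position.
    return float(w[ATTRS.index(name)])
```

In practice a learned feature space rarely has such a clean one-element-per-attribute layout; the text itself allows the whole vector to jointly determine an attribute.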
In this embodiment, the attribute feature of the anchor user extracted from the original image data is an original attribute feature.
In one embodiment, the original face code can be output by inputting the original image data into a face encoder for encoding; and inputting the original face code into a decoding network for decoding, and outputting original attribute features comprising at least one face attribute.
And S430, adjusting the original attribute characteristics to obtain target attribute characteristics.
In this embodiment, the face attributes of the anchor user may be adjusted by adjusting the anchor user's attribute features, for example changing gender, making the face older or younger, adding or removing eye bags, enlarging or shrinking the eyes, making the face fuller or thinner, and adding or removing makeup.
When adjusting the face attributes of the anchor user, a face-code adjustment method is usually adopted. However, because the attribute features represented using the face code z are easily coupled together in the distribution space of z, the adjustment effect of the face-code adjustment method is uncontrollable.
In contrast, owing to the attribute feature w in the face encoder 10 and the attribute feature w' in the face generator 20, the coupling of attribute features in the original space z can be dissociated to some extent; that is, unlike the original face code, the original attribute features are decoupled to some degree. Because of this decoupled nature, in the attribute feature space constructed from the attribute features w, a direction vector can be determined for each face attribute. This direction vector serves as the adjustment direction corresponding to that face attribute: when the position of the original attribute feature in the attribute feature space is moved along the adjustment direction, the corresponding face attribute of the anchor user is adjusted accordingly. The effect of adjusting via the attribute feature w' (i.e., the original attribute feature) is therefore controllable.
In this embodiment, the adjustment method of the original attribute features has two forms: the adjustment mode of the reference image and the mode of determining the adjustment direction.
1. Reference image adjusting mode
In this embodiment, a first user operation may be received; reference image data including a face image and selected by the first user operation is determined; features characterizing the face attributes are extracted from the reference image data as reference attribute features; a preset number of groups of the original attribute features are taken as the target attribute features input to the image reconstruction network; and, within the target attribute features, the original attribute features belonging to the same group as the reference attribute features are replaced with the reference attribute features.
2. Means for determining the direction of adjustment
In this embodiment, the second user operation may be received; determining the adjustment range of the target face attribute selected by the second user operation, wherein the target face attribute is the face attribute to be adjusted in the original image data; determining an adjustment direction associated with the target face attribute; and moving the original attribute characteristics along the adjusting direction to adjust the amplitude to obtain the target attribute characteristics of the adjusted face attributes.
S440, image reconstruction processing is carried out by using each frame of target attribute feature, and target image data with the adjusted face attribute are obtained.
In this embodiment, the image reconstruction processing may be performed on the target attribute feature using the image reconstruction network described above. Specifically, referring to the structure of the image reconstruction network shown in fig. 1C, a candidate image output by a feature reconstruction layer of a previous layer may be received by the feature reconstruction layer of the current layer; receiving target attribute characteristics to be input into a characteristic reconstruction layer of the current layer; in a feature reconstruction layer of the layer, reconstructing the candidate image by using the target attribute feature to obtain a feature image; if the feature reconstruction layer of the layer is the last layer, outputting the feature image as target image data; and if the feature reconstruction layer of the layer is not the last layer, outputting the feature image as a new candidate image.
In an embodiment, for the case where the original image data is face data extracted from initial image data, the target image data may be aligned with that face data according to the face key points, and the face data in the initial image data may then be replaced with the target image data.
S450, playing the target video data with the target image data.
In this embodiment, new target image data is obtained by processing the target image data of two adjacent frames; and playing the target video data with the new target image data.
In this embodiment, the processing may include alignment processing, fusion processing, smoothing processing, or the like.
In this embodiment, the alignment processing is used to detect face key points in the target image data and to align the face key points across two adjacent frames of target image data. Generally, the detected face key points are numbered; face key points with the same number in the target image data of two adjacent frames are taken to correspond, and each corresponding pair of face key points is aligned.
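The number-based pairing can be sketched as follows; the dictionary representation (keypoint number to coordinate) and the helper name are assumptions for illustration.

```python
def match_keypoints(prev_points, curr_points):
    # Pair face key points by their detector-assigned numbers: points with
    # the same number in adjacent frames correspond and are aligned together.
    # Assumes both frames' detections use the same numbering scheme.
    paired = {}
    for idx in prev_points.keys() & curr_points.keys():
        paired[idx] = (prev_points[idx], curr_points[idx])
    return paired

prev = {0: (10.0, 20.0), 1: (30.0, 40.0)}
curr = {0: (11.0, 21.0), 1: (29.0, 41.0), 2: (50.0, 60.0)}
pairs = match_keypoints(prev, curr)
```

Keypoints detected in only one of the two frames (here, number 2) have no correspondence and are simply left unpaired.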
The smoothing processing may stretch the face key points of the current frame of target image data toward the face key points of the previous frame of target image data, and may be expressed as:

d_t = β·d_t + (1 - β)·d_{t-1}

where d_t is a face key point of the current frame of target image data, d_{t-1} is the corresponding face key point of the previous frame of target image data, and β is a hyper-parameter. After this operation, the inter-frame adjustment of the anchor user's face attributes becomes smoother.
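The smoothing formula can be sketched directly; the β value used below is an assumed example, since the text leaves the hyper-parameter unspecified.

```python
import numpy as np

def smooth_keypoints(curr, prev, beta=0.5):
    # Exponential smoothing of the current frame's face key points toward
    # the previous frame's: d_t = beta * d_t + (1 - beta) * d_{t-1}.
    # beta is the hyper-parameter from the text; 0.5 is an assumed default.
    return beta * curr + (1.0 - beta) * prev

prev = np.array([[100.0, 200.0], [150.0, 250.0]])  # previous-frame keypoints
curr = np.array([[110.0, 210.0], [140.0, 260.0]])  # current-frame keypoints
smoothed = smooth_keypoints(curr, prev, beta=0.5)
```

A β close to 1 trusts the current detection (less smoothing, more jitter); a smaller β leans on the previous frame, trading latency for stability.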
Further, in this embodiment, the fusion processing is used to combine multiple frames of target image data into the target video data; that is, the target video data is obtained by combining the frames of initial image data in which the face data has been replaced.
According to the technical solution of this embodiment, original video data is collected, where the original video data includes original image data containing the anchor user; original attribute features, which characterize the face attributes of the anchor user, are extracted from each frame of original image data; the original attribute features are adjusted to obtain target attribute features; and image reconstruction processing is performed using each frame's target attribute features to obtain target image data with adjusted face attributes. Because the adjustment operates on original attribute features that are decoupled, this solution avoids the problem that, when different image processing methods are used to adjust face attributes, the effects interfere with one another depending on the adjustment order; multiple face attributes of the original image data can thus be changed simultaneously to adjust the character image, which reduces the development difficulty and development time for developers and makes the adjustment effect predictable. Furthermore, by smoothing the target image data of adjacent frames, the stability of adjacent frames during playback is ensured and jitter is avoided.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an image processing apparatus according to a fifth embodiment of the present invention.
Referring to fig. 5, the image processing apparatus specifically includes the following structure: an original image data acquisition module 510, an original attribute feature extraction module 520, an adjustment module 530, and a reconstruction module 540.
An original image data obtaining module 510, configured to obtain original image data, where the original image data has a human image.
An original attribute feature extraction module 520, configured to extract an original attribute feature from the original image data, where the original attribute feature is a feature that represents a face attribute of the human image.
An adjusting module 530, configured to adjust the original attribute feature to obtain a target attribute feature.
A reconstructing module 540, configured to perform image reconstruction processing using the target attribute feature, to obtain target image data with the adjusted face attribute.
According to the technical solution of this embodiment, original image data containing a character image is obtained; original attribute features, which characterize the face attributes of the character image, are extracted from the original image data; and the original attribute features are adjusted to obtain target attribute features. Because the adjustment operates on original attribute features that are decoupled, this solution avoids the problem that, when different image processing methods are used to adjust face attributes, the effects interfere with one another depending on the adjustment order; multiple face attributes of the original image data can thus be changed simultaneously to adjust the character image, which reduces the development difficulty and development time for developers and makes the adjustment effect predictable.
On the basis of the technical scheme, the method is based on a face generator and a face encoder, wherein the face generator comprises: decoding the network; the original attribute feature extraction module 520 includes:
and the coding unit is used for inputting the original image data into the face coder for coding and outputting an original face code.
And the decoding unit is used for inputting the original face code into the decoding network for decoding processing and outputting original attribute characteristics comprising at least one face attribute.
On the basis of the technical scheme, the face generator further comprises an image reconstruction network connected with the decoding network, and the image reconstruction network is used for inputting a plurality of groups of characteristics for representing the attributes of the face so as to reconstruct image data; an adjustment module 530, comprising:
the first user operation receiving unit is used for receiving a first user operation.
And the reference image data determining unit is used for determining the reference image data which is selected by the first user operation and comprises a face image.
And the reference attribute feature extraction unit is used for extracting features representing the attributes of the human face from the reference image data as reference attribute features.
And the target attribute characteristic initial unit is used for taking the original attribute characteristics of a preset group number as the target attribute characteristics of the image reconstruction network.
And the replacing unit is used for replacing the original attribute features belonging to the same group as the reference attribute features in the target attribute features with the reference attribute features.
On the basis of the above technical solution, the adjusting module 530 includes:
and the second user operation receiving unit is used for receiving a second user operation.
And the target face attribute determining unit is used for determining the adjustment amplitude which is selected by the second user operation and is related to the target face attribute, wherein the target face attribute is the face attribute to be adjusted and is contained in the original image data.
And the adjusting direction determining unit is used for determining the adjusting direction associated with the target face attribute.
And the target attribute characteristic determining unit is used for moving the original attribute characteristic along the adjusting direction by the adjusting amplitude to obtain the target attribute characteristic of the adjusted face attribute.
On the basis of the above technical solution, the adjustment direction determining unit includes:
a sample image data obtaining subunit, configured to obtain sample image data, where the sample image data is marked with a first attribute vector and a second attribute vector related to the attribute of the target face.
And the sample attribute feature extraction subunit is used for extracting sample attribute features related to the target human face attributes from the sample image data.
And a hyperplane determining subunit, configured to determine a hyperplane in a feature space formed by the attribute features, where the hyperplane divides the feature space into a first feature space and a second feature space, where the first feature space includes the attribute feature of the image data marked as the first attribute vector, and the second feature space includes the attribute feature of the image data marked as the second attribute vector.
And the first adjusting direction determining subunit is used for taking the normal vector of the hyperplane as the adjusting direction associated with the target face attribute.
On the basis of the above technical solution, the sample image data acquisition subunit is specifically configured to: randomly generate at least one face code; input each face code into the face generator for image reconstruction processing to obtain at least one item of sample image data; determine a preset classification model for identifying the target face attribute, where the target face attribute is characterized using the first attribute vector or the second attribute vector; input the sample image data into the classification model for identification processing, and determine the probability that each item of sample image data belongs to the first or second attribute vector; and retain the top n sample image data by probability.
On the basis of the above technical solution, the adjustment direction determining unit includes:
a reference image obtaining subunit, configured to obtain first reference image data and second reference image data, where the second reference image data is image data obtained by changing an attribute of a target face of the first reference image data;
a first reference extracting subunit operable to extract a first reference attribute feature from the first reference image data;
a second reference extracting subunit operable to extract a second reference attribute feature from the second reference image data;
a second adjustment direction determining subunit, configured to use a difference between the second reference attribute feature and the first reference attribute feature as an adjustment direction.
On the basis of the technical scheme, the image reconstruction network comprises a plurality of layers of feature reconstruction layers which are sequentially connected; a reconstruction module 540, comprising:
and the candidate image receiving unit is used for receiving the candidate image output by the feature reconstruction layer of the previous layer aiming at the feature reconstruction layer of the current layer.
The target attribute feature receiving unit is used for receiving target attribute features to be input into the feature reconstruction layer of the local layer;
and the reconstruction unit is used for reconstructing the candidate image by using the target attribute feature in a feature reconstruction layer of the layer to obtain a feature image.
And the first output unit is used for outputting the characteristic image as target image data if the characteristic reconstruction layer of the layer is the last layer.
And the second output unit is used for outputting the characteristic image as a new candidate image if the characteristic reconstruction layer of the current layer is not the last layer.
This product can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example six
Fig. 6 is a schematic structural diagram of a live broadcasting device according to a sixth embodiment of the present invention.
Referring to fig. 6, the live broadcasting apparatus includes: an acquisition module 610, a feature extraction module 620, an attribute adjustment module 630, an image reconstruction module 640, and a playback module 650.
An acquisition module 610, configured to collect original video data, where the original video data includes original image data, and the original image data contains the anchor user;
a feature extraction module 620, configured to extract an original attribute feature from each frame of the original image data, where the original attribute feature is a feature that characterizes a face attribute of the anchor user;
an attribute adjusting module 630, configured to adjust the original attribute characteristics to obtain target attribute characteristics;
an image reconstruction module 640, configured to perform image reconstruction processing using each frame of the target attribute feature to obtain target image data with the adjusted face attribute;
and a playing module 650 for playing the target video data with the target image data.
According to the technical solution of this embodiment, original video data is collected, where the original video data includes original image data containing the anchor user; original attribute features, which characterize the face attributes of the anchor user, are extracted from each frame of original image data; the original attribute features are adjusted to obtain target attribute features; and image reconstruction processing is performed using each frame's target attribute features to obtain target image data with adjusted face attributes. Because the adjustment operates on original attribute features that are decoupled, this solution avoids the problem that, when different image processing methods are used to adjust face attributes, the effects interfere with one another depending on the adjustment order; multiple face attributes of the original image data can thus be changed simultaneously to adjust the character image, which reduces the development difficulty and development time for developers and makes the adjustment effect predictable. Furthermore, by smoothing the target image data of adjacent frames, the stability of adjacent frames during playback is ensured and jitter is avoided.
On the basis of the above technical solution, the playing module 650 includes:
a smoothing unit, configured to smooth the target image data of two adjacent frames to obtain new target image data;
and a playing unit, configured to play target video data carrying the new target image data.
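A minimal sketch of what the smoothing unit might do, assuming a simple weighted blend of each target frame with its predecessor (the blending weight `alpha` is an assumption; the patent does not specify the smoothing method):

```python
import numpy as np

def smooth_adjacent(prev_frame, cur_frame, alpha=0.5):
    """Blend the target image data of two adjacent frames.

    alpha weights the current frame and 1 - alpha the previous one,
    which damps frame-to-frame jitter during playback.
    """
    return alpha * cur_frame + (1.0 - alpha) * prev_frame

prev = np.zeros((4, 4))   # previous target frame (toy pixel grid)
cur = np.ones((4, 4))     # current target frame
new_frame = smooth_adjacent(prev, cur)  # every pixel becomes 0.5
```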
The above apparatus can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example seven
Fig. 7 is a schematic structural diagram of an image processing apparatus according to a seventh embodiment of the present invention. As shown in Fig. 7, the image processing apparatus includes a processor 70, a memory 71, an input device 72, and an output device 73. The image processing apparatus may have one or more processors 70 and one or more memories 71; one of each is taken as an example in Fig. 7. The processor 70, the memory 71, the input device 72, and the output device 73 may be connected by a bus or other means; connection by a bus is taken as an example in Fig. 7.
The memory 71, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the image processing method according to any embodiment of the present invention (for example, the original image data acquisition module 510, the original attribute feature extraction module 520, the adjusting module 530, and the reconstruction module 540 in the image processing apparatus). The memory 71 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the device. Further, the memory 71 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 71 may further include memory located remotely from the processor 70, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 72 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the image processing apparatus, and may further include a camera for capturing images and a sound pickup device for collecting audio data. The output device 73 may include an audio device such as a speaker. The specific composition of the input device 72 and the output device 73 may be set according to actual conditions.
The processor 70 executes the software programs, instructions, and modules stored in the memory 71 to perform the various functional applications and data processing of the device, i.e., to implement the image processing method described above.
Example eight
Fig. 8 is a schematic structural diagram of a live broadcast device according to an eighth embodiment of the present invention. As shown in Fig. 8, the live broadcast device includes a processor 80, a memory 81, an input device 82, and an output device 83. The live broadcast device may have one or more processors 80 and one or more memories 81; one of each is taken as an example in Fig. 8. The processor 80, the memory 81, the input device 82, and the output device 83 may be connected by a bus or other means; connection by a bus is taken as an example in Fig. 8. The live broadcast device may be a computer, a server, or the like. In this embodiment, the live broadcast device is described in detail as a server, which may be an independent server or a cluster server.
The memory 81, as a computer-readable storage medium, is used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the live broadcast method according to any embodiment of the present invention (for example, the acquisition module 610, the feature extraction module 620, the attribute adjusting module 630, the image reconstruction module 640, and the playing module 650 in the live broadcast device). The memory 81 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the device. Further, the memory 81 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 81 may further include memory located remotely from the processor 80, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 82 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the live broadcast device, and may further include a camera for capturing images and a sound pickup device for collecting audio data. The output device 83 may include an audio device such as a speaker. The specific composition of the input device 82 and the output device 83 may be set according to actual conditions.
The processor 80 executes the software programs, instructions, and modules stored in the memory 81 to perform the various functional applications and data processing of the device, i.e., to implement the live broadcast method described above.
Example nine
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform an image processing method, including:
acquiring original image data, where the original image data contains a character image;
extracting original attribute features from the original image data, where the original attribute features are features characterizing the face attributes of the character image;
adjusting the original attribute features to obtain target attribute features;
and performing image reconstruction processing using the target attribute features to obtain target image data with the adjusted face attributes.
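The four steps above can be sketched end to end with toy linear maps standing in for the face encoder, decoding network, and image reconstruction network. The dimensions and the linear models are illustrative assumptions only; the networks in the patent are learned models:

```python
import numpy as np

class ToyFacePipeline:
    """Toy stand-ins for the face encoder, decoding network, and image
    reconstruction network (random linear maps, illustration only)."""

    def __init__(self, img_dim=64, code_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.encoder = rng.standard_normal((code_dim, img_dim))
        self.decoder = rng.standard_normal((code_dim, code_dim))
        self.reconstructor = rng.standard_normal((img_dim, code_dim))

    def extract_features(self, image):
        face_code = self.encoder @ image     # encode: original face code
        return self.decoder @ face_code      # decode: original attribute features

    def reconstruct(self, features):
        return self.reconstructor @ features # reconstruct: target image data

pipe = ToyFacePipeline()
image = np.ones(64)                          # original image data (flattened)
features = pipe.extract_features(image)      # step 2: extract attribute features
features[0] += 2.0                           # step 3: adjust one attribute
target_image = pipe.reconstruct(features)    # step 4: target image data
```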
The storage medium provided by the embodiment of the present invention contains computer-executable instructions that are not limited to the operations of the image processing method described above; they may also perform related operations in the image processing method provided by any embodiment of the present invention, with corresponding functions and beneficial effects.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software plus necessary general-purpose hardware, or alternatively by hardware alone, though the former is preferable in many cases. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the image processing method according to any embodiment of the present invention.
It should be noted that the units and modules included in the above image processing apparatus are divided merely according to functional logic; the division is not limited thereto as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
Example ten
Embodiments of the present invention also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a live broadcast method, including:
acquiring original video data, where the original video data includes original image data and the original image data contains an anchor user;
extracting original attribute features from each frame of the original image data, where the original attribute features are features characterizing the face attributes of the anchor user;
adjusting the original attribute features to obtain target attribute features;
performing image reconstruction processing using the target attribute features of each frame to obtain target image data with the adjusted face attributes;
and playing target video data carrying the target image data.
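Put together, the per-frame live loop might look like the following sketch, where `adjust` stands for the extract-adjust-reconstruct steps and each target frame is blended with its predecessor before playback. The loop structure and the weight `alpha` are assumptions for illustration:

```python
import numpy as np

def process_stream(frames, adjust, alpha=0.6):
    """Apply per-frame attribute adjustment, then smooth each target
    frame with the previous one before it is played."""
    smoothed, prev = [], None
    for frame in frames:
        target = adjust(frame)                       # extract/adjust/reconstruct
        if prev is not None:
            target = alpha * target + (1.0 - alpha) * prev  # temporal smoothing
        smoothed.append(target)
        prev = target
    return smoothed

# Hypothetical stream of three identical frames with an identity "adjust".
frames = [np.ones((2, 2)) for _ in range(3)]
out = process_stream(frames, adjust=lambda f: f)
```

With an identity adjustment and identical input frames, the smoothed output is unchanged, which is the expected fixed point of the blend.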
The storage medium provided by the embodiment of the present invention contains computer-executable instructions that are not limited to the operations of the live broadcast method described above; they may also perform related operations in the live broadcast method provided by any embodiment of the present invention, with corresponding functions and beneficial effects.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software plus necessary general-purpose hardware, or alternatively by hardware alone, though the former is preferable in many cases. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the live broadcast method according to any embodiment of the present invention.
It should be noted that the units and modules included in the above live broadcast device are divided merely according to functional logic; the division is not limited thereto as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

1. An image processing method, comprising:
the method is based on a face generator and a face encoder, the face generator comprising a decoding network;
acquiring original image data, wherein the original image data has a character image;
extracting original attribute features from the original image data, which comprises:
inputting the original image data into the face encoder for encoding processing, and outputting an original face code;
inputting the original face code into the decoding network for decoding processing, and outputting an original attribute feature comprising at least one face attribute, wherein the original attribute feature is a feature representing the face attribute of the character image;
adjusting the original attribute characteristics to obtain target attribute characteristics;
and performing image reconstruction processing by using the target attribute characteristics to obtain target image data with the adjusted human face attribute.
2. The method of claim 1, wherein the face generator further comprises an image reconstruction network connected to the decoding network, the image reconstruction network being configured to receive a plurality of sets of features characterizing face attributes and to reconstruct image data;
the adjusting the original attribute characteristics to obtain target attribute characteristics includes:
receiving a first user operation;
determining reference image data including a face image selected by the first user operation;
extracting features representing the attributes of the human face from the reference image data to serve as reference attribute features;
taking a preset number of groups of the original attribute features as the target attribute features for the image reconstruction network;
and replacing, with the reference attribute features, the original attribute features in the target attribute features that belong to the same group as the reference attribute features.
3. The method of claim 1, wherein the adjusting the original attribute feature to obtain a target attribute feature comprises:
receiving a second user operation;
determining an adjustment amplitude selected by the second user operation and related to a target face attribute, wherein the target face attribute is a face attribute to be adjusted in the original image data;
determining an adjustment direction associated with the target face attribute;
and moving the original attribute feature by the adjustment amplitude along the adjustment direction to obtain the target attribute feature of the adjusted face attribute.
4. The method of claim 3, wherein determining the adjustment direction associated with the target face attribute comprises:
acquiring sample image data, wherein the sample image data is marked with a first attribute vector and a second attribute vector related to the attributes of the target face;
extracting sample attribute features related to the target human face attribute from the sample image data;
determining a hyperplane in a feature space formed by the attribute features, wherein the hyperplane divides the feature space into a first feature space and a second feature space, the first feature space comprises the attribute features of the image data marked as a first attribute vector, and the second feature space comprises the attribute features of the image data marked as a second attribute vector;
and taking the normal vector of the hyperplane as an adjusting direction associated with the target face attribute.
5. The method of claim 4, wherein the acquiring sample image data comprises:
randomly generating at least one face code;
respectively inputting the face codes into the face generator to carry out image reconstruction processing to obtain at least one sample image data;
determining a preset classification model for identifying the target face attribute, wherein the target face attribute is characterized by using a first attribute vector or a second attribute vector;
inputting the sample image data into the classification model for identification processing, and determining the probability that the sample image data belongs to a first attribute vector or a second attribute vector;
and selecting the top n sample image data by the probability as the sample image data.
6. The method of claim 3, wherein determining the adjustment direction associated with the target face attribute comprises:
acquiring first reference image data and second reference image data, wherein the second reference image data is image data obtained by changing the attribute of a target face of the first reference image data;
extracting a first reference attribute feature from the first reference image data;
extracting a second reference attribute feature from the second reference image data;
and taking the difference value of the second reference attribute characteristic and the first reference attribute characteristic as an adjusting direction.
7. The method according to any one of claims 2-6, wherein the face generator further comprises an image reconstruction network connected to the decoding network, the image reconstruction network comprising a plurality of sequentially connected feature reconstruction layers;
wherein performing image reconstruction processing using the target attribute features to obtain target image data with the adjusted face attributes comprises:
for the feature reconstruction layer of the current layer, receiving a candidate image output by the feature reconstruction layer of the previous layer;
receiving the target attribute features to be input into the feature reconstruction layer of the current layer;
in the feature reconstruction layer of the current layer, performing reconstruction processing on the candidate image using the target attribute features to obtain a feature image;
if the feature reconstruction layer of the current layer is the last layer, outputting the feature image as target image data;
and if the feature reconstruction layer of the current layer is not the last layer, outputting the feature image as a new candidate image.
8. A live broadcast method, comprising:
acquiring original video data, wherein the original video data comprises original image data, and the original image data contains an anchor user;
extracting original attribute features from each frame of original image data, wherein the original attribute features are features representing the face attributes of the anchor user;
adjusting the original attribute characteristics to obtain target attribute characteristics;
performing image reconstruction processing using the target attribute features of each frame to obtain target image data with the adjusted face attributes;
playing target video data carrying the target image data, which comprises:
processing the target image data of two adjacent frames to obtain new target image data;
and playing the target video data with the new target image data.
9. An image processing apparatus characterized by comprising:
the apparatus comprises a face generator and a face encoder, the face generator comprising a decoding network;
an original image data acquisition module, configured to acquire original image data, wherein the original image data contains a character image;
an original attribute feature extraction module, configured to extract original attribute features from the original image data, wherein the original attribute features are features characterizing the face attributes of the character image;
the original attribute feature extraction module comprises:
an encoding unit, configured to input the original image data into the face encoder for encoding processing and output an original face code;
a decoding unit, configured to input the original face code into the decoding network for decoding processing and output original attribute features comprising at least one face attribute;
an adjusting module, configured to adjust the original attribute features to obtain target attribute features;
and a reconstruction module, configured to perform image reconstruction processing using the target attribute features to obtain target image data with the adjusted face attributes.
10. A live broadcast apparatus, comprising:
an acquisition module, configured to acquire original video data, wherein the original video data comprises original image data, and the original image data contains an anchor user;
a feature extraction module, configured to extract original attribute features from each frame of the original image data, wherein the original attribute features are features characterizing the face attributes of the anchor user;
an attribute adjusting module, configured to adjust the original attribute features to obtain target attribute features;
an image reconstruction module, configured to perform image reconstruction processing using the target attribute features of each frame to obtain target image data with the adjusted face attributes;
a playing module, configured to play target video data carrying the target image data;
wherein the playing module comprises:
a smoothing unit, configured to smooth the target image data of two adjacent frames to obtain new target image data;
and a playing unit, configured to play target video data carrying the new target image data.
11. An image processing apparatus characterized by comprising: a memory and one or more processors;
the memory, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image processing method of any one of claims 1-7.
12. A live device, comprising: a memory and one or more processors;
the memory, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the live broadcast method of claim 8.
13. A storage medium containing computer-executable instructions for performing the image processing method of any one of claims 1-7 when executed by a computer processor.
14. A storage medium containing computer executable instructions, which when executed by a computer processor, are for performing the live method of claim 8.
CN201910593748.XA 2019-07-03 2019-07-03 Image processing and live broadcasting method and related devices Active CN112188234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910593748.XA CN112188234B (en) 2019-07-03 2019-07-03 Image processing and live broadcasting method and related devices


Publications (2)

Publication Number Publication Date
CN112188234A CN112188234A (en) 2021-01-05
CN112188234B true CN112188234B (en) 2023-01-06

Family

ID=73915169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910593748.XA Active CN112188234B (en) 2019-07-03 2019-07-03 Image processing and live broadcasting method and related devices

Country Status (1)

Country Link
CN (1) CN112188234B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991484B (en) * 2021-04-28 2021-09-03 中科计算技术创新研究院 Intelligent face editing method and device, storage medium and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4101260B2 (en) * 2005-09-01 2008-06-18 キヤノン株式会社 Image processing apparatus and image processing method
CN106231415A (en) * 2016-08-18 2016-12-14 北京奇虎科技有限公司 A kind of interactive method and device adding face's specially good effect in net cast
CN113329252B (en) * 2018-10-24 2023-01-06 广州虎牙科技有限公司 Live broadcast-based face processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112188234A (en) 2021-01-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant