CN114419177A - Personalized expression package generation method and system, electronic equipment and readable medium - Google Patents

Personalized expression package generation method and system, electronic equipment and readable medium

Info

Publication number
CN114419177A
Authority
CN
China
Prior art keywords
expression
feature
package
semantic
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210018166.0A
Other languages
Chinese (zh)
Inventor
单志辉
孙环荣
宫新伟
陈兆金
徐灵敏
赵世亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xunze Network Technology Co ltd
Original Assignee
Shanghai Xunze Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xunze Network Technology Co ltd filed Critical Shanghai Xunze Network Technology Co ltd
Priority to CN202210018166.0A
Publication of CN114419177A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a personalized expression package generation method and system, electronic equipment and a readable medium, the method comprising the following steps: S100, in response to an expression package tool triggering the expression package production configuration, receiving a plurality of source images containing a target face, each target face carrying different expression action information; S200, using a preset feature labeling model to extract expression-action-related features from the target face in each source image, generating an expression feature library for each expression action, and semantically labeling each expression feature library to obtain a plurality of expression feature libraries with semantic results; S300, using a feature fusion model to receive the expression feature information in each expression feature library, fusing the plurality of expression feature libraries with semantic results into a personalized expression package of the target face after ViT (Vision Transformer) processing, and displaying the corresponding target-face expression action image according to a semantic result. The invention can directly use a semantic result contained in the input content to display the expression action image matched with that semantic result.

Description

Personalized expression package generation method and system, electronic equipment and readable medium
Technical Field
The invention relates to the technical field of computer vision and deep learning, in particular to a method and a system for generating a personalized expression package, electronic equipment and a readable medium.
Background
With the continuous development of social networks, people have gradually moved from plain-text communication to communication using symbols, images, emoticons and the like. Emoticons enrich a user's chat content, compensate for the shortcomings of text-only communication such as dullness and imprecise expression of meaning, and improve communication efficiency; for example, an emotion that is difficult to describe in words can be conveyed by the expression information carried in an emoticon. Many expression packages currently take face images as their material, but most of them are ready-made expression package materials stored by the user; they cannot be recognized and defined from a face image, and instead require the user to define the face image manually.
Disclosure of Invention
The embodiments of the present application provide a personalized expression package generation method and system, an electronic device and a readable medium, to overcome the defects of the prior art that expressions are produced by rigid synthesis and that the user must select the expression action to be displayed, and to achieve the beneficial effect that an expression action image matched with a semantic result can be displayed directly from the semantic result contained in the input content.
In a first aspect, an embodiment of the present application provides a method for generating a personalized facial expression package, where the method includes:
s100, when an expression package tool triggers the production configuration of an expression package, receiving a plurality of source images with target faces, wherein each target face has different expression action information;
s200, performing expression action-related feature extraction on the target face in each source image by using a preset feature label model, generating expression feature libraries related to each expression action, and performing semantic labeling on each expression feature library to obtain a plurality of expression feature libraries with semantic results;
s300, receiving the expression feature information in each expression feature library by using a feature fusion model, fusing a plurality of expression feature libraries with semantic results into an individualized expression package of the target face after VIT visual processing, and displaying corresponding target face expression action images according to the semantic results.
Further, after step S300, the method further includes storing the personalized expression package of the target face generated by fusion, so that when the expression package tool triggers the expression package application configuration, a semantic result related to the personalized expression package is parsed from the on-screen input content, the personalized expression package of the target face is called from the expression library, and the target-face expression action image matched with the semantic result is displayed.
Further, after step S300, when the expression package tool triggers the expression package application configuration, text content input on the screen is received; when a semantic result associated with the personalized expression package is parsed from the text content, the personalized expression package in the expression library is called and the expression action image matched with the semantic result is displayed.
Further, in the step S200, the feature label model performs feature extraction related to expression actions by using a deep learning technique, generates an expression feature library related to each expression action, and performs semantic labeling related to the expression actions on each expression feature library.
Further, the feature labeling model adopts a convolutional neural network for feature extraction, which includes:
performing convolution down-sampling on each source image multiple times to complete overall feature extraction;
performing convolution down-sampling on the target face in each source image multiple times to complete key-point feature extraction;
and performing convolution down-sampling on each target face multiple times to complete extraction of the feature contours related to expression actions.
Further, in step S300, the method for fusing a plurality of facial expression feature libraries with semantic results into a personalized facial expression package of the target face includes:
s310, obtaining an expression feature library subjected to convolution down-sampling for multiple times;
s320, merging, at each down-sampling level, the convolution down-sampled feature information from the different expression feature libraries, inputting the merged features into a pre-constructed ViT network architecture, and up-sampling to obtain a first fusion feature;
s330, concatenating the first fusion features of different levels along the channel dimension and performing convolution up-sampling to obtain a second fusion feature;
s340, concatenating the first fusion feature and the second fusion feature again along the channel dimension to obtain a third fusion feature;
and S350, performing convolution up-sampling on the third fusion feature for multiple times to obtain a fused personalized expression package.
Further, after step S340, obtaining a target face image of the desired expression action by convolution down-sampling is also included.
Further, in the ViT network architecture, the merged feature information is received, and the semantically labeled feature information is partitioned to obtain a plurality of feature image patches;
and each feature image patch is linearly transformed, patch embedding is realized through dimension-reduction processing, and after spatial position encoding is applied to the patch embeddings they are fed into a Transformer encoder in sequence, so as to fuse the features of the various expression actions.
In a second aspect, an embodiment of the present application provides a personalized facial expression package generating system, where the method described in any one of the first aspects is adopted, and the system includes:
the system comprises an image receiving module, a display module and a display module, wherein the image receiving module is configured to respond to an expression package tool to trigger expression package manufacturing configuration, and receive a plurality of source images with target faces, wherein each target face has different expression action information;
the semantic labeling module is configured to extract features related to expression actions of the target face in the source images by using a preset feature labeling model, generate expression feature libraries related to the expression actions, and perform semantic labeling on the expression feature libraries to obtain a plurality of expression feature libraries with semantic results;
and the expression generation module is configured to receive expression feature information in each expression feature library by using a feature fusion model, fuse the plurality of expression feature libraries with semantic results into an individualized expression package of the target face after VIT visual processing, and display a corresponding target face expression action image according to the semantic results.
In a third aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as in any one of the first aspects.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method according to any one of the first aspect.
The technical scheme provided in the embodiment of the application has at least the following technical effects:
1. Compared with the traditional expression image synthesis technology, the present application replaces rigid image-copy "nesting" with a fusion technique. In the traditional synthesis technology, the face image is copied directly onto the required expression image, or expression images of different styles are selected. In this embodiment, after the expression feature images are extracted, they are partitioned into patches and processed by the deep-learning ViT (Vision Transformer) technique to form an integrated personalized expression package covering multiple expression actions, and the expression the user needs is generated and displayed automatically according to the display requirement.
2. The present application combines natural language processing: the expression feature images are semantically annotated so that they carry corresponding semantic information, enter the ViT network architecture as a one-dimensional input sequence for learning and training, and form expression feature images with semantic results. When the user needs a particular expression action image, the image corresponding to the semantic result can therefore be displayed directly from the on-screen input content, without selecting it from a set of expression packages. In other words, once the personalized expression package has been produced, entering a related expression keyword is enough to present the corresponding expression action image; for example, entering 'pout' causes the system to return the 'pout' expression generated from the face supplied by the user. The expression action image in this embodiment may be a still image or an animated image. The generation configuration of the personalized expression package in this embodiment allows a custom face image to be turned into the action expression images of the required expression package, for example making an expression package from images of the user or of collected public figures, which increases affinity and interest.
Drawings
Fig. 1 is a flowchart of a method for generating a personalized facial expression package according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a fusion algorithm in the first embodiment of the present application;
fig. 3 is a block diagram of a personalized facial expression package generating system according to a second embodiment of the present application.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Embodiment 1
Referring to fig. 1-2, an embodiment of the present application provides a method for generating a personalized facial expression package, where the method includes:
step S100, when the expression package manufacturing configuration is triggered in response to an expression package tool, receiving a plurality of source images with target faces, wherein each target face has different expression action information.
The expression package tool in the personalized expression package generation method in the embodiment is loaded in a user terminal, the user terminal may be a smart phone, a tablet computer, or the like, and the loading mode may be an independent APP application program, a wechat applet, or a plug-in embedded into an application program such as an input method. And triggering and opening a source image receiving configuration when the expression package tool triggers the expression package making configuration function, wherein the source image can be picture information stored in the user terminal or picture information acquired by a camera on the current user terminal. For example, when the emoticon manufacturing configuration is triggered to be opened, a camera on the user terminal is called, and image acquisition of different emotions is performed according to the number of required source images, such as emotions like laughing, smiling, head bending, mouth pounding, crying and the like.
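The collection step just described can be sketched as a simple loop. This is only an illustrative sketch: the expression list and the get_stored_picture / open_camera callables are hypothetical names standing in for the terminal's storage and camera interfaces, which the embodiment does not specify.

```python
# Minimal sketch of the production-configuration trigger (names are assumptions).
REQUIRED_EXPRESSIONS = ["laugh", "smile", "head tilt", "pout", "cry"]

def collect_source_images(get_stored_picture, open_camera):
    """Return a mapping from expression action to one source image of the target face."""
    sources = {}
    for expr in REQUIRED_EXPRESSIONS:
        img = get_stored_picture(expr)          # picture already stored on the terminal
        if img is None:
            img = open_camera(prompt=expr)      # otherwise acquire it with the camera
        sources[expr] = img
    return sources
```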
Step S200, performing expression action-related feature extraction on the target face in each source image by using a preset feature label model, generating expression feature libraries related to each expression action, and performing semantic labeling on each expression feature library to obtain a plurality of expression feature libraries with semantic results.
In the step S200, the feature label model performs feature extraction related to expression actions by using a deep learning technique, generates an expression feature library related to each expression action, and performs semantic labeling related to expression actions on each expression feature library.
The feature labeling model uses a convolutional neural network for feature extraction, which includes: performing convolution down-sampling on each source image multiple times to complete overall feature extraction; performing convolution down-sampling on the target face in each source image multiple times to complete key-point feature extraction; and performing convolution down-sampling on each target face multiple times to complete extraction of the feature contours related to expression actions.
To explain further, features are extracted from each source image, and after multiple convolution down-sampling passes the result of each pass is labeled Si; key-point features are extracted from the target face in each source image, and the result of each pass is labeled Li; the feature contours related to expression actions are extracted from each target face, and the result of each pass is labeled ei; an expression feature library comprising Si, Li and ei is thereby obtained.
In this embodiment, the number of convolution down-sampling passes is set to 5, so that Si = {S0, S1, S2, S3, S4}, Li = {L0, L1, L2, L3, L4} and ei = {e0, e1, e2, e3, e4}.
That is, the source image is convolution down-sampled five times, and the results of the passes are labeled S0, S1, S2, S3, S4 in order. 68 key points are extracted on the target face, and the eye and mouth regions of the target face are extracted; taking the expression actions "pout, smile, head tilt" as an example, five convolution down-sampling passes are performed and the results are labeled L0, L1, L2, L3, L4 in order. The contours of the eyes and mouth of the target face are extracted, showing the eye and mouth outlines for pouting, smiling and head tilting; these are likewise convolution down-sampled five times, and the results are labeled e0, e1, e2, e3, e4 in order.
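The embodiment does not give an implementation of the feature labeling model, but the three-branch, five-pass down-sampling described above can be sketched as follows. This is a minimal sketch assuming PyTorch; the channel widths, the shared DownBlock structure, the input resolutions and the separate face and eye/mouth crops are illustrative assumptions rather than details taken from the text.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """One convolution + stride-2 down-sampling pass."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class FiveLevelExtractor(nn.Module):
    """Runs 5 convolution down-sampling passes and keeps every intermediate result,
    mirroring the S0..S4 / L0..L4 / e0..e4 notation of the embodiment."""
    def __init__(self, in_ch=3, widths=(32, 64, 128, 256, 512)):
        super().__init__()
        chans = [in_ch] + list(widths)
        self.stages = nn.ModuleList(DownBlock(chans[i], chans[i + 1]) for i in range(len(widths)))

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)        # one feature map per down-sampling pass
        return feats               # [f0, f1, f2, f3, f4]

# One branch per feature type described in the text.
source_branch   = FiveLevelExtractor()   # whole source image            -> Si
keypoint_branch = FiveLevelExtractor()   # target-face crop (68 points)  -> Li
contour_branch  = FiveLevelExtractor()   # eye/mouth regions             -> ei

source_img = torch.randn(1, 3, 256, 256)   # dummy source image
face_crop  = torch.randn(1, 3, 256, 256)   # dummy aligned face crop
eye_mouth  = torch.randn(1, 3, 256, 256)   # dummy eye/mouth region crop

Si = source_branch(source_img)     # [S0, S1, S2, S3, S4]
Li = keypoint_branch(face_crop)    # [L0, L1, L2, L3, L4]
ei = contour_branch(eye_mouth)     # [e0, e1, e2, e3, e4]
expression_feature_library = {"S": Si, "L": Li, "e": ei}
```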
Step S300, receiving the expression feature information in each expression feature library with a feature fusion model, and fusing the plurality of expression feature libraries with semantic results into a personalized expression package of the target face after ViT processing, so that the corresponding target-face expression action image can be displayed according to a semantic result.
Step S300 is followed by storing the personalized expression package of the target face generated by fusion, so that when the expression package tool triggers the expression package application configuration, a semantic result related to the personalized expression package is parsed from the on-screen input content, the personalized expression package of the target face is called from the expression library, and the target-face expression action image matched with the semantic result is displayed.
Further, after step S300, when the expression package tool triggers the expression package application configuration, text content input on the screen is received; when a semantic result associated with the personalized expression package is parsed from the text content, the personalized expression package in the expression library is called and the expression action image matched with the semantic result is displayed.
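As an illustration, the application step above amounts to a keyword lookup against the stored personalized expression package. The following is a minimal sketch under assumptions: the label set, the file names and the substring-matching rule are placeholders; the embodiment does not specify how semantic results are parsed from the input text.

```python
from typing import Optional

# Expression library produced by the fusion step; labels and file names are placeholders.
expression_library = {
    "pout": "target_face_pout.png",
    "smile": "target_face_smile.png",
    "head tilt": "target_face_head_tilt.png",
}

def parse_semantic_result(text: str) -> Optional[str]:
    """Return the first semantic label whose keyword appears in the on-screen input."""
    lowered = text.lower()
    for label in expression_library:
        if label in lowered:
            return label
    return None

def apply_expression_package(text: str) -> Optional[str]:
    """Expression-package application configuration: parse a semantic result from the
    input text and return the matching target-face expression action image."""
    label = parse_semantic_result(text)
    return expression_library.get(label) if label else None

print(apply_expression_package("that makes me smile"))   # -> target_face_smile.png
```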
In step S300, the method for fusing a plurality of expression feature libraries with semantic results into a personalized expression package of a target face includes:
and step S310, obtaining the expression feature library after convolution and downsampling for many times.
Step S320, merging, at each down-sampling level, the convolution down-sampled feature information from the different expression feature libraries, inputting the merged features into the pre-constructed ViT network architecture, and up-sampling to obtain a first fusion feature.
Step S330, concatenating the first fusion features of different levels along the channel dimension and performing convolution up-sampling to obtain a second fusion feature.
Step S340, concatenating the first fusion feature and the second fusion feature again along the channel dimension to obtain a third fusion feature.
And S350, performing convolution up-sampling on the third fusion feature for multiple times to obtain a fused personalized expression package.
To explain further, the method of fusing the features extracted after the 5 convolution down-sampling passes is as follows: {S4, L4, e4} are merged, fed into the ViT network architecture and then up-sampled to obtain a first fusion feature E1; {S3, L3, e3} are merged and fed into the ViT network architecture to obtain a first fusion feature E2; the first fusion features E1 and E2 are concatenated along the channel dimension and convolution up-sampled to obtain a second fusion feature E3; {S2, L2, e2} are merged and fed into the ViT network architecture to obtain a first fusion feature E4, and the second fusion feature E3 and the first fusion feature E4 are concatenated along the channel dimension to obtain a third fusion feature; the third fusion feature is then convolution up-sampled multiple times to obtain the fused personalized expression package. In this embodiment, the third fusion feature is convolution up-sampled three times. The resulting personalized expression package can present expression actions such as pouting, smiling and head tilting, with only one expression action presented at a time.
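To show how this multi-level fusion might be wired together, the following is a minimal PyTorch sketch. The channel counts, spatial sizes, the bilinear up-sampling, and the reduction of the ViT block to a 1x1 projection are assumptions made only to keep the example short; the real block performs the full ViT processing described further below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViTFusionBlock(nn.Module):
    """Stand-in for the pre-constructed ViT network that fuses one level of merged
    features; reduced here to a 1x1 projection to keep the sketch short."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.proj(x)

def up(x, scale=2):
    return F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)

# Dummy features from the three branches; channel counts and sizes are assumptions.
S4, L4, e4 = (torch.randn(1, 512, 8, 8) for _ in range(3))
S3, L3, e3 = (torch.randn(1, 256, 16, 16) for _ in range(3))
S2, L2, e2 = (torch.randn(1, 128, 32, 32) for _ in range(3))

vit4, vit3, vit2 = ViTFusionBlock(1536, 256), ViTFusionBlock(768, 256), ViTFusionBlock(384, 128)
fuse_conv = nn.Conv2d(512, 256, kernel_size=3, padding=1)

E1 = up(vit4(torch.cat([S4, L4, e4], dim=1)))        # merge level 4 -> ViT -> up-sample
E2 = vit3(torch.cat([S3, L3, e3], dim=1))            # merge level 3 -> ViT
E3 = up(fuse_conv(torch.cat([E1, E2], dim=1)))       # channel concat -> convolution up-sample
E4 = vit2(torch.cat([S2, L2, e2], dim=1))            # merge level 2 -> ViT
third_fusion = torch.cat([E3, E4], dim=1)            # third fusion feature

# Three convolution up-sampling passes to obtain the fused expression image.
decoder = nn.Sequential(
    nn.Upsample(scale_factor=2), nn.Conv2d(384, 128, 3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1),
)
personalized_package = decoder(third_fusion)          # (1, 3, 256, 256) image tensor
print(personalized_package.shape)
```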
After step S340, the method further includes obtaining a target face image of the desired expression action by convolution down-sampling. That is, in order to apply the personalized expression package, a target face image of the target expression action needs to be acquired through a reverse operation of the fusion operation.
Further, in the ViT network architecture, the merged feature information is received, and the semantically labeled feature information is partitioned to obtain a plurality of feature image patches; each feature image patch is linearly transformed, patch embedding is realized through dimension-reduction processing, and after spatial position encoding is applied to the patch embeddings they are fed into a Transformer encoder in sequence, so as to fuse the features of the various expression actions.
In the VIT network architecture, based on a transform technology, image information in an expression feature library is divided into image blocks patches, position embedding imbedding is carried out on the expression feature information while linear embedding is carried out, and the position embedding imbedding is carried out on the expression feature information and is input into a standard transform encoder in sequence. To further illustrate, in this embodiment, the standard Transformer encoder receives a semantic result token of the one-dimensional sequence as an input, and in order to process the two-dimensional expression feature image, the expression feature image is expressed as an input
Figure BDA0003460923260000081
Deformed into a series of flattened two-dimensional feature image blocks, represented as
Figure BDA0003460923260000082
Where (H, W) denotes the resolution of the original image, p2Representing the resolution of each characteristic image block. N ═ HW/p2Is the effective sequence length of the VIT network structure. The VIT network architecture uses the same width at all layers, i.e., each vectorized feature image block is mapped onto a model dimension for a trainable linear projection, and the corresponding output feature image block is embedded. The following equation is expressed:
Figure BDA0003460923260000083
embedding in a sequence of feature image blocks
Figure BDA0003460923260000084
Previously, a deep learning embedding was added, in a transform encoder
Figure BDA0003460923260000085
The states in the output may be represented as an image y, as in the formula:
Figure BDA0003460923260000086
in the pre-training and fine-tuning stage, the classification head is attached to
Figure BDA0003460923260000087
The Transformer encoder consists of multiple interactive layers of multi-headed attention (MSA) and MLP, each feature image block preceded by layerorm (ln), and residual concatenation applied after each block. MLP contains two layers that exhibit GELU nonlinearity, as shown by the formula: z' ofl=MSA(LN(zl-1))+zl-1,l=1...L;zl=MLP(LN(z`l))+z`lL is 1.
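The patch embedding and pre-LayerNorm encoder just described can be sketched in PyTorch as follows. This is a minimal sketch under assumptions: the image size, patch size, depth, width, head count and the use of nn.MultiheadAttention are illustrative choices; the embodiment does not fix these hyper-parameters or name a specific library.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-LayerNorm block: z' = MSA(LN(z)) + z ; z = MLP(LN(z')) + z'."""
    def __init__(self, dim, heads=8, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim)
        )

    def forward(self, z):
        h = self.ln1(z)
        z = z + self.attn(h, h, h, need_weights=False)[0]   # MSA with residual connection
        return z + self.mlp(self.ln2(z))                    # MLP with residual connection

class ViTEncoder(nn.Module):
    """Patchify -> linear projection E -> prepend x_class -> add E_pos -> L blocks."""
    def __init__(self, img_size=64, patch=8, in_ch=3, dim=256, depth=6):
        super().__init__()
        num_patches = (img_size // patch) ** 2                                   # N = HW / P^2
        self.patchify = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)   # projection E
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                    # learnable x_class
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))      # E_pos
        self.blocks = nn.ModuleList(TransformerBlock(dim) for _ in range(depth))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        z = self.patchify(x).flatten(2).transpose(1, 2)      # (B, N, D) patch embeddings
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        z = torch.cat([cls, z], dim=1) + self.pos_embed      # z_0
        for blk in self.blocks:
            z = blk(z)
        z = self.norm(z)
        return z[:, 0], z[:, 1:]                             # y = LN(z_L^0), plus patch tokens

encoder = ViTEncoder()
y, tokens = encoder(torch.randn(2, 3, 64, 64))
print(y.shape, tokens.shape)   # torch.Size([2, 256]) torch.Size([2, 64, 256])
```

In a fusion pipeline like the one sketched earlier, the returned patch tokens (rather than only $y$) would presumably be what is up-sampled and concatenated across levels; the embodiment does not spell this detail out.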
It can be seen that, compared with the traditional expression image synthesis technology, this embodiment replaces rigid image-copy "nesting" with a fusion technique. In the traditional synthesis technology, the face image is copied directly onto the required expression image, or expression images of different styles are selected; in this embodiment, after the expression feature images are extracted, they are partitioned into patches and processed by the deep-learning ViT (Vision Transformer) technique to form an integrated personalized expression package covering multiple expression actions, and the expression the user needs is generated and displayed automatically according to the display requirement.
By combining natural language processing, the expression feature images are semantically annotated so that they carry corresponding semantic information and enter the ViT network architecture as a one-dimensional input sequence for learning and training, forming expression feature images with semantic results. In other words, once the personalized expression package has been produced, entering a related expression keyword is enough to present the corresponding expression action image; for example, entering 'pout' causes the system to return the 'pout' expression generated from the face supplied by the user. The expression action image in this embodiment may be a still image or an animated image. The generation configuration of the personalized expression package in this embodiment allows a custom face image to be turned into the action expression images of the required expression package, for example making an expression package from images of the user or of collected public figures, which increases affinity and interest.
Embodiment 2
Referring to fig. 3, the present embodiment provides a personalized expression package generating system, which employs the method according to any one of the embodiments, and the system includes:
the image receiving module 100 is configured to receive a plurality of source images with target faces when an expression package tool triggers an expression package making configuration, wherein each target face has different expression action information.
The semantic labeling module 200 is configured to perform expression and motion related feature extraction on the target face in each source image by using a preset feature label model, generate an expression feature library related to each expression motion, and perform semantic labeling on each expression feature library to obtain a plurality of expression feature libraries with semantic results.
The expression generation module 300 is configured to receive expression feature information in each expression feature library by using a feature fusion model, and fuse a plurality of expression feature libraries with semantic results into an individualized expression package of a target face after VIT visual processing, so as to display a corresponding target face expression action image according to the semantic results.
Embodiment 3
This embodiment provides an electronic device, including: one or more processors; a memory for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of Embodiment 1.
This embodiment also provides a computer-readable medium on which a computer program is stored, the computer program, when executed by a processor, carrying out the method of Embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (11)

1. A method for generating a personalized expression package, the method comprising:
s100, when an expression package tool triggers the production configuration of an expression package, receiving a plurality of source images with target faces, wherein each target face has different expression action information;
s200, performing expression action-related feature extraction on the target face in each source image by using a preset feature label model, generating expression feature libraries related to each expression action, and performing semantic labeling on each expression feature library to obtain a plurality of expression feature libraries with semantic results;
s300, receiving the expression feature information in each expression feature library by using a feature fusion model, fusing a plurality of expression feature libraries with semantic results into an individualized expression package of the target face after VIT visual processing, and displaying corresponding target face expression action images according to the semantic results.
2. The method for generating the personalized expression package according to claim 1, further comprising, after the step S300, storing the personalized expression package of the target face generated by fusion, so that when the expression package tool triggers expression package application configuration, after parsing a semantic result related to the personalized expression package from on-screen input content, calling the personalized expression package of the target face from the expression library, and displaying a target facial expression action image matched with the semantic result.
3. The method for generating the personalized expression package according to claim 1, wherein after the step S300, when an expression package application configuration is triggered in response to an expression package tool, receiving text content input on a screen, analyzing one semantic result with the personalized expression package in the text content, triggering and calling the personalized expression package in the expression library, and displaying an expression action image matched with the semantic result.
4. The method as claimed in claim 1, wherein in step S200, the feature label model performs feature extraction related to expression and action by using a deep learning technique, generates an expression feature library related to each expression and action, and performs semantic labeling related to expression and action on each expression feature library.
5. The method of generating personalized expression packages according to claim 4, wherein the feature labeling model employs a convolutional neural network for feature extraction, which comprises:
performing convolution down-sampling processing on each source image for multiple times to complete feature extraction;
performing convolution down-sampling processing on the target face in each source image for multiple times to complete key point feature extraction;
and after carrying out convolution down-sampling processing on each target face for multiple times, finishing feature contour extraction related to expression and action.
6. The method for generating personalized expression packages according to claim 5, wherein in step S300, the method for fusing a plurality of expression feature libraries with semantic results into the personalized expression package of the target face comprises:
s310, obtaining an expression feature library subjected to convolution down-sampling for multiple times;
s320, merging the feature information after convolution down-sampling processing in different expression feature libraries each time, inputting the merged feature information into a pre-constructed VIT network architecture, and up-sampling to obtain a first fusion feature;
s330, after the first fusion features of different times are spliced according to channels, convolution up-sampling is carried out to obtain second fusion features;
s340, splicing the first fusion feature and the second fusion feature again according to the channel to obtain a third fusion feature;
and S350, performing convolution up-sampling on the third fusion feature for multiple times to obtain a fused personalized expression package.
7. The method of generating a personalized expression package according to claim 6, further comprising, after the step S340, obtaining the target facial image of the desired expression action by convolution down-sampling.
8. The method for generating personalized expression packages according to claim 6, wherein in the VIT network architecture, the combined feature information is received, the feature information with semantic tags is subjected to blocking processing, and a plurality of feature image blocks are obtained;
and performing linear transformation on each feature image block, realizing feature image block embedding through dimension reduction processing, and inputting the feature image blocks into a Transformer encoder in sequence after spatial position coding is performed on the feature image block embedding, so as to perform fusion of the various action expression features.
9. A system for generating a personalized expression package, using the method of any one of claims 1 to 8, the system comprising:
the system comprises an image receiving module, a display module and a display module, wherein the image receiving module is configured to respond to an expression package tool to trigger expression package manufacturing configuration, and receive a plurality of source images with target faces, wherein each target face has different expression action information;
the semantic labeling module is configured to extract features related to expression actions of the target face in the source images by using a preset feature labeling model, generate expression feature libraries related to the expression actions, and perform semantic labeling on the expression feature libraries to obtain a plurality of expression feature libraries with semantic results;
and the expression generation module is configured to receive expression feature information in each expression feature library by using a feature fusion model, fuse the plurality of expression feature libraries with semantic results into an individualized expression package of the target face after VIT visual processing, and display a corresponding target face expression action image according to the semantic results.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
11. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202210018166.0A 2022-01-07 2022-01-07 Personalized expression package generation method and system, electronic equipment and readable medium Pending CN114419177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210018166.0A CN114419177A (en) 2022-01-07 2022-01-07 Personalized expression package generation method and system, electronic equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210018166.0A CN114419177A (en) 2022-01-07 2022-01-07 Personalized expression package generation method and system, electronic equipment and readable medium

Publications (1)

Publication Number Publication Date
CN114419177A true CN114419177A (en) 2022-04-29

Family

ID=81271855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210018166.0A Pending CN114419177A (en) 2022-01-07 2022-01-07 Personalized expression package generation method and system, electronic equipment and readable medium

Country Status (1)

Country Link
CN (1) CN114419177A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115330913A (en) * 2022-10-17 2022-11-11 广州趣丸网络科技有限公司 Three-dimensional digital human mouth-shape generation method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111858954B (en) Task-oriented text-generated image network model
WO2021073417A1 (en) Expression generation method and apparatus, device and storage medium
CN111401216B (en) Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
CN111553267B (en) Image processing method, image processing model training method and device
CN111369646B (en) Expression synthesis method integrating attention mechanism
CN115565238B (en) Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product
Saunders et al. Anonysign: Novel human appearance synthesis for sign language video anonymisation
CN113536999A (en) Character emotion recognition method, system, medium and electronic device
JP2023543964A (en) Image processing method, image processing device, electronic device, storage medium and computer program
Shahzad et al. Role of zoning in facial expression using deep learning
Gündüz et al. Turkish sign language recognition based on multistream data fusion
CN114419177A (en) Personalized expression package generation method and system, electronic equipment and readable medium
Renjith et al. Indian sign language recognition: A comparative analysis using cnn and rnn models
Sönmez et al. Convolutional neural networks with balanced batches for facial expressions recognition
CN114677569B (en) Character-image pair generation method and device based on feature decoupling
CN113761281A (en) Virtual resource processing method, device, medium and electronic equipment
Ezekiel et al. Investigating GAN and VAE to train DCNN
CN113129399A (en) Pattern generation
Kong et al. DualPathGAN: Facial reenacted emotion synthesis
US20240169701A1 (en) Affordance-based reposing of an object in a scene
Sreerenganathan et al. Assistive Application for the Visually Impaired using Machine Learning and Image Processing
CN116304163B (en) Image retrieval method, device, computer equipment and medium
CN115147850B (en) Training method of character generation model, character generation method and device thereof
Alongi Towards multi-granular explainable AI: increasing the explainability level for deepfake detectors
Kumar et al. Computer Vision and Creative Content Generation: Text-to-Sketch Conversion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination