CN114418837B - Makeup migration method and electronic device - Google Patents


Info

Publication number
CN114418837B
Authority
CN
China
Prior art keywords
features
image
salient
reference image
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210345783.1A
Other languages
Chinese (zh)
Other versions
CN114418837A (en)
Inventor
李子荣
朱超
冯蔚腾
刘吉林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202210345783.1A
Publication of CN114418837A
Application granted
Publication of CN114418837B
Legal status: Active


Classifications

    • G06T3/04
    • G06F18/253 Fusion techniques of extracted features (G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/25 Fusion techniques)
    • G06T7/11 Region-based segmentation (G06T7/00 Image analysis; G06T7/10 Segmentation; Edge detection)
    • G06T2207/30201 Face (G06T2207/30 Subject of image; G06T2207/30196 Human being; Person)
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping (Y02A10/00 Technologies for adaptation to climate change at coastal zones and river basins)

Abstract

The embodiments of this application disclose a makeup migration method and an electronic device, and relate to the field of image processing. The method includes: acquiring a source image and a reference image, each of which contains a face image; removing the category features of the reference image and the category features of the source image, and then migrating the makeup of the reference image onto the source image to generate a target image. The target image is the source image with the makeup of the reference image migrated onto it. Because the category features are removed, makeup migration errors caused by mismatched sizes of the five sense organs are avoided, improving the accuracy of the makeup migration.

Description

Makeup migration method and electronic device
Technical Field
This application relates to the field of image processing, and in particular to a makeup migration method and an electronic device.
Background
With the development of technology, processing a single image has become increasingly complex: more and more parameters need to be adjusted, and the various parameters are difficult for users to master, which makes the process inconvenient. For example, applying makeup to a face image requires the user to adjust many parameters (e.g., eye shadow size, lip color, nose wing height).
Making up a face image therefore requires the user to be familiar with these parameters and to spend time tuning them, which degrades the user experience. Such a makeup processing method is time-consuming and has a high barrier to use. How to apply makeup to a face image efficiently is therefore a technical problem to be solved.
Disclosure of Invention
Embodiments of this application provide a makeup migration method and an electronic device, which can apply makeup to face images more efficiently and accurately.
In a first aspect, this application provides a makeup migration method, including: acquiring a source image and a reference image, each of which contains a face image; extracting features of the salient region of the source image to obtain first salient features, and extracting features of the salient region of the reference image to obtain second salient features, where the salient features include five-sense-organ features, category features and makeup features; removing the category features from the first salient features to obtain de-categorized source image features, and removing the category features from the second salient features to obtain de-categorized reference image features; and fusing the makeup features of the de-categorized reference image features onto the five-sense-organ features of the de-categorized source image features to obtain a target image.
Because the category features are removed before fusion, the generated target image retains the makeup style of the reference image while avoiding the influence of mismatched five sense organs between the source image and the reference image.
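As a minimal illustrative sketch (not the claimed implementation), the four steps above can be composed as follows; all function and sub-model names are hypothetical placeholders standing in for trained networks:

```python
def makeup_migration(source_img, reference_img,
                     extract_src, extract_ref, mask_model, fusion_model, decoder):
    """Illustrative composition of the claimed steps; names are hypothetical."""
    first_salient = extract_src(source_img)        # features of the source salient region
    second_salient = extract_ref(reference_img)    # features of the reference salient region
    src_no_category = mask_model(first_salient)    # remove category features (source)
    ref_no_category = mask_model(second_salient)   # remove category features (reference)
    fused = fusion_model(src_no_category, ref_no_category)  # makeup onto source features
    return decoder(fused)                          # target image
```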
In a possible implementation of the first aspect, extracting features of the salient region of the source image to obtain the first salient features includes: inputting the source image into a first extraction sub-model and outputting the first salient features, where the first extraction sub-model has the ability to extract features of the salient region of the source image. Extracting features of the salient region of the reference image to obtain the second salient features includes: inputting the reference image into a second extraction sub-model and outputting the second salient features, where the second extraction sub-model has the ability to extract features of the salient region of the reference image.
The salient region of the image is extracted by the extraction sub-model. The extraction sub-model does not segment the individual facial parts; it only extracts the salient regions of the face image, which simplifies the extraction process and saves segmentation time and computation.
In another possible implementation of the first aspect, removing the category features from the first salient features to obtain the de-categorized source image features and removing the category features from the second salient features to obtain the de-categorized reference image features includes: inputting the first salient features into a mask sub-model and outputting the de-categorized source image features; and inputting the second salient features into the mask sub-model and outputting the de-categorized reference image features, where the mask sub-model has the ability to remove category features from salient features.
The mask sub-model removes the category features through a masking operation, avoiding makeup migration errors caused by those features.
In another possible implementation of the first aspect, the salient features are a multi-dimensional matrix, and one row or one column of the multi-dimensional matrix is the category feature.
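As a minimal illustration of this representation (the matrix shape and the column index are assumptions made here, not taken from the application), removing the category feature amounts to dropping or zeroing one column of the salient feature matrix:

```python
import numpy as np

# Hypothetical salient-feature matrix: rows are feature channels, columns are
# feature slots; assume the first column holds the category feature.
salient = np.random.rand(8, 6)

category_column = 0
de_categorized = np.delete(salient, category_column, axis=1)   # drop the column
# Alternatively, keep the shape and zero the column with a mask:
mask = np.ones_like(salient)
mask[:, category_column] = 0.0
de_categorized_masked = salient * mask
print(de_categorized.shape, de_categorized_masked[:, category_column].sum())
```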
In another possible implementation of the first aspect, fusing the makeup features of the de-categorized reference image features onto the five-sense-organ features of the de-categorized source image features to obtain the target image includes: inputting the de-categorized source image features and the de-categorized reference image features into a fusion sub-model and outputting fused salient features, where the fusion sub-model has the ability to fuse the makeup features of the de-categorized reference image features onto the five-sense-organ features of the de-categorized source image features; and inputting the fused salient features into a decoding sub-model and outputting the target image.
Image fusion is performed by the fusion sub-model, leveraging the strengths of neural networks in image processing, so a high-definition, high-fidelity target image can be generated.
In another possible implementation manner of the first aspect, the feature of the five sense organs includes at least one of a size of each part of the five sense organs, a position of each part of the five sense organs in the face, a morphology of the five sense organs, and a contour of the five sense organs.
In another possible implementation of the first aspect, the cosmetic feature includes at least one of a cosmetic color, a cosmetic position, and a cosmetic form.
In a second aspect, an electronic device is provided, comprising a processor; the processor is coupled to a memory and, after reading instructions from the memory, is configured to perform the method of any one of the first aspect according to those instructions.
In a third aspect, there is provided a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any of the first aspects described above.
In a fourth aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform the method of any one of the above-described first aspect and possible implementations thereof.
It should be understood that, for the technical effects of the second to fourth aspects and their corresponding possible implementations, reference may be made to the first aspect and its corresponding possible implementations; details are not repeated here.
Drawings
Fig. 1 is a schematic diagram of a makeup overflow according to an embodiment of the present application;
fig. 2 is a schematic diagram of makeup migration provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device to which the makeup migration method provided in the embodiment of the present application is applicable;
fig. 4 is a schematic flow chart of a makeup migration method provided in an embodiment of the present application;
fig. 5 is a schematic flow chart of a makeup migration method provided in an embodiment of the present application;
fig. 6 is a schematic flow chart of a first training provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart of a second training according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In the description of the embodiments, the terminology used is for describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments below, "at least one" and "one or more" mean one, two or more than two. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless stated otherwise. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Users often take selfies or photograph other people's faces in daily life. The captured face photo may be a face image without makeup and cannot achieve the makeup effect the user wants, so the user may apply makeup to the bare face image through beautification or makeup software.
In one implementation, makeup may be applied based on face recognition of a single image. For example, beautification software can accurately locate the five sense organs by recognizing the face image, and the user then manually adds makeup to them. In one example, the beautification software segments the positions of the eyes through face recognition, and the user manually adds eyelashes to the eyes. For another example, the beautification software segments the position of the mouth, and the user manually applies lipstick to it.
In this implementation, the user needs to be familiar with the various parameters of the face image (for example, eyebrow size, lipstick color) and needs time to tune them, which reduces the user experience and takes a long time.
In another implementation, the makeup of the reference image may be migrated into the source image based on an adversarial neural network. For example, the five sense organs of the reference image may be segmented individually based on the adversarial neural network, the makeup of the eyes, eyelashes, nose, lips, etc. extracted separately, and the makeup of each region then added directly onto the corresponding part of the source image. The five sense organs include the eyebrows, eyes, ears, nose and mouth.
In this implementation, the five sense organs of the reference image are segmented separately and their respective makeup is extracted. If the five sense organs in the reference image differ greatly from those in the source image, transferring the makeup of the reference image onto the five sense organs of the source image may cause a makeup mismatch. The source image is the image to which makeup needs to be applied, the reference image is the image whose makeup serves as the reference, and the target image is the image generated after the makeup of the reference image is migrated onto the source image. As shown in fig. 1, image 101 is the reference image and image 102 is the source image. The five sense organs of the reference image carry makeup such as long eyelashes and pink lips, and this makeup needs to be migrated onto the five sense organs of the source image to generate target image 103. The five sense organs of reference image 101 are larger and wider; for example, the eyes and lips of reference image 101 are larger than those of source image 102. If individual segmentation is used and the eyelash and lip makeup of the reference image is transferred directly onto the source image, the eyelash and lip makeup in target image 103 will exceed the corresponding regions of the five sense organs. As can be seen from fig. 1, after the makeup migration the eyelashes in the target image extend beyond the eye region up to the eyebrows, and the lip makeup extends beyond the lip region; this is called color overflow. Conversely, if the five sense organs of the source image are larger and wider than those of the reference image, the makeup from the reference image may be insufficient to cover the five sense organs of the source image, resulting in missing makeup. This implementation is cumbersome: the five sense organs must be segmented in advance before features are extracted, and a makeup mismatch may occur because the five sense organs of the people in the reference image and the source image do not match.
The embodiments of this application provide a makeup migration method that extracts the makeup features of a reference image without extracting the specific makeup size. For example, a makeup includes its color, position, morphology, specific size, etc.; the makeup features include the color, position and morphology of the makeup but not its specific size. The makeup features of the reference image are then migrated onto the source image, which avoids color overflow and missing makeup. Illustratively, the effect of the makeup migration is shown in fig. 2, where image 201 is the reference image, image 202 is the source image, and image 203 is the target image. The makeup of the reference image needs to be migrated onto the source image to generate the target image. The makeup of the reference image consists of long eyelashes and pink lips, and the eyes and lips of the reference image are larger than those of the source image, yet the makeup of the generated target image fits the five sense organs of the source image well.
In one implementation, the salient regions of the reference image and of the source image may be extracted based on an image processing model. A salient region is a region of the face image with prominent features, for example the regions where the eyes, nose, mouth and eyebrows are located. In one example, the regions where the five sense organs are located are the salient regions. The image processing model obtains the five-sense-organ features, makeup features, category features, etc. of the face from the information of the salient regions. Then, after the category features of the reference image and the source image are removed, the makeup features of the reference image are migrated into the source image, avoiding color overflow or missing makeup. In addition, this scheme does not require segmenting the five sense organs in advance; only the salient regions of the face need to be extracted, so the implementation is simpler.
The makeup migration method provided by the embodiments of this application can be applied to electronic devices such as terminal devices (e.g., mobile phones), tablet computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable devices (e.g., smart watches, smart glasses or smart helmets), augmented reality (AR)/virtual reality (VR) devices, smart home devices, vehicle-mounted computers, etc.; the embodiments of this application impose no limitation on this.
Taking the mobile phone 100 as an example of the electronic device, fig. 3 shows a schematic structural diagram of the mobile phone 100.
As shown in fig. 3, the mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identity module (subscriber identification module, SIM) card interface 195, and the like.
The sensor module 180 may include a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, and the like.
It should be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the mobile phone 100. In other embodiments of this application, the handset 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural center and a command center of the mobile phone 100, and is a decision maker for commanding each component of the mobile phone 100 to work in coordination according to the instruction. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
The application processor may have an operating system of the mobile phone 100 installed thereon for managing hardware and software resources of the mobile phone 100. Such as managing and configuring memory, prioritizing system resources, managing file systems, managing drivers, etc. The operating system may also be used to provide an operator interface for a user to interact with the system. Various types of software, such as drivers, applications (apps), etc., may be installed in the operating system. For example, the operating system of the mobile phone 100 may be an Android system, a Linux system, or the like.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the cell phone 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the mobile phone 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied to the handset 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc. applied to the handset 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, the antenna 1 and the mobile communication module 150 of the handset 100 are coupled, and the antenna 2 and the wireless communication module 160 are coupled, so that the handset 100 can communicate with a network and other devices through wireless communication technology. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The mobile phone 100 implements display functions through a GPU, a display 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the cell phone 100 may include 1 or N display screens 194, N being a positive integer greater than 1. In the embodiments of this application, the display screen 194 may be used to display images, such as the source image and the reference image.
The mobile phone 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display 194, an application processor, and the like. In some embodiments, the mobile phone 100 may take a picture of a face through the ISP, the camera 193, the video codec, the GPU, and the application processor and take a face image obtained by the picture as a source image.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the cell phone 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals and can process other digital signals in addition to digital image signals. For example, when the handset 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, etc.
Video codecs are used to compress or decompress digital video. The handset 100 may support one or more video codecs. In this way, the mobile phone 100 can play or record video in multiple coding formats, for example: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the mobile phone 100 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
In some embodiments, the NPU computing processor may run an image generation model to process the source image and the reference image to generate the target image.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capabilities of the handset 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the cellular phone 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (e.g., audio data, phonebook, etc.) created during use of the handset 100, etc. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The handset 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The handset 100 may listen to music, or to hands-free calls, through the speaker 170A.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When the handset 100 is answering a telephone call or voice message, the voice can be received by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 170C to input a sound signal. The handset 100 may be provided with at least one microphone 170C. In other embodiments, the mobile phone 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, can implement a noise reduction function. In other embodiments, the mobile phone 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, etc.
The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The handset 100 may receive key inputs, generating key signal inputs related to user settings and function control of the handset 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also correspond to different vibration feedback effects by touching different areas of the display screen 194. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195 or removed from the SIM card interface 195 to enable contact and separation with the handset 100. The handset 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, micro SIM cards, and the like. The same SIM card interface 195 may be used to insert multiple cards simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The mobile phone 100 interacts with the network through the SIM card to realize functions such as call and data communication. In some embodiments, handset 100 employs esims, namely: an embedded SIM card. The eSIM card can be embedded in the handset 100 and cannot be separated from the handset 100.
The following describes in detail a makeup migration method provided in the embodiment of the present application, taking a mobile phone as an example of an electronic device, with reference to the accompanying drawings.
Fig. 4 is a schematic flow chart of a makeup migration method provided in an embodiment of the present application. The makeup migration method shown in fig. 4 may include the steps of:
s401, acquiring a source image and a reference image.
The source image is a face image which needs to be made up. The source image may be a face image of a male or female, for example. The source image may be a digital picture or a digital video frame. The face image in the source image does not include a makeup. For example, the source image 102 in fig. 1 has no eyelashes and no lipstick.
The reference image represents a face image with a makeup. The reference image may be a face image of a male or female, for example. The reference image may be a digital picture or a digital video frame. For example, as shown in fig. 1, the reference image 101 includes a makeup, and the makeup of the reference image 101 includes eyelashes and lipstick.
A face image is an image that contains at least one face; the face in the face image may be frontal, in profile, etc.
The reference image differs from the source image in the category of people. The category of a person represents a category generated by classification based on gender, age, geographic location, morphological characteristics, and the like. The categories of people may include men, women, adults, the elderly, people in different areas. For example, the reference image is male and the source image is female. For another example, the reference image is a person in a plateau region and the source image is a person in a coastal region. For another example, the reference image is a person in a high latitude region, and the source image is a person in a low latitude region. For another example, the reference image is an adult woman and the source image is a non-adult woman.
In some embodiments, the handset 100 may acquire the source and reference images in a conventional manner, e.g., the handset 100 may acquire the source and reference images through a data interface, a network, etc.
S402, extracting a salient region of the source image to obtain salient features of the source image, and extracting a salient region of the reference image to obtain salient features of the reference image.
The salient region represents a region with salient features in the face image, for example, the salient region in the face image may include a region where the five sense organs in the face image are located. For example, the salient region in the face image may be a region where one or more parts of the five sense organs in the face image are located, for example, a region where the mouth is located, a region where the eyes are located, and the like.
The salient features represent features contained in the salient region. In some embodiments, salient features may be represented by a matrix, vector, or numerical value. The salient features include information of the facial features. For example, the salient features include a makeup feature, a five sense organ feature, a category feature, and the like.
The makeup features represent the features presented by the face after the five sense organs are made up. In some embodiments, the makeup features include eye shadow features, lip color features, eyelash features, and the like. Eye shadow features may include the color, size, contour, shape, etc. of the eye shadow. Eyelash features may include the length, degree of curl, number, etc. of the eyelashes. In some embodiments, the makeup feature may be a no-makeup (bare-face) feature.
The features of the five sense organs include the size of each part of the five sense organs, the position of each part of the five sense organs in the face, the morphology of the five sense organs, the outline of the five sense organs, etc.
Category characteristics represent physiological characteristics (e.g., facial morphology, facial contours) of people that different categories of people have. The category may be determined based on the facial features and other physiological features of the person. By way of example, people may be divided into categories according to different category characteristics. For example, people can be classified into three categories by category characteristics, the first category being characterized by a higher nose bridge, a larger eye, and a more prominent five sense organs. The class characteristics of the second class of people indicate lower nose bridge, smaller eyes and flatter five sense organs. The third category is characterized by a flatter cheekbone, a medium size in the bridge of the nose, a low bridge of the nose, and a shallow eyebrow. For another example, men may have a category characterized by rounded and heavy upper orbital rims, prominent cheekbones, a more forward lower jaw, a more backward forehead, a higher arch, and women may have a category characterized by sharper upper orbital rims, less visible cheekbones, a more rounded lower jaw, a flatter forehead, and a lower arch.
Different categories of people have different category features because their five sense organs differ, and these differing category features are the main cause of the makeup overflow or missing makeup shown in fig. 1 during makeup migration.
The first salient features are the salient features of the source image; the second salient features are the salient features of the reference image.
In some embodiments, the source image salient features include facial features, cosmetic features, category features of the source image. The cosmetic feature in the source image may be a make-up free feature. In some embodiments, the reference image salient features include five sense features, a make-up feature, a category feature of the reference image.
S403, removing the category features from the source image salient features to obtain de-categorized source image features, and removing the category features from the reference image salient features to obtain de-categorized reference image features.
The de-categorized source image features are the features that remain after the category features are removed from the source image salient features. For example, the de-categorized source image features retain the five-sense-organ features of the source image.
The de-categorized reference image features are the features that remain after the category features are removed from the reference image salient features. For example, the de-categorized reference image features retain the five-sense-organ features and makeup features of the reference image.
Removing the category features removes the makeup mismatch caused by the difference between the categories of the people in the source image and the reference image. By way of example only, suppose an adult's makeup is to be migrated onto a minor's face. After the adult's makeup features and category features are extracted, the adult's category features are removed and the makeup features remain; that is, the essence of the adult's makeup (such as long eyelashes and pink lips) is known, while the specific position and size of that makeup on the adult's five sense organs are ignored. Performing the makeup migration after removing the category features of both the adult and the minor avoids the impact that the smaller five sense organs would otherwise have on the migration.
Removing the category features thus avoids the makeup mismatch that inconsistent five sense organs in the source image and the reference image would otherwise cause during the subsequent makeup migration.
S404, generating a target image based on the de-category source image features and the de-category reference image features.
The target image is an image after the makeup of the reference image is migrated to the source image. The target image contains the cosmetic features in the reference image.
The steps shown in fig. 4 can be implemented in various ways. In one implementation, the method shown in fig. 4 may be implemented based on a trained image generation model. As shown in fig. 5, S501 to S505 are an implementation of the method of fig. 4.
The image generation model is a deep learning model, and comprises an extraction sub-model, a mask sub-model, a fusion sub-model and a decoding sub-model. The input of the image generation model is a source image and a reference image, and the output is a target image. Training of the image generation model can be seen in fig. 6, fig. 7 and the associated description.
S501, acquiring a source image and a reference image.
The specific implementation of S501 may refer to S401, and will not be described herein.
S502, the first extraction sub-model extracts the salient region of the source image to obtain the salient features of the source image, and the second extraction sub-model extracts the salient region of the reference image to obtain the salient features of the reference image.
The extraction sub-model is a neural network model and includes a first extraction sub-model and a second extraction sub-model. The first extraction sub-model and the second extraction sub-model may be the same model or different models. The first extraction sub-model is used to extract the salient region of the source image to obtain the source image salient features. The second extraction sub-model is used to extract the salient region of the reference image to obtain the reference image salient features. The parameters of the first extraction sub-model may differ from those of the second extraction sub-model. In some embodiments, the extraction sub-model includes two or more residual blocks for extracting the salient region, and these residual blocks may be residual blocks of a residual neural network. The two or more residual blocks are connected in cascade, i.e., the output of one residual block serves as the input of the next residual block. In other embodiments, the extraction sub-model may include one or more convolution kernels. In other embodiments, the extraction sub-model may also include a graph neural network.
Through training, the extraction sub-model learns to focus on the salient regions of an image and can extract them. For example only, the trained extraction sub-model may perform operations such as convolution and activation functions on the source image and the reference image based on the two or more residual blocks, extract the salient regions, and output the resulting source image salient feature matrix and reference image salient feature matrix. The convolution operation includes a matrix multiplication between the input image and a weight matrix.
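The PyTorch sketch below shows what an extraction sub-model built from cascaded residual blocks could look like; the channel counts, block depth and input size are illustrative assumptions, not the trained model described above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain residual block; channel count is an illustrative guess."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ExtractionSubModel(nn.Module):
    """Sketch of an extraction sub-model: a stem convolution followed by
    cascaded residual blocks, each block feeding the next."""
    def __init__(self, in_channels=3, channels=64, num_blocks=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])

    def forward(self, image):
        return self.blocks(self.stem(image))   # salient feature map

features = ExtractionSubModel()(torch.randn(1, 3, 256, 256))  # e.g. (1, 64, 256, 256)
```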
Illustratively, as shown in fig. 5, the salient features are represented by salient feature matrices, which are multi-dimensional matrices. The matrix contains information such as the five-sense-organ features, makeup features and category features. In some embodiments, one row or column of the matrix may represent the category features. For example, the leftmost column of the salient feature matrix 410 in fig. 5 represents the category features (indicated by the dashed line). In some embodiments, the second extraction sub-model may extract the makeup features of the reference image, and the makeup features may be represented by a makeup feature matrix. In some embodiments, the makeup feature matrix is a Gram matrix.
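Since the makeup features are described as possibly being a Gram matrix, a sketch of that computation is shown below; the feature-map shape and normalization are assumptions made for illustration.

```python
import torch

def gram_matrix(feature_map: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (N, C, H, W) feature map; channel-to-channel correlations
    capture makeup style (color, texture) independent of exact position or size."""
    n, c, h, w = feature_map.shape
    flat = feature_map.view(n, c, h * w)
    return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)

makeup_style = gram_matrix(torch.randn(1, 64, 64, 64))   # shape (1, 64, 64)
```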
S503, the mask sub-model removes the category features from the source image salient features to obtain the de-categorized source image features, and removes the category features from the reference image salient features to obtain the de-categorized reference image features.
In one implementation, the masking sub-model is a neural network model, which may include a plurality of fully connected layers. The mask submodel is used to perform a masking operation. The masking operation represents selecting sample data and selectively removing some features in the sample data.
The trained mask sub-model processes the source image salient features and the reference image salient features, identifies the category features, and removes them through the masking operation to obtain the de-categorized source image features and the de-categorized reference image features, respectively.
In some embodiments, masking the sub-model to remove the class feature may be removing a portion of the salient feature matrix where the class feature exists. Illustratively, as shown in FIG. 5, the mask submodel removes the category features in the salient feature matrix (left-most column, indicated by the dashed line).
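A sketch of a mask sub-model built from fully connected layers is given below. It assumes, hypothetically, that the salient features form a (rows x columns) matrix and that the learned mask suppresses whole columns such as the category column shown in fig. 5; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MaskSubModel(nn.Module):
    """Sketch: fully connected layers predict a soft mask over the columns of
    the salient feature matrix; columns carrying category information are
    driven toward zero during training."""
    def __init__(self, num_rows=64, num_cols=16, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_rows * num_cols, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_cols),
            nn.Sigmoid(),                      # one weight per column in [0, 1]
        )

    def forward(self, salient: torch.Tensor) -> torch.Tensor:
        # salient: (batch, num_rows, num_cols)
        col_mask = self.mlp(salient.flatten(1))    # (batch, num_cols)
        return salient * col_mask.unsqueeze(1)     # suppress category columns

de_categorized = MaskSubModel()(torch.randn(2, 64, 16))
```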
S504, the fusion sub-model obtains fused salient features based on the de-categorized source image features and the de-categorized reference image features.
In one implementation, the fusion sub-model is a neural network model. In some embodiments, the fusion submodel may include one or more convolution kernels and full join layers. Illustratively, the convolution kernel has a size of 1×1. In some embodiments, the fusion sub-model may be a fully connected layer. The input of the fusion sub-model is the category-removed source image characteristic and the category-removed reference image characteristic, and the output is the fusion salient characteristic.
In some embodiments, the trained fusion sub-model fuses the five-sense-organ features of the de-categorized source image features with the makeup features of the de-categorized reference image features to obtain the fused salient features. For example, the lipstick color in the reference image of fig. 2 is fused onto the lip position of the source image, and the eyelashes in the reference image are migrated onto the eye position of the source image. Since the fused salient features combine the five-sense-organ features of the source image with the makeup features of the de-categorized reference image features, the makeup of the reference image can be considered to have been migrated onto the five sense organs of the source image.
Illustratively, as shown in fig. 5, the fusion sub-model fuses the de-categorized source image feature matrix and the de-categorized reference image feature matrix into one fused salient feature matrix.
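A sketch of a fusion sub-model using a 1x1 convolution, one of the options mentioned above, is shown below; treating the de-categorized features as (N, C, H, W) maps and the channel count are assumptions.

```python
import torch
import torch.nn as nn

class FusionSubModel(nn.Module):
    """Sketch: concatenate the de-categorized source and reference feature maps
    along the channel axis and mix them with a 1x1 convolution, so reference
    makeup information is written onto the source facial-feature channels."""
    def __init__(self, channels=64):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, src_feat, ref_feat):
        return self.mix(torch.cat([src_feat, ref_feat], dim=1))  # fused salient features

fused = FusionSubModel()(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64))
```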
S505, the decoding sub-model decodes the fused salient features to generate a target image.
The decoding sub-model is a neural network model; in some embodiments, it may be the decoder in an encoder-decoder architecture of a neural network. The input of the decoding sub-model is the fused salient features, and the output is the target image.
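A minimal decoder sketch under these assumptions is shown below; the layer choices (transposed convolutions, Tanh output) are illustrative and are not taken from this embodiment.

```python
import torch.nn as nn

class DecodingSubModel(nn.Module):
    """Sketch: map fused salient features back to a 3-channel target image."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels, channels // 2, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels // 2, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),  # pixel values scaled to [-1, 1]
        )

    def forward(self, fused):
        return self.net(fused)  # decoded target image
```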
Because the category features have been removed in the preceding steps, the influence of the mismatch between the five sense organs of the source image and those of the reference image is avoided, so the fusion sub-model can better migrate the makeup in the reference image onto the source image while keeping the makeup style. In this way, the makeup in the generated target image does not appear mismatched; that is, the makeup of the generated target image adapts to the five sense organs of the source image.
The image generation model is obtained through a first training and a second training. Fig. 6 is a schematic diagram of the first training, and fig. 7 is a schematic diagram of the second training.
The first training is the training of the extraction sub-models and the mask sub-model. The purpose of the first training is to extract the salient regions of the images, identify the position of the category features, and then remove the category features. Since the image generation model includes multiple sub-models, the extraction sub-models and the mask sub-model are first trained separately in the first training, which optimizes the overall training process. The second training is performed after the first training is completed and can directly use the parameters of the extraction sub-models and the mask sub-model, which avoids the excessive number of training parameters, poor training effect, and slow training progress that would result from directly training all of the sub-models together.
The first training may include S601-S605.
S601, a first extraction sub-model extracts a salient region of a source image to obtain salient features of the source image, and a second extraction sub-model extracts a salient region of a reference image to obtain salient features of the reference image.
The extraction sub-model includes a first extraction sub-model and a second extraction sub-model. S601 is the same as S502, and will not be described again.
S602, removing, by the mask sub-model according to a preset rule, the same-dimension features from the source image salient features and the reference image salient features, to obtain the source image salient features and the reference image salient features with the same dimension removed, respectively.
The preset rule is the logic rule followed when removing the same-dimension features from the source image salient features and the reference image salient features. As an example, the same dimension in the source image salient feature matrix and the reference image salient feature matrix may be the same column or row region in the two matrices. For example, the preset rule may be to sequentially remove the same column or the same row from each of the plurality of matrices corresponding to the source image salient features and the reference image salient features; as an example, as shown in fig. 5, the leftmost column of the two matrices is removed. For another example, the preset rule may be to sequentially remove the same one of the plurality of matrices corresponding to the source image salient features and the reference image salient features. For another example, the preset rule may be to remove the central region of the plurality of matrices corresponding to the source image salient features and the reference image salient features.
S603, judging, by the discriminating neural network, whether the persons corresponding to the source image salient features and the reference image salient features with the same dimension removed belong to the same category.
The discriminating neural network is a trained neural network. In some embodiments, the discriminating neural network may be a binary classification neural network; its input is the source image salient features and the reference image salient features with the same dimension removed, and its output is either the same category or different categories, so the discriminating neural network can determine whether the persons corresponding to the source image salient features and the reference image salient features with the same dimension removed belong to the same category.
Since the mask sub-model removes a same-dimension feature from the salient features according to the preset rule, it is not known in advance what content of the image the removed same-dimension feature corresponds to. Therefore, the discriminating neural network judges whether the persons corresponding to the source image salient features and the reference image salient features obtained after removing the same-dimension feature belong to the same category, from which it can be inferred whether the removed same-dimension feature is the category feature. For example, suppose the category features are located in the leftmost column of the plurality of salient feature matrices: after the leftmost column is removed, the discriminating neural network judges that the persons corresponding to the source image salient features and the reference image salient features belong to the same category, whereas if a column that does not hold the category features is removed, the discriminating neural network judges that they do not belong to the same category.
If the discriminating neural network determines that the persons corresponding to the source image salient features and the reference image salient features, from which the same-dimension feature has been removed, belong to the same category, S604 is performed. If the discriminating neural network determines that they do not belong to the same category, S605 is performed.
S604, fixing the parameters of the current first extraction sub-model, second extraction sub-model, and mask sub-model, to complete the first training.
If the discriminating neural network judges that the persons corresponding to the source image salient features and the reference image salient features with the same-dimension feature removed belong to the same category, the parameters of the extraction sub-models and the mask sub-model are fixed, and the first training is completed. The extraction sub-models and the mask sub-model with these parameters can be considered able to extract the salient regions of an image to obtain the salient features, identify where the category features are located, and remove the category features from the salient features; that is, S502-S503 can be completed.
S605, continuing to adjust the parameters of the first extraction sub-model, the second extraction sub-model, and the mask sub-model, and repeating S601-S605.
If the discriminating neural network judges that the persons corresponding to the source image salient features and the reference image salient features with the same-dimension feature removed do not belong to the same category, the parameters of the extraction sub-models and the mask sub-model continue to be adjusted, and S601-S605 are performed again. For example, the parameters of the extraction sub-models and the mask sub-model may be adjusted by gradient descent. As an example, the mask sub-model removes the leftmost column of the matrices corresponding to the source image salient features and the reference image salient features; if it is determined that the leftmost column is not the category feature, the parameters of the extraction sub-models and the mask sub-model continue to be adjusted, the second column from the left is removed according to the preset rule, and S603 is performed again after that column is removed.
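The column-by-column search described above can be summarized by the hedged sketch below; the helper names, the callable discriminator, and the column-wise preset rule are illustrative assumptions, and parameter updates by gradient descent are only indicated in a comment.

```python
import torch

def drop_column(features: torch.Tensor, col: int) -> torch.Tensor:
    keep = [i for i in range(features.shape[1]) if i != col]
    return features[:, keep]

def first_training_search(src_salient, ref_salient, same_category, num_columns):
    # same_category: a callable (e.g. the trained discriminating neural network)
    # returning True when the two persons appear to belong to the same category.
    for col in range(num_columns):                 # preset rule: try columns left to right
        src_masked = drop_column(src_salient, col)
        ref_masked = drop_column(ref_salient, col)
        if same_category(src_masked, ref_masked):  # S603: same category -> this column held the category features
            return col                             # S604: fix parameters, first training complete
        # S605: otherwise keep adjusting parameters (e.g. by gradient descent) and try the next column
    return None
```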
The second training is training of the image generation model, and the second training may include S701-S704:
S701, inputting a plurality of sample source images and sample reference images into the image generation model to obtain output images.
The sample source images and sample reference images are the inputs used as training samples. An output image is an image output by the image generation model before training is completed.
S702, determining an adversarial loss function value, a makeup loss function value, and a perceptual loss function value.
In some embodiments, the output image and the label image may be passed through the discriminator of a generative adversarial network, and the adversarial loss function value may be constructed based on the discriminator's judgment of authenticity. The label image is a real image in which the makeup has been migrated, set manually in advance, and is used as the label during training. The adversarial loss function includes a first adversarial loss and a second adversarial loss. In some embodiments, the first adversarial loss may be constructed based on the label image and the output image to train the discriminator; after that training is completed, the second adversarial loss may be constructed based on the label image and the output image to train the image generation model. In some embodiments, the adversarial loss function value may be determined based on the constructed adversarial loss function. The adversarial loss function value may represent the degree of realism of the output image: the larger the adversarial loss function value, the less realistic the image; the smaller the adversarial loss function value, the more realistic the image.
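A hedged sketch of the two adversarial losses is given below; the binary cross-entropy form and the single-score discriminator are assumptions, since the text only states that the losses are based on the discriminator's judgment of authenticity, and in practice the two losses would be applied in separate training phases as described above.

```python
import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, output_image, label_image):
    real_score = discriminator(label_image)
    fake_score = discriminator(output_image.detach())   # detach: the first loss only updates the discriminator
    # First adversarial loss (trains the discriminator): label image -> real, output image -> fake.
    first_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
                  + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    # Second adversarial loss (trains the image generation model): make the output image look real.
    gen_score = discriminator(output_image)
    second_loss = F.binary_cross_entropy_with_logits(gen_score, torch.ones_like(gen_score))
    return first_loss, second_loss
```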
The makeup loss function value reflects the difference between the makeup feature matrix of the reference image and the label makeup feature matrix. The label makeup feature matrix is the makeup feature matrix of the label image. In some embodiments, the makeup feature matrix of the label image may be extracted as the label makeup feature matrix. The makeup feature matrix of the reference image is extracted by the second extraction sub-model. The makeup loss function may be constructed based on the difference between the makeup feature matrix of the reference image and the label makeup feature matrix. In some embodiments, the makeup loss function value may be determined based on the constructed makeup loss function.
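A hedged sketch of such a makeup loss is given below, assuming both makeup feature matrices are Gram matrices as described earlier and assuming a mean-squared-error form, which this embodiment does not specify.

```python
import torch

def makeup_loss(ref_makeup_gram: torch.Tensor, label_makeup_gram: torch.Tensor) -> torch.Tensor:
    # Both inputs are (C, C) makeup feature (Gram) matrices; the MSE form is an assumption.
    return torch.mean((ref_makeup_gram - label_makeup_gram) ** 2)
```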
The perceptual loss function value reflects the difference between the output image and the label image. The perceptual loss function may be constructed based on the difference between the output image and the label image, and in some embodiments, the perceptual loss function value may be determined based on the constructed perceptual loss function.
S703, adjusting the parameters of the image generation model so that the sum of the adversarial loss function value, the makeup loss function value, and the perceptual loss function value is minimized.
During training, the parameters of the image generation model may be adjusted with the goal of minimizing the total loss function value, where total loss function value = adversarial loss function value + makeup loss function value + perceptual loss function value. In some embodiments, the parameters of the image generation model may be updated by gradient descent. For example, the total loss function value may be minimized by adjusting the parameters of the individual sub-models in the image generation model.
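A minimal sketch of one such update step is shown below; the optimizer choice and the function signature are assumptions, and only the plain sum of the three loss terms comes from the text.

```python
import torch

def training_step(optimizer: torch.optim.Optimizer,
                  adv_loss: torch.Tensor,
                  makeup_loss_value: torch.Tensor,
                  perceptual_loss: torch.Tensor) -> float:
    total = adv_loss + makeup_loss_value + perceptual_loss  # total loss = sum of the three terms (S703)
    optimizer.zero_grad()
    total.backward()       # gradient descent on the image generation model's parameters
    optimizer.step()
    return float(total)
```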
S704, after the adjustment is finished, fixing the parameters of the image generation model.
Training ends when the adjusted parameters of the image generation model minimize the total loss function. The image generation model at this point is taken as the trained image generation model.
It will be appreciated that, in order to achieve the above-mentioned functions, the electronic device includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiment of the application may divide the functional modules of the electronic device according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
In case of an integrated unit, fig. 8 shows a schematic diagram of one possible structure of the electronic device involved in the above-described embodiment. The electronic device 800 includes: a processing unit 801, a storage unit 802, and a display unit 803.
The processing unit 801 is configured to control and manage an operation of the electronic device 800. For example, it may be used to perform the processing steps of S401, S402, S403, and S404 in fig. 4; and/or other processes for the techniques described herein.
A storage unit 802 for storing instructions and data of the electronic device 800, and the processing unit 801 calls the program code stored in the storage unit 802 to perform the steps in the above method embodiments. For example, the above instructions may be used to perform various steps as in FIG. 4 and corresponding embodiments. The data may include source images, reference images, target images, and the like.
A display unit 803 for displaying an image or video of the electronic device 800. For example, the display unit 803 may be used to display a source image, a reference image, or a target image.
Of course, the unit modules in the above-described electronic device 800 include, but are not limited to, the above-described processing unit 801 and storage unit 802. For example, a communication unit, a power supply unit, and the like may also be included in the electronic device 800. A communication unit for supporting communication of the electronic device 800 with other network entities; for example, it may be used to support the electronic device 800 in communication with a server, query a server for saved source or reference images, etc. The power supply unit is used to power the electronic device 800.
The processing unit 801 may be a processor or controller, such as a central processing unit (central processing unit, CPU), a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (application-specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, transistor logic device, hardware components, or any combination thereof. The memory unit 802 may be a memory. The communication unit may be a transceiver, a transceiving circuit, etc.
For example, processing unit 801 is a processor (e.g., processor 110 shown in FIG. 3); the storage unit 802 may be a memory (such as the internal memory 121 shown in fig. 3); the communication unit may be referred to as a communication interface, including a mobile communication module (e.g., mobile communication module 150 shown in fig. 3) and a wireless communication module (e.g., wireless communication module 160 shown in fig. 3). The electronic device 800 provided in the embodiment of the present application may be the mobile phone 100 shown in fig. 3. Wherein the processors, memories, communication interfaces, etc. may be coupled together, such as by a bus. The processor invokes the memory-stored program code to perform the steps in the method embodiments above.
The present application also provides a computer-readable storage medium having stored therein computer program code which, when executed by the above-mentioned processor, causes the electronic device to perform the method of the above-mentioned embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method of the above embodiments.
The electronic device 800, the computer readable storage medium, or the computer program product provided in the embodiments of the present application are configured to perform the corresponding methods provided above, and therefore, the advantages achieved by the method may refer to the advantages in the corresponding methods provided above, which are not described herein.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a magnetic disk or an optical disk.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A cosmetic migration method, characterized by comprising:
acquiring a source image and a reference image, wherein the source image and the reference image respectively comprise a face image;
extracting the features of the salient region of the source image to obtain first salient features, and extracting the features of the salient region of the reference image to obtain second salient features; the salient features comprise five sense organs features, category features and makeup features; the salient region represents a region with prominent characteristics in the face image; a salient feature represents a feature contained in a salient region, the salient feature being represented by a matrix, vector, or value; the salient features are multi-dimensional matrices, one row or one column of which is the category feature; the category features represent physiological characteristics of people of different categories; wherein the different categories of people are classified based on gender, age, geographic location, and morphological characteristics, including men, women, adults, elderly people, people in plateau areas, people in coastal areas, people in high latitude areas, and people in low latitude areas; the makeup features comprise the color, the position, and the shape of the makeup, and do not comprise the size of the makeup;
removing the category features in the first salient features to obtain category-removed source image features, and removing the category features in the second salient features to obtain category-removed reference image features;
and generating a target image based on the category-removed source image features and the category-removed reference image features, wherein the target image is an image obtained after the makeup of the reference image is migrated to the source image, and the target image contains the makeup features of the reference image.
2. The method of claim 1, wherein,
the extracting the features of the salient region of the source image to obtain a first salient feature comprises the following steps:
inputting the source image into a first extraction sub-model, and outputting the first salient features; wherein the first extraction sub-model has the ability to extract features of a salient region of the source image;
the extracting the features of the salient region of the reference image to obtain a second salient feature comprises the following steps:
inputting the reference image into a second extraction sub-model, and outputting the second salient features; wherein the second extraction sub-model has the ability to extract features of a salient region of the reference image.
3. The method of claim 1, wherein the removing the category features in the first salient features to obtain category-removed source image features, and removing the category features in the second salient features to obtain category-removed reference image features, comprises:
inputting the first salient features into a mask sub-model, and outputting the category-removed source image features;
inputting the second salient features into the mask sub-model, and outputting the category-removed reference image features;
wherein the mask sub-model has the ability to remove category features from salient features.
4. The method of claim 1, wherein the generating a target image based on the category-removed source image features and the category-removed reference image features comprises:
inputting the category-removed source image features and the category-removed reference image features into a fusion sub-model, and outputting fused salient features; the fusion sub-model has the capability of fusing the makeup features in the category-removed reference image features to the five sense organs features in the category-removed source image features;
and inputting the fused salient features into a decoding submodel, and outputting the target image.
5. The method of any one of claims 1-4, wherein the five sense organs features comprise at least one of the size, position, morphology, and contour of the five sense organs.
6. An electronic device, the electronic device comprising: a processor and a memory; the memory is coupled with the processor; the memory is used for storing computer program codes; the computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any of claims 1-5.
7. A computer readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-5.
CN202210345783.1A 2022-04-02 2022-04-02 Dressing migration method and electronic equipment Active CN114418837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210345783.1A CN114418837B (en) 2022-04-02 2022-04-02 Dressing migration method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210345783.1A CN114418837B (en) 2022-04-02 2022-04-02 Dressing migration method and electronic equipment

Publications (2)

Publication Number Publication Date
CN114418837A CN114418837A (en) 2022-04-29
CN114418837B true CN114418837B (en) 2023-06-13

Family

ID=81264000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210345783.1A Active CN114418837B (en) 2022-04-02 2022-04-02 Dressing migration method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114418837B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012113747A (en) * 2007-08-10 2012-06-14 Shiseido Co Ltd Makeup simulation system, makeup simulation device, makeup simulation method and makeup simulation program
CN113538213A (en) * 2021-06-09 2021-10-22 华南师范大学 Data processing method, system and storage medium for makeup migration

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008102440A1 (en) * 2007-02-21 2008-08-28 Tadashi Goino Makeup face image creating device and method
TWI573093B (en) * 2016-06-14 2017-03-01 Asustek Comp Inc Method of establishing virtual makeup data, electronic device having method of establishing virtual makeup data and non-transitory computer readable storage medium thereof
CN109272473B (en) * 2018-10-26 2021-01-15 维沃移动通信(杭州)有限公司 Image processing method and mobile terminal
CN111783511A (en) * 2019-10-31 2020-10-16 北京沃东天骏信息技术有限公司 Beauty treatment method, device, terminal and storage medium
CN111597972B (en) * 2020-05-14 2022-08-12 南开大学 Makeup recommendation method based on ensemble learning
CN113313660A (en) * 2021-05-14 2021-08-27 北京市商汤科技开发有限公司 Makeup migration method, device, equipment and computer readable storage medium
CN114187166A (en) * 2021-11-15 2022-03-15 上海传英信息技术有限公司 Image processing method, intelligent terminal and storage medium


Also Published As

Publication number Publication date
CN114418837A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
AU2019418925B2 (en) Photographing method and electronic device
US20220180485A1 (en) Image Processing Method and Electronic Device
WO2020134877A1 (en) Skin detection method and electronic device
CN111625670A (en) Picture grouping method and device
CN111539882A (en) Interactive method for assisting makeup, terminal and computer storage medium
WO2021068926A1 (en) Model updating method, working node, and model updating system
WO2020173152A1 (en) Facial appearance prediction method and electronic device
CN114494547A (en) Drawing command processing method and related equipment thereof
CN115589051B (en) Charging method and terminal equipment
CN114242037A (en) Virtual character generation method and device
WO2022022319A1 (en) Image processing method, electronic device, image processing system and chip system
WO2020015149A1 (en) Wrinkle detection method and electronic device
CN114418837B (en) Dressing migration method and electronic equipment
CN113850709A (en) Image transformation method and device
CN113536834A (en) Pouch detection method and device
CN113486714B (en) Image processing method and electronic equipment
CN114999535A (en) Voice data processing method and device in online translation process
CN111460942B (en) Proximity detection method and device, computer readable medium and terminal equipment
CN113099734B (en) Antenna switching method and device
CN115393676A (en) Gesture control optimization method and device, terminal and storage medium
CN114241347A (en) Skin sensitivity display method and device, electronic equipment and readable storage medium
CN114466238A (en) Frame demultiplexing method, electronic device and storage medium
WO2021238338A1 (en) Speech synthesis method and device
WO2023024036A1 (en) Method and apparatus for reconstructing three-dimensional model of person
CN114449492B (en) Data transmission method and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant