WO2022161234A1 - Image processing method and apparatus, electronic device, and storage medium - Google Patents

Image processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022161234A1
WO2022161234A1 · PCT/CN2022/072892
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
model
clothing
target
Prior art date
Application number
PCT/CN2022/072892
Other languages
English (en)
French (fr)
Inventor
宋奕兵
葛玉莹
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2023522881A (publication JP2023545189A)
Publication of WO2022161234A1
Priority to US18/051,408 (publication US20230077356A1)

Classifications

    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06V 40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T 11/60 — Editing figures and text; combining figures or text
    • G06T 3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/761 — Proximity, similarity or dissimilarity measures
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06T 2207/20221 — Image fusion; image merging
    • G06T 2207/30196 — Human being; person

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
  • Virtual dress-up technology fuses a human body image with a clothing image by technical means to obtain an image of the user wearing the clothing, so that the wearing effect can be previewed without the user putting on real clothes.
  • Virtual dress-up technology is widely used in scenarios such as online shopping, clothing display, clothing design, and virtual try-on in offline shopping.
  • An ideal virtual dress-up dataset would contain images of a specified person wearing arbitrary clothing, images containing the target clothing, and images of the specified person wearing the target clothing. However, images of the same person holding exactly the same pose in two different garments are difficult to obtain, so the virtual dress-up datasets currently in use contain only images of the specified person wearing the target clothing; the human body parsing result must be used to clear the target clothing region of the specified person, after which the image containing the target clothing is used to reconstruct the person image.
  • The embodiments of the present application provide an image processing method and apparatus, an electronic device, and a computer-readable storage medium that perform virtual dress-up without relying on human body parsing results, thereby avoiding the problems caused by such reliance. At the same time, the efficiency of virtual dress-up is improved and real-time virtual dress-up is realized.
  • According to one aspect, an image processing method is provided, the method being executed by a computer device and including: acquiring a first image containing a target person and a second image containing a target clothing; generating, according to the image features of the first image and the image features of the second image, a target appearance flow feature characterizing the deformation the target clothing undergoes to fit the body of the target person, and generating, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body; and fusing the deformed image with the first image to generate a virtual dress-up image, in which the target person wears the target clothing adapted to the body.
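To make the three-step structure of the method concrete, here is a minimal sketch in PyTorch-style Python. All three callables (`flow_estimator`, `warp_fn`, `fusion_generator`) are hypothetical stand-ins for the patent's clothing deformation and dress-up generation sub-models, not names from the source:

```python
import torch

def virtual_dress_up(person_img: torch.Tensor, clothing_img: torch.Tensor,
                     flow_estimator, warp_fn, fusion_generator) -> torch.Tensor:
    """Parser-free virtual dress-up in three steps (S110 -> S130 -> S150)."""
    # S130a: predict the target appearance flow from both images' features
    flow = flow_estimator(person_img, clothing_img)      # (B, 2, H, W)
    # S130b: deform the clothing so it fits the target person's body
    deformed_clothing = warp_fn(clothing_img, flow)      # the "deformed image"
    # S150: fuse the deformed clothing with the first image
    return fusion_generator(person_img, deformed_clothing)
```

Note that no human body parsing result appears anywhere in the signature; the two images are the only inputs, which is the point of the claimed method.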
  • According to one aspect, an image processing apparatus is provided, including: an image acquisition module configured to acquire a first image containing a target person and a second image containing a target clothing; an information generation module configured to generate, according to the image features of the first image and the image features of the second image, a target appearance flow feature characterizing the deformation the target clothing undergoes to fit the body of the target person, and to generate, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body; and a virtual dress-up module configured to fuse the deformed image with the first image to generate a virtual dress-up image, in which the target person wears the target clothing adapted to the body.
  • According to one aspect, an electronic device is provided, including a processor and a memory, where computer-readable instructions are stored in the memory and, when executed by the processor, implement the image processing method described above.
  • a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor of a computer, the computer is made to execute the above-mentioned image processing method.
  • According to one aspect, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing methods provided in the various optional embodiments described above.
  • FIG. 1 is a schematic diagram of an implementation environment involved in the present application.
  • FIG. 2 is a schematic structural diagram of a virtual dress-up student model shown in an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of the first clothing deformation sub-model 11 shown in FIG. 2 in an embodiment
  • FIG. 4 is a schematic flowchart of the appearance flow feature prediction performed by the "FN-2" module shown in FIG. 3 at the second image feature layer;
  • FIG. 5 is a flowchart of an image processing method shown in another embodiment of the present application.
  • FIG. 6 is a schematic diagram of the training process of the virtual dress-up student model shown in an embodiment of the present application.
  • FIG. 7 is a block diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 8 shows a schematic structural diagram of a computer system suitable for implementing the electronic device according to the embodiment of the present application.
  • first, second, etc. used in this application may be used herein to describe various concepts, but these concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first image could be referred to as a second image, and, similarly, a second image could be referred to as a first image, without departing from the scope of this application.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technologies include natural language processing technology and machine learning.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and to reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications are in all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other techniques.
  • Computer Vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to recognize, track, and measure targets, and further performing graphics processing so that the result is better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
  • The embodiments of the present application provide an image processing method whose execution subject is a computer device; the method can fuse a human body image with a clothing image.
  • In one possible implementation, the computer device is a terminal, and the terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted computer, or the like.
  • In another possible implementation, the computer device is a server. The server may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers; multiple servers may form a blockchain, with each server acting as a node on the blockchain. The server may also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, big data, and artificial intelligence platforms.
  • The image processing method provided by the embodiments of the present application can be applied in any scenario in which a human body image and a clothing image are fused. For example, in an online-shopping virtual dress-up scenario, a user who wants to preview the effect of wearing a garment only needs to provide their own body image and the garment's clothing image; processing the two images yields an image of the user wearing the garment, realizing online virtual dress-up without the user putting on real clothing.
  • In addition, the image processing method provided by the embodiments of the present application can also be applied in scenarios such as clothing design, clothing display, or virtual try-on in offline shopping, so as to provide a real-time virtual dress-up function; these are not enumerated one by one here.
  • FIG. 1 is a flowchart of an image processing method according to an embodiment of the present application.
  • The image processing method includes at least S110 to S150, which may specifically be implemented as a virtual dress-up student model.
  • The virtual dress-up student model is an artificial intelligence model that achieves virtual dress-up of the target person without relying on human body parsing results; it can not only generate high-quality virtual dress-up images but also improve the real-time performance of virtual dress-up.
  • S110: Acquire a first image containing the target person and a second image containing the target clothing.
  • The target person mentioned in this embodiment is the person to be virtually dressed, and the target clothing is the clothing that the target person wants to wear.
  • For example, in an online-shopping virtual dress-up scenario, the target person is the user currently shopping online, the first image is a body image of the user provided by the user, and the second image may be a picture of the target clothing hosted on the shopping platform.
  • It should be noted that the target person contained in the first image and the target clothing contained in the second image may be determined according to the actual application scenario, which is not limited here.
  • S130: According to the image features of the first image and the image features of the second image, generate a target appearance flow feature characterizing the deformation the target clothing undergoes to fit the body of the target person, and generate, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body.
  • First, the image features of the first image are obtained by performing image feature extraction on the first image, and the image features of the second image are obtained by performing image feature extraction on the second image. For example, in some embodiments, the first image may be input into a first image feature extraction model and the second image into a second image feature extraction model (that is, the first image serves as the input signal of the first image feature extraction model and the second image as the input signal of the second image feature extraction model). Both models are configured with image feature extraction algorithms, so the image features output by the first model for the first image and the image features output by the second model for the second image are obtained.
  • The image features of the first image output by the first image feature extraction model and the image features of the second image output by the second image feature extraction model may be multi-layer image features, which are the multiple feature maps obtained sequentially in the process of performing image feature extraction on the first image and the second image.
  • For example, the first image feature extraction model and the second image feature extraction model may each be a pyramid feature extraction model. A pyramid feature extraction model is configured with a Feature Pyramid Network (FPN), and the feature maps output by the feature pyramid network are the multi-layer image features corresponding to the image.
  • In some embodiments, the bottom-up pathway of the pyramid feature extraction model is used to perform image feature extraction on the first image and the second image. The bottom-up pathway can be understood as image feature extraction with a convolutional network: as the convolution deepens, the spatial resolution of the feature maps decreases and spatial information is lost, but the high-level semantic information is enriched, so the feature maps, ordered from large to small, constitute the multi-layer image features.
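As an illustration of the bottom-up pathway described above, the following is a minimal sketch assuming PyTorch; the stage widths and the number of layers are arbitrary choices for illustration, not values from the patent:

```python
import torch
import torch.nn as nn

class BottomUpPyramid(nn.Module):
    """Minimal bottom-up convolutional pyramid: each stage halves the spatial
    resolution while enriching semantics, yielding multi-layer image features
    ordered from large to small feature maps (cf. c1..c3 / p1..p3 in FIG. 3)."""

    def __init__(self, in_ch: int = 3, widths=(64, 128, 256)):
        super().__init__()
        stages, prev = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True)))
            prev = w
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)   # progressively smaller, more semantic feature maps
        return feats          # [layer1, layer2, layer3]
```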
  • The appearance flow feature is a two-dimensional coordinate vector field, usually used to indicate which pixel of a source image should be used to reconstruct a specified pixel of a target image.
  • In this embodiment, an image of the target person wearing the target clothing needs to be constructed, so the source image is the second image (more specifically, the target clothing in the second image), and the target image to be reconstructed is the deformed image produced by adapting the target clothing to the body of the target person in the first image. The target appearance flow feature can therefore represent the deformation the target clothing undergoes to fit the body of the target person in the first image.
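The sketch below shows one common way to apply such a 2D appearance flow with PyTorch's `F.grid_sample`. The patent does not specify an implementation, so the pixel-offset convention and bilinear sampling here are assumptions:

```python
import torch
import torch.nn.functional as F

def warp_by_appearance_flow(source: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `source` (B, C, H, W) with an appearance flow field `flow` (B, 2, H, W).

    flow[:, 0] / flow[:, 1] give, for every target pixel, the x/y offset
    (in pixels) of the source pixel used to reconstruct it.
    """
    b, _, h, w = source.shape
    # Base sampling grid holding each target pixel's own coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(source.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                               # source coordinates
    # Normalize to [-1, 1] as required by grid_sample, then sample bilinearly.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                # (B, H, W, 2)
    return F.grid_sample(source, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```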
  • In some embodiments, the image features of the first image are multi-layer image features output by the first image feature extraction model, and the image features of the second image are multi-layer image features output by the second image feature extraction model. Appearance flow features are extracted layer by layer from the multi-layer image features output by the two models, and the appearance flow feature extracted at the last image feature layer is used as the final target appearance flow feature.
  • At the first image feature layer, an appearance flow feature characterizing the deformation of the target clothing to fit the target person's body is extracted from the image features output by the two feature extraction models. At each subsequent image feature layer, the appearance flow feature corresponding to the previous image feature layer is optimized according to the image features output by the two models at the current layer, yielding the appearance flow feature corresponding to the current image feature layer.
  • In the process of extracting appearance flow features layer by layer, the appearance flow features may also be extracted under a preset second-order smoothness constraint. The second-order smoothness constraint is imposed on the linear correspondence between adjacent appearance flows, so as to further preserve characteristics of the target clothing such as patterns and stripes, thereby improving the image quality of the generated deformed image in which the target clothing is adapted to the target person's body.
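The patent does not give a formula for the second-order smoothness constraint. The sketch below shows one plausible reading: penalizing second differences of the flow field so that neighbouring flows stay locally linear (the L1 penalty is chosen purely for illustration):

```python
import torch

def second_order_smoothness(flow: torch.Tensor) -> torch.Tensor:
    """Second-order smoothness penalty on an appearance flow field (B, 2, H, W):
    penalizes deviation of each flow value from the average of its two horizontal
    and vertical neighbours, encouraging locally linear flows that preserve
    stripes and patterns of the clothing."""
    horiz = flow[:, :, :, :-2] + flow[:, :, :, 2:] - 2 * flow[:, :, :, 1:-1]
    vert = flow[:, :, :-2, :] + flow[:, :, 2:, :] - 2 * flow[:, :, 1:-1, :]
    return horiz.abs().mean() + vert.abs().mean()
```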
  • S150: A virtual dress-up image is generated by fusing the deformed image, in which the target clothing is adapted to the body of the target person, with the first image. In the virtual dress-up image, the target person wears the target clothing adapted to the body.
  • In the fusion, the deformed image of the target clothing is combined with the first image so that the target person is dressed in the deformed target clothing; the specific fusion manner is not limited here.
  • The technical solution provided in this embodiment does not rely on human body parsing results to perform virtual dress-up. Instead, it obtains the target appearance flow feature describing how the target clothing deforms to fit the target person's body, generates the corresponding deformation of the target clothing, and finally fuses the image of the deformed target clothing (the deformed image) with the first image containing the target person to obtain the virtual dress-up image. This avoids the problems caused by relying on human body parsing, such as low virtual dress-up image quality and weak real-time performance, and realizes high-quality virtual dress-up. At the same time, the efficiency of virtual dress-up is improved and real-time virtual dress-up is realized.
  • FIG. 2 is a schematic structural diagram of a virtual dress-up student model according to an embodiment of the present application.
  • The exemplary virtual dress-up student model 10 includes a first clothing deformation sub-model 11 and a first dress-up generation sub-model 12, where the first clothing deformation sub-model 11 performs S130 of the embodiment shown in FIG. 1 and the first dress-up generation sub-model 12 performs S150.
  • Given the input first image and second image, the virtual dress-up student model 10 outputs the corresponding virtual dress-up image, in which the target person wears the target clothing adapted to the body.
  • Note that the virtual dress-up student model 10 needs no other additional input signals; in particular, the human body parsing result of the target person contained in the first image does not need to be input into the virtual dress-up student model 10.
  • FIG. 3 is a schematic structural diagram of the first clothing deformation sub-model 11 shown in FIG. 2 in an embodiment.
  • the first clothing deformation sub-model 11 includes a first image feature extraction model, a second image feature extraction model and an appearance flow feature prediction model.
  • the first image feature extraction model is used to extract image features of the first image
  • the second image feature extraction model is used to extract image features of the second image.
  • As shown in FIG. 3, the first image feature extraction model extracts image features from the first image, sequentially obtaining the multi-layer image features shown as c1 to c3, and the second image feature extraction model extracts image features from the second image, sequentially obtaining the multi-layer image features shown as p1 to p3.
  • It should be noted that the multi-layer image features shown in FIG. 3 are only an example; the number of image feature layers extracted from the input images by the first and second image feature extraction models can be set according to actual needs, and this embodiment does not limit it.
  • The appearance flow feature prediction model is used to extract appearance flow features layer by layer from the multi-layer image features output by the first and second image feature extraction models, with the appearance flow feature extracted at the last image feature layer used as the final target appearance flow feature.
  • the "FN-1" module shown in Figure 3 is used to predict appearance flow features in the first image feature layer
  • the "FN-2” module is used to predict appearance flow features in the second image feature layer
  • "FN -3" module for appearance flow feature prediction at the third image feature layer. That is, the appearance flow feature prediction model is a progressive appearance flow feature prediction model.
  • Specifically, at the first image feature layer, the appearance flow feature prediction model extracts, from the image features output by the first and second image feature extraction models, an initial appearance flow feature characterizing the deformation of the target clothing to fit the target person's body. At each image feature layer after the first, the model optimizes the appearance flow feature obtained at the previous layer according to the image features output by the two feature extraction models at the current layer, yielding the appearance flow feature corresponding to the current layer.
  • As the layers progress, the feature information contained in the appearance flow features becomes increasingly rich and accurate. For example, in the appearance flow features f1 to f3 shown in FIG. 3, the contained feature information is gradually enriched and gradually adapted to the target person's body.
  • Therefore, the appearance flow feature obtained at the last image feature layer can very accurately reflect the deformation of the target clothing to fit the target person's body, and the deformed image of the target clothing generated from this feature establishes an accurate, dense correspondence with the target person's body, so that subsequent fusion based on this accurate deformation yields high-quality virtual dress-up images.
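To show how the progressive prediction composes, here is a tiny hypothetical driver. `fn_modules` stands for the FN-2, FN-3, ... modules and `init_flow` for the FN-1 output; none of these names come from the patent:

```python
def predict_target_flow(person_feats, cloth_feats, fn_modules, init_flow):
    """Refine the appearance flow layer by layer across the feature pyramid."""
    flow = init_flow  # appearance flow from the first image feature layer (FN-1)
    for fn, p, c in zip(fn_modules, person_feats[1:], cloth_feats[1:]):
        flow = fn(flow, p, c)  # FN-2, FN-3, ...: optimize the previous layer's flow
    return flow                # target appearance flow (last image feature layer)
```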
  • FIG. 4 is a schematic flowchart of the appearance flow feature prediction performed by the "FN-2" module shown in FIG. 3 at the second image feature layer.
  • As shown in FIG. 4, up-sampling is first performed on the appearance flow feature f1 corresponding to the previous image feature layer to obtain the up-sampled feature f1'; then, according to f1', a first deformation is applied to the image feature c2 of the second image corresponding to the current feature layer, obtaining the first deformed feature c2'.
  • Up-sampling the appearance flow feature output by the previous image feature layer helps improve the resolution of the appearance flow feature at the current image feature layer. The subsequent two deformation operations and two convolution calculations further refine the feature information contained in the up-sampled feature, which is equivalent to adding spatial information on top of the appearance flow feature output by the previous layer so as to optimize it, yielding an appearance flow feature that better reflects the deformation of the target clothing to fit the target person's body.
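Putting the steps of FIG. 4 together, here is a hedged sketch of one such refinement block. It reuses `warp_by_appearance_flow` from the earlier sketch; the channel widths, the residual-style flow update, and the 1x1 fusion of the two convolution features are assumptions where the text is ambiguous:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowRefinementBlock(nn.Module):
    """One 'FN-k' step: refine the previous layer's appearance flow using the
    current layer's person feature and clothing feature (see FIG. 4)."""

    def __init__(self, ch: int):
        super().__init__()
        # correction: fuse the warped clothing feature with the person feature
        self.correct = nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(ch, 2, kernel_size=3, padding=1)  # first convolution
        self.conv2 = nn.Conv2d(ch, 2, kernel_size=3, padding=1)  # second convolution
        self.fuse = nn.Conv2d(4, 2, kernel_size=1)  # concat of both conv features -> flow

    def forward(self, prev_flow, person_feat, cloth_feat):
        # up-sample f_{k-1}; offsets are in pixels, so scale them with the resolution
        up = 2.0 * F.interpolate(prev_flow, scale_factor=2,
                                 mode="bilinear", align_corners=True)
        warped1 = warp_by_appearance_flow(cloth_feat, up)          # first deformation
        corrected = self.correct(torch.cat([warped1, person_feat], dim=1))
        flow1 = self.conv1(corrected)                              # first convolution feature
        warped2 = warp_by_appearance_flow(cloth_feat, up + flow1)  # second deformation
        flow2 = self.conv2(warped2)                                # second convolution feature
        # concatenate the two convolution features and map back to a 2-channel flow
        return up + self.fuse(torch.cat([flow1, flow2], dim=1))
```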
  • It should also be mentioned that, while extracting appearance flow features layer by layer from the multi-layer image features output by the first and second image feature extraction models, the appearance flow feature prediction model also extracts the appearance flow features under the second-order smoothness constraint preset for the linear correspondence between adjacent appearance flows, so as to further preserve the patterns, stripes, and other characteristics of the target clothing.
  • FIG. 5 is a flowchart of an image processing method according to another embodiment of the present application. As shown in FIG. 5 , on the basis of the embodiment shown in FIG. 1 , the method further includes S210 to S250 , which are described in detail as follows:
  • S210: Call the virtual dress-up teaching assistant model, input the human body parsing result corresponding to a person image containing a designated person, together with a first clothing image containing the clothing to be changed into, into the virtual dress-up teaching assistant model, and obtain the teaching assistant image output by the model. In the teaching assistant image, the designated person wears the clothing to be changed into, adapted to the designated person's body.
  • First, it should be noted that this embodiment discloses the process of training the virtual dress-up student model shown in FIG. 2, for which the virtual dress-up teaching assistant model needs to be called for auxiliary training.
  • The virtual dress-up teaching assistant model is an artificial intelligence model that relies on human body parsing results: given the human body parsing result corresponding to the person image containing the designated person and the first clothing image containing the clothing to be changed into, it outputs the corresponding teaching assistant image, in which the designated person wears the clothing to be changed into, adapted to the designated person's body.
  • The virtual dress-up dataset is an image dataset composed of person images containing a designated person, first clothing images containing the clothing to be changed into, and second clothing images containing the original clothing worn by the designated person.
  • There may be multiple person images, first clothing images, and second clothing images, and the designated persons contained in different person images may be the same or different; this embodiment does not limit this.
  • In other words, this embodiment uses a virtual dress-up teaching assistant model to guide the training of the virtual dress-up student model; that is, the virtual dress-up student model is trained by means of knowledge distillation. Knowledge distillation refers to using the intrinsic information of a teacher network to train a student network. In this embodiment, the teacher network is the virtual dress-up teaching assistant model, and its intrinsic information refers to the feature representations and semantic information that the teaching assistant model extracts based on the human body parsing results.
  • The trained virtual dress-up student model has thus fully learned the accurate, dense correspondence between human body and clothing, so in practical applications the human body parsing result of the target person is not needed: the student model can still output a high-quality virtual dress-up image from the input first image containing the target person and second image containing the target clothing.
  • The teaching assistant image output by the virtual dress-up teaching assistant model is input, as teaching assistant knowledge, into the virtual dress-up student model to be trained, together with the second clothing image containing the original clothing, causing the student model to output a student image. The designated person in the student image wears the original clothing adapted to the body of the designated person in the teaching assistant image.
  • The person image is used as the teacher image to supervise the training process of the virtual dress-up student model. That is, the student model is directly supervised by the teacher image during training, which helps improve its performance, so that the finally trained student model can output high-quality virtual dress-up images from the input first and second images in practical applications.
  • In some embodiments, the image loss information between the student image and the teacher image may be obtained by computing loss function values of the student image against the teacher image. For example, image loss values of the student image relative to the teacher image are obtained, where the image loss values include at least one of a pixel distance loss function value, a perceptual loss function value, and an adversarial loss function value; these image loss values are then summed to obtain the image loss sum of the student image relative to the teacher image. Finally, the image loss sum is used as the image loss information between the student image and the teacher image, and the parameters of the virtual dress-up student model to be trained are updated accordingly, completing one training iteration of the virtual dress-up student model.
  • The model performance of the virtual dress-up student model is gradually improved by training it many times. When the image loss information between the student image and the teacher image is less than or equal to a preset image loss threshold, the student model has reached sufficient performance and the training process can be ended.
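A minimal sketch of one such training iteration, assuming PyTorch; `student`, `perceptual_loss`, and `adversarial_loss` are hypothetical callables standing in for the models and loss terms named above:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, tutor_image, clothing2_image, teacher_image,
                      optimizer, perceptual_loss=None, adversarial_loss=None):
    """One training iteration of the virtual dress-up student model: the student
    consumes the teaching assistant image and the original-clothing image, and is
    supervised by the teacher image (the real person image)."""
    student_image = student(tutor_image, clothing2_image)
    loss = F.l1_loss(student_image, teacher_image)    # pixel distance loss
    if perceptual_loss is not None:                   # e.g. a VGG feature distance
        loss = loss + perceptual_loss(student_image, teacher_image)
    if adversarial_loss is not None:                  # discriminator-based term
        loss = loss + adversarial_loss(student_image)
    optimizer.zero_grad()
    loss.backward()                                   # sum of losses drives the update
    optimizer.step()
    return loss.item()
```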
  • It should be noted that the human body parsing results can include information such as human body key points, human body pose heatmaps, and dense pose estimation. The virtual dress-up teaching assistant model can extract richer semantics based on the human body parsing results, so the image quality of the teaching assistant image it outputs should be higher than that of the student image output by the student model.
  • For this reason, the image quality difference between the teaching assistant image and the student image may be obtained before S250. If the difference is positive, the image quality of the teaching assistant image is greater than that of the student image, and S250 is executed to train the student model based on the teaching assistant image. If the difference is negative or zero, the teaching assistant image is no better than the student image, and the human body parsing result input to the teaching assistant model may be completely wrong; in that case the execution of S250 is terminated and the next round of training begins.
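Continuing the previous sketch, the quality gate could look like the following; `image_quality` is an assumed scorer (the patent does not specify how the image quality difference is measured):

```python
import torch

# Hypothetical gating of one training step on the tutor/student quality gap;
# reuses the names from the previous distillation_step sketch.
with torch.no_grad():
    student_image = student(tutor_image, clothing2_image)
quality_diff = image_quality(tutor_image) - image_quality(student_image)
if quality_diff > 0:
    # tutor image is better: proceed with the supervised update (S250)
    distillation_step(student, tutor_image, clothing2_image, teacher_image, optimizer)
# otherwise skip this sample: the parsing result fed to the tutor was likely wrong
```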
  • FIG. 6 is a schematic diagram of a training flow of a virtual dress-up student model according to an embodiment of the present application.
  • As shown in FIG. 6, the virtual dress-up teaching assistant model 20 serves as an auxiliary model for training the virtual dress-up student model 10: it outputs the corresponding teaching assistant image based on the first clothing image input into it and the human body parsing result obtained by performing human body parsing on the person image (that is, the teacher image).
  • The teaching assistant image output by the virtual dress-up teaching assistant model 20 and the second clothing image are input into the virtual dress-up student model 10 to obtain the student image output by the student model. Based on the image loss information between the student image and the teacher image, the parameters of the virtual dress-up student model 10 can be updated.
  • In some embodiments, the virtual dress-up teaching assistant model 20 includes a second clothing deformation sub-model 21 and a second dress-up generation sub-model 22. The second clothing deformation sub-model 21 generates, according to the human body parsing result and the image features of the first clothing image, a deformed image in which the clothing to be changed into is adapted to the body of the designated person. The second dress-up generation sub-model 22 then generates the teaching assistant image by fusing the deformed image output by the second clothing deformation sub-model with the other image regions of the person image outside the region where the original clothing is worn.
  • The first clothing deformation sub-model contained in the virtual dress-up student model and the second clothing deformation sub-model contained in the virtual dress-up teaching assistant model may have the same network structure, such as the network structure shown in FIG. 3. Likewise, the first dress-up generation sub-model contained in the student model and the second dress-up generation sub-model contained in the teaching assistant model may have the same network structure: for example, each may be composed of an encoder-decoder network and a residual network, where the residual network is used to normalize the upper-layer network it is connected to, thereby facilitating parameter optimization during model training.
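As a rough illustration of "encoder-decoder network plus residual network", here is a sketch; the exact layout, channel counts, and normalization layers are assumptions, since the patent only names the two components:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # residual connection stabilizes the stacked layers

class DressUpGenerator(nn.Module):
    """Sketch of a dress-up generation sub-model: an encoder-decoder with
    residual blocks in between, fusing the person image with the deformed
    (warped) clothing image to produce the virtual dress-up image."""

    def __init__(self, in_ch: int = 6, base: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.res = nn.Sequential(ResidualBlock(base * 2), ResidualBlock(base * 2))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, person_img, warped_clothing):
        x = torch.cat([person_img, warped_clothing], dim=1)  # (B, 6, H, W)
        return self.decoder(self.res(self.encoder(x)))
```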
  • In summary, this application uses a novel "teacher - teaching assistant - student" knowledge distillation mechanism to train a virtual dress-up student model that does not depend on human body parsing results. Because the student model is supervised by the teacher image during training, the finally trained student model can generate high-fidelity virtual dress-up results without relying on human body parsing results, achieving high-quality virtual dress-up.
  • FIG. 7 is a block diagram of an image processing apparatus according to an embodiment of the present application. As shown in FIG. 7, in an exemplary embodiment, the image processing apparatus includes:
  • an image acquisition module 310 configured to acquire a first image containing the target person and a second image containing the target clothing; an information generation module 330 configured to generate, based on the image features of the first image and the image features of the second image, a target appearance flow feature characterizing the deformation of the target clothing to fit the body of the target person, and to generate, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body; and a virtual dress-up module 350 configured to fuse the deformed image with the first image to generate a virtual dress-up image, in which the target person wears the target clothing adapted to the body.
  • In some embodiments, the information generation module 330 includes: a multi-layer image feature acquisition unit configured to use the first image as the input signal of the first image feature extraction model and the second image as the input signal of the second image feature extraction model, and to extract, through the two models, the multi-layer image features corresponding to each input signal; and an appearance flow feature extraction unit configured to extract appearance flow features layer by layer from the multi-layer image features output by the two models, with the appearance flow feature extracted at the last image feature layer used as the target appearance flow feature.
  • In some embodiments, the appearance flow feature extraction unit includes: a first feature extraction subunit configured to extract, at the first image feature layer, from the image features output by the first and second image feature extraction models, an appearance flow feature characterizing the deformation of the target clothing to fit the target person's body; and a second feature extraction subunit configured to optimize, at each image feature layer after the first, the appearance flow feature corresponding to the previous image feature layer according to the image features output by the two models, obtaining the appearance flow feature corresponding to the current image feature layer.
  • In some embodiments, the second feature extraction subunit includes: a first deformation processing subunit configured to perform up-sampling on the appearance flow feature corresponding to the previous image feature layer to obtain an up-sampled feature, and to perform a first deformation on the image feature of the second image corresponding to the current image feature layer according to the up-sampled feature, obtaining a first deformed feature; a correction processing subunit configured to perform correction processing on the first deformed feature based on the image feature of the first image corresponding to the current image feature layer, and to perform a first convolution calculation on the corrected feature, obtaining a first convolution feature, according to which a second deformation is performed to obtain a second deformed feature; and an appearance flow feature acquisition subunit configured to perform a second convolution calculation on the second deformed feature and to concatenate the resulting second convolution feature with the first convolution feature, obtaining the appearance flow feature corresponding to the current image feature layer.
  • In some embodiments, the information generation module 330 further includes: a second-order smoothness constraint unit configured to, in the process of extracting appearance flow features layer by layer from the multi-layer image features output by the first and second image feature extraction models, also extract the appearance flow features according to a second-order smoothness constraint, where the second-order smoothness constraint is a preset constraint on the linear correspondence between adjacent appearance flows.
  • In some embodiments, the information generation module 330 is implemented as the first clothing deformation sub-model contained in the virtual dress-up student model, and the virtual dress-up module 350 is implemented as the first dress-up generation sub-model contained in the virtual dress-up student model.
  • In some embodiments, the image processing apparatus further includes: a teaching assistant image acquisition module configured to call the virtual dress-up teaching assistant model, input the human body parsing result corresponding to the person image containing the designated person and the first clothing image containing the clothing to be changed into, and obtain the teaching assistant image output by the teaching assistant model; a student image acquisition module configured to obtain the student image output by the virtual dress-up student model to be trained, where the designated person in the student image wears, adapted to the designated person's body in the teaching assistant image, the original clothing, that is, the clothing the designated person wears in the person image; and a parameter update module configured to use the person image as the teacher image and update the parameters of the virtual dress-up student model to be trained according to the image loss information between the student image and the teacher image.
  • In some embodiments, the image processing apparatus further includes: an image quality difference acquisition module configured to acquire the image quality difference between the teaching assistant image and the student image and, if the image quality difference is positive, to proceed to the step of using the person image as the teacher image and updating the parameters of the virtual dress-up student model to be trained according to the image loss information between the student image and the teacher image.
  • In some embodiments, the teaching assistant image acquisition module includes: a second clothing deformation sub-model calling unit configured to call the second clothing deformation sub-model in the virtual dress-up teaching assistant model and generate, according to the human body parsing result and the image features of the first clothing image, a deformed image in which the clothing to be changed into is adapted to the body of the designated person; and a second dress-up generation sub-model calling unit configured to call the second dress-up generation sub-model in the virtual dress-up teaching assistant model and generate the teaching assistant image by fusing the deformed image output by the second clothing deformation sub-model with the other image regions of the person image outside the region wearing the original clothing.
  • In some embodiments, the teaching assistant image acquisition module further includes: an image region information acquisition unit configured to call the second dress-up generation sub-model in the virtual dress-up teaching assistant model and, according to the human body parsing result, clear the region of the original clothing worn by the designated person in the person image, so as to obtain the image regions of the person image other than those containing the original clothing.
  • In some embodiments, the parameter update module includes: an image loss value acquisition unit configured to obtain the image loss values of the student image relative to the teacher image, where the image loss values include at least one of a pixel distance loss function value, a perceptual loss function value, and an adversarial loss function value; a loss value summation unit configured to sum the image loss values to obtain the image loss sum of the student image relative to the teacher image; and a model parameter update unit configured to use the image loss sum as the image loss information between the student image and the teacher image and update the parameters of the virtual dress-up student model to be trained.
  • In some embodiments, the first dress-up generation sub-model consists of an encoder-decoder network and a residual network, where the residual network is used to normalize the upper-layer network it is connected to.
  • Embodiments of the present application also provide an electronic device, including a processor and a memory, wherein the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, implement the image processing method as described above.
  • FIG. 8 shows a schematic structural diagram of a computer system suitable for implementing the electronic device according to the embodiment of the present application.
  • As shown in FIG. 8, the computer system 1600 includes a central processing unit (CPU) 1601, which can perform various appropriate actions and processes, such as the methods described in the above embodiments, according to a program stored in a read-only memory (ROM) 1602 or a program loaded from a storage part 1608 into a random access memory (RAM) 1603.
  • In the RAM 1603, various programs and data required for system operation are also stored.
  • the CPU 1601, the ROM 1602, and the RAM 1603 are connected to each other through a bus 1604.
  • An Input/Output (I/O) interface 1605 is also connected to the bus 1604 .
  • The following components are connected to the I/O interface 1605: an input section 1606 including a keyboard, a mouse, etc.; an output section 1607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage part 1608 including a hard disk and the like; and a communication section 1609 including a network interface card such as a LAN (Local Area Network) card and a modem.
  • the communication section 1609 performs communication processing via a network such as the Internet.
  • A drive 1610 is also connected to the I/O interface 1605 as needed.
  • a removable medium 1611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 1610 as needed so that a computer program read therefrom is installed into the storage section 1608 as needed.
  • In particular, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication portion 1609, and/or installed from the removable medium 1611.
  • the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • The computer-readable storage medium can be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a computer-readable program therein. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer program embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the above-mentioned module, program segment, or part of code contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • the units involved in the embodiments of the present application may be implemented in software or hardware, and the described units may also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • Another aspect of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the image processing method described above is implemented.
  • the computer-readable storage medium may be included in the electronic device described in the above embodiments, or may exist alone without being assembled into the electronic device.
  • Another aspect of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing methods provided in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

An image processing method and apparatus. The method includes: acquiring a first image containing a target person and a second image containing a target clothing (S110); generating, according to the image features of the first image and the image features of the second image, a target appearance flow feature characterizing the deformation the target clothing undergoes to fit the body of the target person, and generating, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body (S130); and fusing the deformed image with the first image to generate a virtual dress-up image, in which the target person wears the target clothing adapted to the body (S150). The method achieves high-quality virtual dress-up without relying on human body parsing results.

Description

Image processing method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application No. 202110141360.3, entitled "Image processing method and apparatus, electronic device, and storage medium", filed with the Chinese Patent Office on January 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of image processing, and in particular to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Virtual dress-up technology fuses a human body image with a clothing image by technical means to obtain an image of the user wearing the clothing, making it easy to preview the wearing effect without the user putting on real clothes. Virtual dress-up technology is widely used in scenarios such as online shopping, clothing display, clothing design, and virtual try-on in offline shopping.
Current virtual dress-up technology relies on the human body parsing result of the human body image. An ideal virtual dress-up dataset would contain images of a specified person wearing arbitrary clothing, images containing the target clothing, and images of the specified person wearing the target clothing. However, images of the same person holding exactly the same pose in two different garments are difficult to obtain, so the virtual dress-up datasets currently in use contain only images of the specified person wearing the target clothing; the human body parsing result must be used to clear the target clothing region of the specified person, after which the image containing the target clothing is used to reconstruct the person image.
It can be seen that this implementation depends heavily on the human body parsing result: when the parsing result is inaccurate, a virtual dress-up image in which the specified person and the target clothing do not match is generated. Moreover, in practical application scenarios human body parsing takes a long time, so real-time virtual dress-up results cannot be obtained.
Summary
To solve the above technical problems, the embodiments of this application provide an image processing method and apparatus, an electronic device, and a computer-readable storage medium, which perform virtual dress-up without relying on human body parsing results, thereby avoiding the various problems caused by such reliance and achieving high-quality virtual dress-up, while also improving the efficiency of virtual dress-up and realizing real-time virtual dress-up.
According to one aspect of the embodiments of this application, an image processing method is provided, the method being executed by a computer device and including: acquiring a first image containing a target person and a second image containing a target clothing; generating, according to the image features of the first image and the image features of the second image, a target appearance flow feature characterizing the deformation the target clothing undergoes to fit the body of the target person, and generating, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body; and fusing the deformed image with the first image to generate a virtual dress-up image, in which the target person wears the target clothing adapted to the body.
According to one aspect of the embodiments of this application, an image processing apparatus is provided, including: an image acquisition module configured to acquire a first image containing a target person and a second image containing a target clothing; an information generation module configured to generate, according to the image features of the first image and the image features of the second image, a target appearance flow feature characterizing the deformation the target clothing undergoes to fit the body of the target person, and to generate, based on the target appearance flow feature, a deformed image in which the target clothing is adapted to the body; and a virtual dress-up module configured to fuse the deformed image with the first image to generate a virtual dress-up image, in which the target person wears the target clothing adapted to the body.
According to one aspect of the embodiments of this application, an electronic device is provided, including a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, implement the image processing method described above.
According to one aspect of the embodiments of this application, a computer-readable storage medium is provided, on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor of a computer, the computer is caused to execute the image processing method described above.
According to one aspect of the embodiments of this application, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to execute the image processing methods provided in the various optional embodiments described above.
In the technical solutions provided by the embodiments of this application, virtual dress-up does not rely on human body parsing results. Instead, the target appearance flow feature describing the deformation the target clothing undergoes to fit the target person's body is obtained and used to deform the target clothing to the body; finally, the image of the deformed target clothing (the deformed image) is fused with the first image containing the target person to obtain the virtual dress-up image. This resolves the various problems caused by relying on human body parsing in related implementations and achieves high-quality virtual dress-up, while also improving the efficiency of virtual dress-up and realizing real-time virtual dress-up.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit this application.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this application and, together with the specification, serve to explain the principles of this application. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a flowchart of an image processing method according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a virtual try-on student model according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of the first clothing deformation sub-model 11 shown in FIG. 2 in an embodiment;
FIG. 4 is a schematic flowchart of the appearance flow feature prediction performed by the "FN-2" module shown in FIG. 3 at the second image feature layer;
FIG. 5 is a flowchart of an image processing method according to another embodiment of this application;
FIG. 6 is a schematic diagram of the training procedure of a virtual try-on student model according to an embodiment of this application;
FIG. 7 is a block diagram of an image processing apparatus according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of this application.
Detailed Description
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are merely exemplary; they need not include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, while others may be merged or partially merged, so the actual execution order may change according to the actual situation.
It should also be noted that "a plurality of" in this application means two or more. The terms "first", "second", and the like used in this application may be used herein to describe various concepts, but unless otherwise specified, these concepts are not limited by these terms; the terms are only used to distinguish one concept from another. For example, without departing from the scope of this application, a first image may be called a second image, and similarly, a second image may be called a first image.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, endowing machines with the functions of perception, reasoning, and decision-making.
AI technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic AI technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies include natural language processing and machine learning.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. ML is the core of AI and the fundamental way to make computers intelligent; its applications span all fields of AI. ML and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and to further perform graphics processing so that the computer output becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, CV studies related theories and technologies, attempting to build AI systems capable of obtaining information from images or multidimensional data. CV technologies generally include image processing, image recognition, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
The image processing method provided by the embodiments of this application will be described below based on AI and CV technologies.
The embodiments of this application provide an image processing method, performed by a computer device, capable of fusing a human body image with a clothing image. In one possible implementation, the computer device is a terminal, which may be a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smartwatch, an in-vehicle computer, or the like. In another possible implementation, the computer device is a server, which may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers (where the multiple servers may form a blockchain, with each server being a node on the blockchain), or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and AI platforms.
The image processing method provided by the embodiments of this application can be applied to any scenario in which a human body image and a clothing image are fused. For example, in the scenario of virtual try-on during online shopping, if a user wants to know the effect of wearing a certain garment, the user only needs to provide his or her own human body image and the clothing image of that garment; by processing the two images with the method provided by the embodiments of this application, the human body image of the user wearing the garment can be obtained, achieving online virtual try-on without the user putting on the real garment.
In addition, the image processing method provided by the embodiments of this application can also be applied to scenarios such as clothing design, clothing display, or virtual fitting in offline shopping, to provide a real-time virtual try-on function; these are not enumerated here one by one.
Referring to FIG. 1, FIG. 1 is a flowchart of an image processing method according to an embodiment of this application. The image processing method includes at least S110 to S150, which may be implemented as a virtual try-on student model. The virtual try-on student model is an AI model that performs virtual try-on of the target person without relying on human body parsing results; it can not only generate high-quality virtual try-on images but also improve the real-time performance of virtual try-on.
The image processing method shown in FIG. 1 is described in detail below:
S110: Acquire a first image containing a target person and a second image containing a target garment.
The target person mentioned in this embodiment refers to the person on whom virtual try-on is to be performed, and the target garment refers to the garment the target person wants to wear.
For example, in the scenario of virtual try-on during online shopping, the target person is the user currently shopping online, the first image is the human body image provided by the user, and the second image may be the target garment picture loaded on the shopping platform. It should be noted that the target person contained in the first image and the target garment contained in the second image may be determined according to the actual application scenario, which is not limited here.
S130: Generate, according to the image features of the first image and the image features of the second image, a target appearance flow feature that characterizes the deformation the target garment undergoes to fit the body of the target person, and generate, based on the target appearance flow feature, a deformed image of the target garment fitted to the body.
First, the image features of the first image are obtained by performing image feature extraction on the first image, and the image features of the second image are obtained by performing image feature extraction on the second image. For example, in some embodiments, the first image may be input into a first image feature extraction model and the second image into a second image feature extraction model (that is, the first image serves as the input signal of the first image feature extraction model, and the second image as the input signal of the second image feature extraction model); both models are configured with image feature extraction algorithms, so that the image features output by the first image feature extraction model for the first image and the image features output by the second image feature extraction model for the second image are obtained.
The image features of the first image output by the first image feature extraction model and the image features of the second image output by the second image feature extraction model may be multi-layer image features, where multi-layer image features refer to the multiple feature maps successively obtained in the process of extracting image features from the first image and the second image.
Exemplarily, the first and second image feature extraction models may be pyramid feature extraction models configured with a Feature Pyramid Network (FPN); the feature map pyramid output by the FPN constitutes the multi-layer image features of the image. For example, in some embodiments, the bottom-up part of the pyramid feature extraction model may be used to extract image features from the first and second images. This bottom-up part can be understood as image feature extraction using a convolutional network: as the convolution deepens, the spatial resolution of the image decreases and spatial information is lost, but high-level semantic information is enriched, yielding multi-layer image features whose feature maps are ordered from large to small.
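By way of illustration only, a minimal sketch of such a bottom-up extractor follows; the channel widths, layer count, and module names here are assumptions made for this sketch, not the actual configuration of this application. Each stage halves the spatial resolution and its feature map is kept as one layer of the multi-layer image features:

```python
import torch
import torch.nn as nn

class BottomUpExtractor(nn.Module):
    """Toy bottom-up pyramid: each stage halves resolution and keeps its feature map."""
    def __init__(self, in_ch=3, widths=(64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(w, w, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            prev = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # one feature map per layer, large to small
        return feats

# Usage: two separate extractors, one per input image, as in FIG. 3.
person_feats = BottomUpExtractor()(torch.randn(1, 3, 256, 192))
```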
An appearance flow feature is a field of two-dimensional coordinate vectors, usually used to indicate which pixel of a source image can be used to reconstruct a specified pixel of a target image. To achieve high-quality virtual try-on, this embodiment needs to build an accurate and dense correspondence between the body of the target person and the target garment so that the target garment deforms to fit the body. Therefore, in this embodiment, the source image refers to the second image (specifically, the target garment region in the second image), and the target image to be reconstructed refers to the image of the target garment after it deforms to fit the body of the target person in the first image.
It follows that the target appearance flow feature can characterize the deformation of the target garment to fit the body of the target person in the first image; from the obtained target appearance flow feature, the deformed image of the target garment fitted to the body can be generated.
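In flow-based warping of this kind, each output pixel samples the source clothing image at the location given by the flow. The following is a hedged sketch of that sampling step (the function name and the pixel-offset convention are assumptions of this sketch):

```python
import torch
import torch.nn.functional as F

def warp_with_flow(source, flow):
    """Warp `source` (N,C,H,W) by an appearance flow (N,2,H,W) of pixel offsets."""
    n, _, h, w = source.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(source.device)  # (2,H,W)
    coords = base.unsqueeze(0) + flow  # absolute sampling coordinates
    # Normalize to [-1, 1], as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)  # (N,H,W,2)
    return F.grid_sample(source, grid, align_corners=True)
```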
When the image features of the first image are the multi-layer image features output by the first image feature extraction model, and the image features of the second image are the multi-layer image features output by the second image feature extraction model, appearance flow features may be extracted layer by layer according to the multi-layer image features output by the two models, and the appearance flow feature extracted for the last image feature layer is taken as the finally generated target appearance flow feature.
Exemplarily, at the first image feature layer, an appearance flow feature characterizing the deformation of the target garment to fit the body of the target person is extracted according to the image features output by the first and second image feature extraction models. At each image feature layer after the first, the appearance flow feature corresponding to the previous image feature layer is refined according to the image features output by the two models, yielding the appearance flow feature corresponding to the current image feature layer.
In the process of extracting appearance flow features layer by layer according to the multi-layer image features output by the first and second image feature extraction models, the extraction may also be performed under a preset second-order smoothness constraint. The second-order smoothness constraint is a constraint set on the linear correspondence between adjacent appearance flows; it further preserves features such as the pattern and stripes of the target garment, thereby improving the image quality of the generated deformed image of the target garment fitted to the body of the target person.
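One common form of such a constraint, offered here as a hedged sketch rather than the exact formulation of this application, penalizes deviation from locally linear flow, i.e., the discrete second derivative of the flow field along each axis:

```python
import torch

def second_order_smoothness(flow, eps=1e-6):
    """Charbonnier penalty on the discrete second derivative of a flow field (N,2,H,W)."""
    d2x = flow[:, :, :, :-2] + flow[:, :, :, 2:] - 2 * flow[:, :, :, 1:-1]
    d2y = flow[:, :, :-2, :] + flow[:, :, 2:, :] - 2 * flow[:, :, 1:-1, :]
    charb = lambda t: torch.sqrt(t ** 2 + eps).mean()
    return charb(d2x) + charb(d2y)
```

Because neighboring flow vectors are encouraged to vary linearly, straight stripes and repeated patterns on the garment stay straight and evenly spaced after warping.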
S150: Fuse the deformed image of the target garment fitted to the body of the target person with the first image to generate a virtual try-on image, in which the target person wears the target garment fitted to the body.
Fusing the deformed image with the first image to generate the virtual try-on image can be implemented with an image fusion algorithm suitable for virtual try-on, for example the Res-UNet algorithm; this embodiment does not limit this.
As can be seen from the above, the technical solution provided in this embodiment does not rely on human body parsing results to perform virtual try-on. Instead, the target appearance flow feature characterizing the deformation of the target garment to fit the body of the target person is obtained to deform the target garment to the body, and the image of the deformed target garment (i.e., the deformed image) is fused with the first image containing the target person to obtain the virtual try-on image. This avoids problems such as low virtual try-on image quality and weak real-time performance caused by reliance on human body parsing results, achieving high-quality virtual try-on while also improving the efficiency of virtual try-on and enabling real-time virtual try-on.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a virtual try-on student model according to an embodiment of this application. The exemplary virtual try-on student model 10 includes a first clothing deformation sub-model 11 and a first try-on generation sub-model 12, where the first clothing deformation sub-model 11 may perform S130 of the embodiment shown in FIG. 1, and the first try-on generation sub-model 12 may perform S150 of the embodiment shown in FIG. 1.
As shown in FIG. 2, by inputting the first image containing the target person and the second image containing the target garment into the virtual try-on student model 10, the model outputs the corresponding virtual try-on image, in which the target person wears the target garment fitted to the body.
Apart from the first image and the second image, the virtual try-on student model 10 needs no additional input signal; in particular, there is no need to input the human body parsing result of the target person contained in the first image.
FIG. 3 is a schematic structural diagram of the first clothing deformation sub-model 11 shown in FIG. 2 in an embodiment. As shown in FIG. 3, the first clothing deformation sub-model 11 contains a first image feature extraction model, a second image feature extraction model, and an appearance flow feature prediction model.
The first image feature extraction model is used to extract the image features of the first image, and the second image feature extraction model is used to extract the image features of the second image. As shown in FIG. 3, the first image feature extraction model extracts image features from the first image, successively obtaining the multi-layer image features denoted p1 to p3, and the second image feature extraction model extracts image features from the second image, successively obtaining the multi-layer image features denoted c1 to c3.
It should be noted that the multi-layer image features shown in FIG. 3 are merely an example; the number of feature layers extracted from the input images by the first and second image feature extraction models can be set according to actual needs, and this embodiment does not limit it.
The appearance flow feature prediction model is used to extract appearance flow features layer by layer according to the multi-layer image features output by the first and second image feature extraction models, taking the appearance flow feature extracted for the last image feature layer as the finally generated target appearance flow feature. For example, the "FN-1" module shown in FIG. 3 performs appearance flow feature prediction at the first image feature layer, the "FN-2" module at the second image feature layer, and the "FN-3" module at the third image feature layer. That is, the appearance flow feature prediction model is a progressive appearance flow feature prediction model.
As shown in FIG. 3, at the first image feature layer, the appearance flow feature prediction model extracts, for the first time, an appearance flow feature characterizing the deformation of the target garment to fit the body of the target person according to the image features output by the first and second image feature extraction models. At each image feature layer after the first, the model refines the appearance flow feature output for the previous image feature layer according to the image features output by the two extraction models, obtaining the appearance flow feature corresponding to the current image feature layer; a per-level sketch of this refinement is given after the description of FIG. 4 below.
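The progressive scheme can be summarized with the following hedged sketch (the interface of the `first_net` and `refine_nets` modules, and the coarsest-first ordering of the layers, are assumptions made for illustration):

```python
def predict_target_flow(person_feats, cloth_feats, first_net, refine_nets):
    """person_feats / cloth_feats: per-layer feature lists ordered as consumed
    by FN-1..FN-3 (assumed coarsest first, matching the upsampling in FIG. 4)."""
    flow = first_net(person_feats[0], cloth_feats[0])  # FN-1: initial flow f1
    for p, c, fn in zip(person_feats[1:], cloth_feats[1:], refine_nets):
        flow = fn(p, c, prev_flow=flow)  # FN-2, FN-3: refine the previous flow
    return flow  # flow of the last image feature layer = target appearance flow
```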
Through such progressive processing — since, as the convolution deepens, the spatial resolution of the multi-layer image features gradually decreases and spatial information is gradually lost while high-level semantic information is enriched — the feature information contained in the appearance flow features obtained layer by layer by the prediction model becomes increasingly rich and accurate. For example, among the appearance flow features f1 to f3 shown in FIG. 3, the contained feature information becomes progressively richer and progressively better adapted to the body of the target person.
It follows that the appearance flow feature obtained by the prediction model at the last image feature layer can very accurately reflect the deformation of the target garment to fit the body of the target person, and the deformed image of the target garment generated from this appearance flow feature can establish an accurate and close correspondence with the body of the target person, so that subsequent fusion of the accurately deformed target garment with the body of the target person yields a high-quality virtual try-on image.
FIG. 4 is a schematic flowchart of the appearance flow feature prediction performed by the "FN-2" module shown in FIG. 3 at the second image feature layer. As shown in FIG. 4, the appearance flow feature f1 corresponding to the previous image feature layer is first upsampled to obtain the upsampled feature f1'. Then, according to the upsampled feature f1', a first deformation is applied to the image feature c2 of the second image corresponding to the current feature layer, yielding the first deformed feature c2'. Next, the first deformed feature c2' is corrected based on the image feature p2 of the first image corresponding to the current image feature layer, yielding the corrected feature r2, and a convolution is computed on the corrected feature r2 to obtain the first convolution feature f2'''. Then, according to the feature f2'' obtained by concatenating the first convolution feature f2''' with the upsampled feature f1', a second deformation is applied to the image feature c2 of the second image corresponding to the current image feature layer, yielding the second deformed feature p2c2'', which is the concatenation of the image feature p2 of the first image output at the current image feature layer with another feature c2''. Finally, a second convolution is computed on the second deformed feature p2c2'', and the resulting second convolution feature f2' is concatenated with the first convolution feature f2''' to obtain the appearance flow feature f2 corresponding to the current image feature layer.
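A hedged sketch of this per-level refinement follows. The module class, channel sizes, and the correction operator are illustrative assumptions; `warp_with_flow` is the sampling helper sketched earlier; and where the text describes concatenating flow features, the sketch composes them additively so that each intermediate result remains a two-channel flow — an assumption of this sketch, not a statement of the application's exact operations:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowRefineLevel(nn.Module):
    """One FN-k refinement step of FIG. 4, sketched with illustrative choices."""
    def __init__(self, ch):
        super().__init__()
        self.conv_a = nn.Conv2d(ch, 2, kernel_size=3, padding=1)      # r2 -> f2'''
        self.conv_b = nn.Conv2d(2 * ch, 2, kernel_size=3, padding=1)  # p2c2'' -> f2'

    def forward(self, p, c, prev_flow):
        # Upsample the previous layer's flow f1 to obtain f1'.
        f_up = 2.0 * F.interpolate(prev_flow, scale_factor=2,
                                   mode="bilinear", align_corners=True)
        c_warp1 = warp_with_flow(c, f_up)    # first deformation -> c2'
        r = p * c_warp1                      # correction by p2 (assumed correlation-like)
        f_a = self.conv_a(r)                 # first convolution feature f2'''
        f_mid = f_up + f_a                   # combine f2''' with f1' (additive stand-in)
        c_warp2 = warp_with_flow(c, f_mid)   # second deformation -> c2''
        pc = torch.cat((p, c_warp2), dim=1)  # second deformed feature p2c2''
        f_b = self.conv_b(pc)                # second convolution feature f2'
        return f_mid + f_b                   # combine the two branches -> f2
```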
As can be seen from the above, upsampling the appearance flow feature output by the previous image feature layer helps improve the resolution of the appearance flow feature at the current image feature layer. The subsequent two deformation operations and two convolution computations further refine the feature information contained in the upsampled feature, effectively adding spatial information on top of the appearance flow feature output by the previous layer, thereby optimizing it and obtaining an appearance flow feature that further reflects the deformation of the target garment to fit the body of the target person.
It should also be mentioned that, in some embodiments, in the process of extracting appearance flow features layer by layer according to the multi-layer image features output by the first and second image feature extraction models, the appearance flow feature prediction model also performs the extraction under the second-order smoothness constraint preset on the linear correspondence between adjacent appearance flows, to further preserve features such as the pattern and stripes of the target garment.
FIG. 5 is a flowchart of an image processing method according to another embodiment of this application. As shown in FIG. 5, on the basis of the embodiment shown in FIG. 1, the method further includes S210 to S250, described in detail as follows:
S210: Call a virtual try-on teaching assistant model, and input the human body parsing result corresponding to a person image containing a specified person, together with a first clothing image containing the clothing to be changed into, into the virtual try-on teaching assistant model, to obtain the teaching assistant image output by the model; in the teaching assistant image, the specified person wears the clothing to be changed into, fitted to the body of the specified person.
First of all, this embodiment discloses the process of training the virtual try-on student model shown in FIG. 2. In the training stage of the virtual try-on student model, a virtual try-on teaching assistant model is called for auxiliary training. Specifically, the teaching assistant model is an AI model that relies on human body parsing results: given the human body parsing result corresponding to the person image containing the specified person and the first clothing image containing the clothing to be changed into, the teaching assistant model outputs the corresponding teaching assistant image, in which the specified person wears the clothing to be changed into, fitted to the body of the specified person.
In this embodiment, the virtual try-on dataset is an image dataset composed of person images containing the specified person, first clothing images containing the clothing to be changed into, and second clothing images containing the original clothing worn by the specified person. There may be multiple person images, first clothing images, and second clothing images; the specified persons contained in different person images may be the same or different, and this embodiment does not limit this.
S230: Input the second clothing image containing the original clothing and the teaching assistant image into the virtual try-on student model to be trained, to obtain the student image output by the model to be trained; in the student image, the specified person wears the original clothing fitted to the body of the specified person in the teaching assistant image, the original clothing being the clothing worn by the specified person in the person image.
Since the virtual try-on student model does not rely on human body parsing results to achieve virtual try-on, while the features extracted by the teaching assistant model based on human body parsing results contain richer semantic information and feature representations, this embodiment uses the teaching assistant model to guide the training of the student model.
That is, this embodiment trains the virtual try-on student model by means of knowledge distillation.
Knowledge distillation refers to using the internal information of a teacher network to train a student network. In this embodiment, the teacher network is the virtual try-on teaching assistant model, and its internal information refers to the feature representations and semantic information extracted by the teaching assistant model from the human body parsing results.
The trained virtual try-on student model has fully learned the accurate and dense correspondence between the human body and the clothing; therefore, in practical applications, without acquiring the human body parsing result of the target person, the student model can still output a high-quality virtual try-on image from the input first image containing the target person and second image containing the target garment.
Specifically, this embodiment inputs the teaching assistant image output by the teaching assistant model into the student model to be trained as teaching assistant knowledge, together with the second clothing image containing the original clothing, so that the student model to be trained outputs the student image, in which the specified person wears the original clothing fitted to the body of the specified person in the teaching assistant image.
S250: Take the person image as the teacher image, and update the parameters of the virtual try-on student model to be trained according to the image loss information between the student image and the teacher image.
This embodiment takes the person image as the teacher image to supervise the training process of the virtual try-on student model. In other words, during training the student model is directly supervised by the teacher image, which helps improve the performance of the student model, so that the finally trained student model can, in practical applications, free itself from dependence on human body parsing results and output a high-quality virtual try-on image from the input first and second images.
The image loss information between the student image and the teacher image may be obtained by computing loss function values between them. Exemplarily, image loss values of the student image relative to the teacher image may be obtained, where the image loss values may include at least one of a pixel distance loss function value, a perceptual loss function value, and an adversarial loss function value. The image loss values are then summed to obtain the image loss sum of the student image relative to the teacher image; finally, the image loss sum is taken as the image loss information between the student image and the teacher image to update the parameters of the virtual try-on student model to be trained, thereby completing one training iteration of the student model.
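As a hedged sketch of this loss (the VGG-based perceptual term and the equal weighting are assumptions of the sketch; the adversarial term is omitted for brevity):

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen feature extractor for the perceptual term (an illustrative choice).
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:16].eval()
for param in vgg.parameters():
    param.requires_grad_(False)

def distillation_loss(student_img, teacher_img):
    """Pixel-distance + perceptual loss of the student image w.r.t. the teacher image."""
    pixel = F.l1_loss(student_img, teacher_img)
    perceptual = F.l1_loss(vgg(student_img), vgg(teacher_img))
    return pixel + perceptual  # summed, as described; an adversarial term may be added
```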
By training the virtual try-on student model to be trained multiple times, its model performance is gradually improved; when the image loss information between the student image and the teacher image is less than or equal to a preset image loss threshold, the student model has reached good performance, and the training process can be ended.
It should also be mentioned that the human body parsing result may include information such as human body keypoints, human pose heatmaps, and dense pose estimation. In most cases, the teaching assistant model can extract richer semantic information from the human body parsing result, and its predicted appearance flow features are more accurate; therefore, the image quality of the teaching assistant image output by the teaching assistant model should be higher than that of the student image output by the student model.
If the human body parsing result input into the teaching assistant model is inaccurate, the teaching assistant model will provide completely wrong guidance to the student model during training; it is therefore necessary to set an adjustable knowledge distillation mechanism to ensure that only accurate teaching assistant images are used to train the student model.
Specifically, before S250, the image quality difference between the teaching assistant image and the student image is obtained. If the difference is positive, the image quality of the teaching assistant image is higher than that of the student image, and S250 is executed to train the student model based on this teaching assistant image. If the difference is negative or zero, the teaching assistant image is not of higher quality than the student image, and the human body parsing result input into the teaching assistant model may be completely wrong; S250 is therefore not executed, and the next round of training of the student model begins.
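A hedged sketch of this adjustable distillation gate follows; the quality metric `image_quality` is an assumption of the sketch (any reference-based or no-reference image quality score could stand in), and `distillation_loss` is the loss sketched above:

```python
def train_step(assistant_img, student_img, teacher_img, optimizer):
    """Update the student only when the teaching assistant image is the better exemplar."""
    quality_gap = image_quality(assistant_img) - image_quality(student_img)
    if quality_gap <= 0:
        return None  # skip this sample and move on to the next training round
    loss = distillation_loss(student_img, teacher_img)  # student_img must carry gradients
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```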
FIG. 6 is a schematic diagram of the training procedure of the virtual try-on student model according to an embodiment of this application. As shown in FIG. 6, the virtual try-on teaching assistant model 20 serves as an auxiliary model for training the virtual try-on student model 10. The teaching assistant model 20 outputs the corresponding teaching assistant image from the input first clothing image and the human body parsing result obtained by performing human body parsing on the person image (i.e., the teacher image). Then, the teaching assistant image output by the teaching assistant model 20 and the second clothing image are input into the student model 10 to obtain the student image it outputs. The parameters of the student model 10 can then be updated according to the image loss information between the student image and the teacher image.
The virtual try-on teaching assistant model 20 includes a second clothing deformation sub-model 21 and a second try-on generation sub-model 22. By calling the second clothing deformation sub-model 21, the deformed image of the clothing to be changed into, fitted to the body of the specified person, can be generated from the human body parsing result and the image features of the first clothing image; the detailed process can be found in the embodiments corresponding to FIG. 3 and FIG. 4 and is not repeated here. By calling the second try-on generation sub-model 22, the teaching assistant image can be generated by fusing the deformed image of the clothing to be changed into, output by the second clothing deformation sub-model, with the image regions of the person image other than the region wearing the original clothing.
In other embodiments, by calling the second try-on generation sub-model 22, the region of the person image in which the specified person wears the original clothing can also be erased according to the human body parsing result, to obtain the image regions of the person image other than the region wearing the original clothing.
It should be noted that the first clothing deformation sub-model contained in the virtual try-on student model and the second clothing deformation sub-model contained in the teaching assistant model may have the same network structure, for example the structure shown in FIG. 3. The first try-on generation sub-model contained in the student model and the second try-on generation sub-model contained in the teaching assistant model may also have the same network structure; for example, both may be composed of an encoder-decoder network and a residual network, where the residual network normalizes the upper-layer network to which it is connected, thereby facilitating parameter optimization during model training.
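A hedged sketch of such a try-on generation sub-model follows; the layer sizes, the instance-normalization choice, and the placement of the normalizing residual blocks are assumptions made for illustration only:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block that also normalizes the features of the layer it follows."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class TryOnGenerator(nn.Module):
    """Encoder-decoder with residual blocks; the deformed garment is concatenated
    with the person image (6 channels) and decoded into the try-on image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.resblocks = nn.Sequential(ResBlock(128), ResBlock(128))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, person_img, warped_garment):
        x = torch.cat((warped_garment, person_img), dim=1)
        return self.decoder(self.resblocks(self.encoder(x)))
```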
As can be seen from the above, this application trains a virtual try-on student model that does not rely on human body parsing results through a novel "teacher - teaching assistant - student" knowledge distillation mechanism. The student model is supervised by the teacher image during training, so that the finally trained student model can generate highly realistic virtual try-on results without relying on human body parsing results, achieving high-quality virtual try-on without such reliance.
FIG. 7 is a block diagram of an image processing apparatus according to an embodiment of this application. As shown in FIG. 7, in an exemplary embodiment, the image processing apparatus includes:
an image acquisition module 310 configured to acquire a first image containing a target person and a second image containing a target garment; an information generation module 330 configured to generate, according to the image features of the first image and the image features of the second image, a target appearance flow feature that characterizes the deformation the target garment undergoes to fit the body of the target person, and to generate, based on the target appearance flow feature, a deformed image of the target garment fitted to the body; and a virtual try-on module 350 configured to fuse the deformed image with the first image to generate a virtual try-on image, in which the target person wears the target garment fitted to the body.
In another exemplary embodiment, the information generation module 330 includes:
a multi-layer image feature acquisition unit configured to take the first image as the input signal of a first image feature extraction model and the second image as the input signal of a second image feature extraction model, and to extract, through the first and second image feature extraction models respectively, the multi-layer image features corresponding to the input signals; and an appearance flow feature extraction unit configured to extract appearance flow features layer by layer according to the multi-layer image features output by the two models, taking the appearance flow feature extracted for the last image feature layer as the target appearance flow feature.
In another exemplary embodiment, the appearance flow feature extraction unit includes:
a first feature extraction subunit configured to extract, at the first image feature layer, an appearance flow feature characterizing the deformation of the target garment to fit the body of the target person according to the image features output by the first and second image feature extraction models; and a second feature extraction subunit configured to refine, at each image feature layer after the first, the appearance flow feature corresponding to the previous image feature layer according to the image features output by the two models, obtaining the appearance flow feature corresponding to the current image feature layer.
In another exemplary embodiment, the second feature extraction subunit includes:
a first deformation processing subunit configured to upsample the appearance flow feature corresponding to the previous image feature layer to obtain an upsampled feature, and to apply a first deformation to the image feature of the second image corresponding to the current image feature layer according to the upsampled feature, obtaining a first deformed feature; a correction processing subunit configured to correct the first deformed feature based on the image feature of the first image corresponding to the current image feature layer, and to perform a first convolution computation on the corrected feature obtained by the correction, obtaining a first convolution feature; a second deformation processing subunit configured to apply a second deformation to the image feature of the second image corresponding to the current image feature layer according to the feature obtained by concatenating the first convolution feature and the upsampled feature, obtaining a second deformed feature; and an appearance flow feature acquisition subunit configured to perform a second convolution computation on the second deformed feature and concatenate the resulting second convolution feature with the first convolution feature, to obtain the appearance flow feature corresponding to the current image feature layer.
In another exemplary embodiment, the information generation module 330 further includes:
a second-order smoothness constraint unit configured to perform the layer-by-layer extraction of appearance flow features from the multi-layer image features output by the first and second image feature extraction models also according to a second-order smoothness constraint, the second-order smoothness constraint being a constraint preset on the linear correspondence between adjacent appearance flows.
In another exemplary embodiment, the information generation module 330 is configured as the first clothing deformation sub-model contained in the virtual try-on student model, and the virtual try-on module 350 is configured as the first try-on generation sub-model contained in the virtual try-on student model.
In another exemplary embodiment, the image processing apparatus further includes:
a teaching assistant image acquisition module configured to call the virtual try-on teaching assistant model and input the human body parsing result corresponding to a person image containing a specified person, together with a first clothing image containing the clothing to be changed into, into the teaching assistant model, obtaining the teaching assistant image output by the model, in which the specified person wears the clothing to be changed into, fitted to the body of the specified person; a student image acquisition module configured to input a second clothing image containing the original clothing and the teaching assistant image into the virtual try-on student model to be trained, obtaining the student image output by the model to be trained, in which the specified person wears the original clothing fitted to the body of the specified person in the teaching assistant image, the original clothing being the clothing worn by the specified person in the person image; and a parameter update module configured to take the person image as the teacher image and update the parameters of the virtual try-on student model to be trained according to the image loss information between the student image and the teacher image.
In another exemplary embodiment, the image processing apparatus further includes:
an image quality difference acquisition module configured to obtain the image quality difference between the teaching assistant image and the student image and, if the difference is positive, to execute the step of taking the person image as the teacher image and updating the parameters of the virtual try-on student model to be trained according to the image loss information between the student image and the teacher image.
In another exemplary embodiment, the teaching assistant image acquisition module includes:
a second clothing deformation sub-model calling unit configured to call the second clothing deformation sub-model in the virtual try-on teaching assistant model to generate, according to the human body parsing result and the image features of the first clothing image, the deformed image of the clothing to be changed into, fitted to the body of the specified person; and a second try-on generation sub-model calling unit configured to call the second try-on generation sub-model in the virtual try-on teaching assistant model to generate the teaching assistant image by fusing the deformed image of the clothing to be changed into, output by the second clothing deformation sub-model, with the image regions of the person image other than the region wearing the original clothing.
In another exemplary embodiment, the teaching assistant image acquisition module further includes:
an image region information acquisition unit configured to call the second try-on generation sub-model in the virtual try-on teaching assistant model to erase, according to the human body parsing result, the region of the person image in which the specified person wears the original clothing, obtaining the image regions of the person image other than the region wearing the original clothing.
In another exemplary embodiment, the parameter update module includes:
an image loss value acquisition unit configured to obtain image loss values of the student image relative to the teacher image, the image loss values including at least one of a pixel distance loss function value, a perceptual loss function value, and an adversarial loss function value; a loss value summing unit configured to sum the image loss values to obtain the image loss sum of the student image relative to the teacher image; and a model parameter update unit configured to take the image loss sum as the image loss information between the student image and the teacher image and update the parameters of the virtual try-on student model to be trained.
In another exemplary embodiment, the first try-on generation sub-model is composed of an encoder-decoder network and a residual network, the residual network being used to normalize the upper-layer network to which it is connected.
It should be noted that the apparatus provided by the above embodiments belongs to the same concept as the method provided by the above embodiments; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiments and is not repeated here.
An embodiment of this application further provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions which, when executed by the processor, implement the image processing method described above.
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of this application.
It should be noted that the computer system 1600 of the electronic device shown in FIG. 8 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
As shown in FIG. 8, the computer system 1600 includes a central processing unit (CPU) 1601, which can perform various appropriate actions and processes — for example, the methods described in the above embodiments — according to a program stored in a read-only memory (ROM) 1602 or a program loaded from a storage portion 1608 into a random access memory (RAM) 1603. The RAM 1603 also stores various programs and data required for system operation. The CPU 1601, the ROM 1602, and the RAM 1603 are connected to one another through a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output portion 1607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 1608 including a hard disk and the like; and a communication portion 1609 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication portion 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 1610 as needed, so that the computer program read therefrom is installed into the storage portion 1608 as needed.
In particular, according to the embodiments of this application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication portion 1609 and/or installed from the removable medium 1611. When the computer program is executed by the central processing unit (CPU) 1601, the various functions defined in the system of this application are performed.
It should be noted that the computer-readable medium shown in the embodiments of this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program usable by, or in combination with, an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, carrying a computer-readable computer program; such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The computer program contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, or any suitable combination thereof.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of this application. Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, the module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two successive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, under certain circumstances, limit the units themselves.
Another aspect of this application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the image processing method described above. The computer-readable storage medium may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.
Another aspect of this application further provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the image processing method provided in the various embodiments above.
The above contents are merely preferred exemplary embodiments of this application and are not intended to limit its implementations. Those of ordinary skill in the art can easily make corresponding variations or modifications according to the main concept and spirit of this application; therefore, the protection scope of this application shall be subject to the scope of protection defined by the claims.

Claims (16)

  1. An image processing method, the method being performed by a computer device, the method comprising:
    acquiring a first image containing a target person and a second image containing a target garment;
    generating, according to image features of the first image and image features of the second image, a target appearance flow feature used to characterize the deformation the target garment undergoes to fit the body of the target person, and generating, based on the target appearance flow feature, a deformed image of the target garment fitted to the body;
    fusing the deformed image with the first image to generate a virtual try-on image, the target person in the virtual try-on image wearing the target garment fitted to the body.
  2. The method according to claim 1, wherein the generating, according to image features of the first image and image features of the second image, a target appearance flow feature used to characterize the deformation the target garment undergoes to fit the body of the target person comprises:
    taking the first image as an input signal of a first image feature extraction model and the second image as an input signal of a second image feature extraction model, and extracting, through the first image feature extraction model and the second image feature extraction model respectively, multi-layer image features corresponding to the input signals;
    extracting appearance flow features layer by layer according to the multi-layer image features output by the first image feature extraction model and the second image feature extraction model, and taking the appearance flow feature extracted for the last image feature layer as the target appearance flow feature.
  3. The method according to claim 2, wherein the extracting appearance flow features layer by layer according to the multi-layer image features output by the first image feature extraction model and the second image feature extraction model comprises:
    at the first image feature layer, extracting, according to the image features output by the first image feature extraction model and the second image feature extraction model, an appearance flow feature used to characterize the deformation the target garment undergoes to fit the body of the target person;
    at each image feature layer after the first image feature layer, refining, according to the image features output by the first image feature extraction model and the second image feature extraction model, the appearance flow feature corresponding to the previous image feature layer, to obtain the appearance flow feature corresponding to the current image feature layer.
  4. The method according to claim 3, wherein the refining, at each image feature layer after the first image feature layer and according to the image features output by the first image feature extraction model and the second image feature extraction model, the appearance flow feature corresponding to the previous image feature layer, to obtain the appearance flow feature corresponding to the current image feature layer, comprises:
    performing upsampling according to the appearance flow feature corresponding to the previous image feature layer to obtain an upsampled feature;
    performing, according to the upsampled feature, first deformation processing on the image feature of the second image corresponding to the current image feature layer, to obtain a first deformed feature;
    correcting the first deformed feature based on the image feature of the first image corresponding to the current image feature layer, and performing a first convolution computation on the corrected feature obtained by the correction, to obtain a first convolution feature;
    performing, according to a feature obtained by concatenating the first convolution feature and the upsampled feature, second deformation processing on the image feature of the second image corresponding to the current image feature layer, to obtain a second deformed feature;
    performing a second convolution computation on the second deformed feature, and concatenating the obtained second convolution feature with the first convolution feature, to obtain the appearance flow feature corresponding to the current image feature layer.
  5. The method according to claim 2, wherein, in the process of extracting appearance flow features layer by layer according to the multi-layer image features output by the first image feature extraction model and the second image feature extraction model, the appearance flow features are also extracted according to a second-order smoothness constraint, the second-order smoothness constraint being a constraint preset on the linear correspondence between adjacent appearance flows.
  6. The method according to claim 1, wherein the step of generating, according to image features of the first image and image features of the second image, the target appearance flow feature used to characterize the deformation the target garment undergoes to fit the body of the target person, and generating, based on the target appearance flow feature, the deformed image of the target garment fitted to the body, is performed by a first clothing deformation sub-model in a virtual try-on student model;
    the step of fusing the deformed image with the first image to generate the virtual try-on image is performed by a first try-on generation sub-model in the virtual try-on student model.
  7. The method according to claim 6, the method further comprising:
    calling a virtual try-on teaching assistant model, and inputting a human body parsing result corresponding to a person image containing a specified person, and a first clothing image containing clothing to be changed into, into the virtual try-on teaching assistant model, to obtain a teaching assistant image output by the virtual try-on teaching assistant model, the specified person in the teaching assistant image wearing the clothing to be changed into, fitted to the body of the specified person;
    inputting a second clothing image containing original clothing and the teaching assistant image into a virtual try-on student model to be trained, to obtain a student image output by the virtual try-on student model to be trained, the specified person in the student image wearing the original clothing fitted to the body of the specified person in the teaching assistant image, the original clothing being the clothing worn by the specified person in the person image;
    taking the person image as a teacher image, and updating parameters of the virtual try-on student model to be trained according to image loss information between the student image and the teacher image.
  8. The method according to claim 7, wherein, before the taking the person image as the teacher image and updating the parameters of the virtual try-on student model to be trained according to the image loss information between the student image and the teacher image, the method further comprises:
    obtaining an image quality difference between the teaching assistant image and the student image;
    if the image quality difference is positive, executing the step of taking the person image as the teacher image and updating the parameters of the virtual try-on student model to be trained according to the image loss information between the student image and the teacher image.
  9. The method according to claim 7, wherein the calling a virtual try-on teaching assistant model, inputting the human body parsing result corresponding to the person image containing the specified person, and the first clothing image containing the clothing to be changed into, into the virtual try-on teaching assistant model, and obtaining the teaching assistant image output by the virtual try-on teaching assistant model comprises:
    calling a second clothing deformation sub-model in the virtual try-on teaching assistant model to generate, according to the human body parsing result and image features of the first clothing image, a deformed image of the clothing to be changed into, fitted to the body of the specified person;
    calling a second try-on generation sub-model in the virtual try-on teaching assistant model to generate the teaching assistant image by fusing the deformed image corresponding to the clothing to be changed into, output by the second clothing deformation sub-model, with image regions of the person image other than the region wearing the original clothing.
  10. The method according to claim 9, the method further comprising:
    calling the second try-on generation sub-model in the virtual try-on teaching assistant model to erase, according to the human body parsing result, the region of the person image in which the specified person wears the original clothing, to obtain the image regions of the person image other than the region wearing the original clothing.
  11. The method according to claim 7, wherein the updating parameters of the virtual try-on student model to be trained according to the image loss information between the student image and the teacher image comprises:
    obtaining image loss values of the student image relative to the teacher image, the image loss values including at least one of a pixel distance loss function value, a perceptual loss function value, and an adversarial loss function value;
    summing the image loss values to obtain an image loss sum of the student image relative to the teacher image;
    taking the image loss sum as the image loss information between the student image and the teacher image, and updating the parameters of the virtual try-on student model to be trained.
  12. The method according to claim 6, wherein the first try-on generation sub-model is composed of an encoder-decoder network and a residual network, the residual network being used to normalize the upper-layer network to which it is connected.
  13. An image processing apparatus, the apparatus being deployed on a computer device, the apparatus comprising:
    an image acquisition module configured to acquire a first image containing a target person and a second image containing a target garment;
    an information generation module configured to generate, according to image features of the first image and image features of the second image, a target appearance flow feature used to characterize the deformation the target garment undergoes to fit the body of the target person, and to generate, based on the target appearance flow feature, a deformed image of the target garment fitted to the body;
    a virtual try-on module configured to fuse the deformed image with the first image to generate a virtual try-on image, the target person in the virtual try-on image wearing the target garment fitted to the body.
  14. An electronic device, comprising:
    a memory storing computer-readable instructions;
    a processor reading the computer-readable instructions stored in the memory to perform the method according to any one of claims 1 to 12.
  15. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method according to any one of claims 1 to 12.
  16. A computer program product which, when executed, performs the method according to any one of claims 1 to 12.
PCT/CN2022/072892 2021-01-27 2022-01-20 Image processing method and apparatus, electronic device, and storage medium WO2022161234A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023522881A JP2023545189A (ja) Image processing method, apparatus, and electronic device
US18/051,408 US20230077356A1 (en) 2021-01-27 2022-10-31 Method, apparatus, electronic device, and storage medium for processing image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110141360.3 2021-01-27
CN202110141360.3A CN113570685A (zh) Image processing method and apparatus, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/051,408 Continuation US20230077356A1 (en) 2021-01-27 2022-10-31 Method, apparatus, electronic device, and storage medium for processing image

Publications (1)

Publication Number Publication Date
WO2022161234A1 true WO2022161234A1 (zh) 2022-08-04

Family

ID=78161097

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072892 WO2022161234A1 (zh) Image processing method and apparatus, electronic device, and storage medium

Country Status (4)

Country Link
US (1) US20230077356A1 (zh)
JP (1) JP2023545189A (zh)
CN (1) CN113570685A (zh)
WO (1) WO2022161234A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570685A (zh) 图像处理方法及装置、电子设备、存储介质 — Image processing method and apparatus, electronic device, and storage medium
CN114125271B (zh) * 图像处理方法、装置及电子设备 — Image processing method and apparatus, and electronic device
CN115082295B (zh) * 一种基于自注意力机制的图像编辑方法及装置 — Image editing method and apparatus based on a self-attention mechanism
CN115861488B (zh) * 高分辨率虚拟换装方法、系统、设备及存储介质 — High-resolution virtual try-on method, system, device, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876814A (zh) * 2018-01-11 2018-11-23 Nanjing University — Method for generating pose flow images
CN110211196A (zh) * 2019-05-28 2019-09-06 Shandong University — Pose-guided virtual try-on method and apparatus
CN111709874A (zh) * 2020-06-16 2020-09-25 Beijing Baidu Netcom Science and Technology Co., Ltd. — Image adjustment method and apparatus, electronic device, and storage medium
CN113570685A (zh) * 2021-01-27 2021-10-29 Tencent Technology (Shenzhen) Co., Ltd. — Image processing method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN XINTONG; HUANG WEILIN; HU XIAOJUN; SCOTT MATTHEW: "ClothFlow: A Flow-Based Model for Clothed Person Generation", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 10470 - 10479, XP033723893, DOI: 10.1109/ICCV.2019.01057 *

Also Published As

Publication number Publication date
JP2023545189A (ja) 2023-10-26
CN113570685A (zh) 2021-10-29
US20230077356A1 (en) 2023-03-16

Similar Documents

Publication Publication Date Title
WO2022161234A1 (zh) Image processing method and apparatus, electronic device, and storage medium
JP2022524891A (ja) Image processing method and apparatus, electronic device, and computer program
CN110659723B (zh) Artificial-intelligence-based data processing method and apparatus, medium, and electronic device
CN110909680A (zh) Facial expression recognition method and apparatus, electronic device, and storage medium
US20230100427A1 (en) Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product
CN112132770A (zh) Image inpainting method and apparatus, computer-readable medium, and electronic device
CN116310318A (zh) Interactive image segmentation method and apparatus, computer device, and storage medium
CN116704079A (zh) Image generation method, apparatus, device, and storage medium
CN115147261A (zh) Image processing method, apparatus, storage medium, device, and product
CN113822114A (zh) Image processing method, related device, and computer-readable storage medium
US20230115765A1 (en) Method and apparatus of transferring image, and method and apparatus of training image transfer model
CN113705317B (zh) Image processing model training method, image processing method, and related device
CN116978057A (zh) Method and apparatus for transferring human pose in images, computer device, and storage medium
CN117011156A (zh) Image processing method, apparatus, device, and storage medium
CN113223128B (zh) Method and apparatus for generating images
Zuo Implementation of HCI software interface based on image identification and segmentation algorithms
Zhou et al. Improved GCN framework for human motion recognition
CN117557699B (zh) Animation data generation method and apparatus, computer device, and storage medium
CN116542292B (zh) Training method, apparatus, device, and storage medium for an image generation model
CN116704588B (zh) Facial image replacement method, apparatus, device, and storage medium
CN113505866B (zh) Image analysis method and apparatus based on edge material data augmentation
Zhang et al. Generative facial prior and semantic guidance for iterative face inpainting
CN117132777B (zh) Image segmentation method and apparatus, electronic device, and storage medium
US20220189050A1 (en) Synthesizing 3d hand pose based on multi-modal guided generative networks
CN116129499A (zh) Occluded face recognition method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22745109

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023522881

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.12.2023)