CN114881893B - Image processing method, device, equipment and computer readable storage medium - Google Patents

Image processing method, device, equipment and computer readable storage medium Download PDF

Info

Publication number
CN114881893B
Authority
CN
China
Prior art keywords
image
training
model
key point
point set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210781883.9A
Other languages
Chinese (zh)
Other versions
CN114881893A (en
Inventor
康洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210781883.9A priority Critical patent/CN114881893B/en
Publication of CN114881893A publication Critical patent/CN114881893A/en
Application granted granted Critical
Publication of CN114881893B publication Critical patent/CN114881893B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, apparatus, device, and computer-readable storage medium. The method comprises the following steps: acquiring an image to be processed, a trained first model and a trained second model; predicting the image to be processed by using the trained first model to obtain a first key point set; determining a target area image from the image to be processed based on the first key point set; predicting the target area image by using the trained second model to obtain a second key point set, wherein the first key point set is a subset of the second key point set; and performing special effect processing on the target area image based on the second key point set to obtain a processed image. By the method and the device, the precision of image processing can be improved.

Description

Image processing method, device, equipment and computer readable storage medium
Technical Field
The present application relates to image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a computer-readable storage medium.
Background
With the development of mobile communication technology, and in particular the popularization of mobile networks and intelligent terminals, the mobile internet field has developed rapidly. At present, intelligent terminals are no longer used simply to keep people in contact with one another; they have become important mobile personal entertainment terminals in daily life.
Nowadays, photographing and video recording have become indispensable functions of intelligent terminals, allowing people to record memorable moments anytime and anywhere. Most current intelligent terminals include a front-facing camera, which makes it convenient for users to take pictures of themselves. The love of beauty is human nature, and people want their photos to look more attractive, which has given rise to various beautification methods. In the related art, beautification requires first determining the key points of the facial features and then applying the beautification special effect based on those key points. The accuracy of the facial feature key points is therefore closely related to how well the beautification effect fits the face and how realistic it looks.
Disclosure of Invention
The embodiments of the application provide an image processing method, apparatus, device and computer-readable storage medium, which can improve the precision of image processing.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed, a trained first model and a trained second model;
predicting the image to be processed by using the trained first model to obtain a first key point set;
determining a target area image from the image to be processed based on the first key point set;
predicting the target area image by using the trained second model to obtain a second key point set, wherein the first key point set is a subset of the second key point set;
and carrying out special effect processing on the target area image based on the second key point set to obtain a processed image.
An embodiment of the present application provides an image processing apparatus, including:
the first acquisition module is used for acquiring an image to be processed, a trained first model and a trained second model;
the first prediction module is used for performing prediction processing on the image to be processed by using the trained first model to obtain a first key point set;
a first determining module, configured to determine a target area image from the image to be processed based on the first keypoint set;
the second prediction module is used for performing prediction processing on the target area image by using the trained second model to obtain a second key point set, and the first key point set is a subset of the second key point set;
and the special effect processing module is used for carrying out special effect processing on the target area image based on the second key point set to obtain a processed image.
In some embodiments, the target region image comprises a first target sub-region image and a second target sub-region image that are symmetric, and the second prediction module is further configured to:
predicting the first target sub-region image by using the trained second model to obtain a first predicted point set;
turning the second target subregion image to obtain a processed second target subregion image;
predicting the processed second target sub-region image by using the trained second model to obtain a second prediction point set;
determining a union of the first predicted point set and the second predicted point set as the second keypoint set.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring a preset first model and training data, wherein the training data comprises a plurality of training images and a labeling key point set in the training images, and the labeling key point set comprises a first labeling key point subset;
the first processing module is used for carrying out alignment and enhancement processing on the training image to obtain a processed training image;
the third prediction module is used for performing prediction processing on the processed training image by using the preset first model to obtain a prediction key point corresponding to the processed training image;
and the first training module is used for training the first model based on the prediction key points of the processed training images and the first labeling key point subset to obtain a trained first model.
In some embodiments, the first processing module is further configured to:
acquiring a reference key point set in a reference image;
determining an affine matrix for performing registration processing based on the labeling key point set and the reference key point set of the training image;
carrying out alignment processing on the training image by utilizing the affine matrix to obtain an aligned training image;
determining an enhancement matrix based on the center coordinates, the rotation angle and the scaling factor of the aligned training images;
and performing enhancement processing on the aligned training image by using the enhancement matrix to obtain a processed training image.
In some embodiments, the apparatus further comprises:
a third obtaining module, configured to obtain a preset second model and training data, where the training data includes a plurality of training images and a set of labeled key points in the training images;
the second determining module is used for determining a training area image from each training image based on the labeling key point set of each training image;
the second processing module is used for carrying out alignment and enhancement processing on the training area image to obtain a processed training area image;
the fourth prediction module is used for performing prediction processing on the processed training area image by using the preset second model to obtain a prediction key point corresponding to the processed training area image;
and the second training module is used for training the second model based on the prediction key points and the labeling key point set of each processed training area image to obtain a trained second model.
In some embodiments, the apparatus further comprises:
the fourth acquisition module is used for acquiring a plurality of artificially labeled training images and the artificial labeling information of each artificially labeled training image;
the third training module is used for training a preset annotation model by using the training images of the artificial annotations and the artificial annotation information of the training images of the artificial annotations to obtain a trained annotation model;
a fifth obtaining module, configured to obtain a plurality of training images to be labeled;
the fifth prediction module is used for performing prediction processing on the training images to be labeled by using the trained labeling model to obtain the prediction labeling information of the training images to be labeled;
and the third determining module is used for determining the artificially labeled training images, the artificial labeling information of the artificially labeled training images, the training images to be labeled and the prediction labeling information of the training images to be labeled as training data when the prediction labeling information of the training images to be labeled meets the labeling conditions.
In some embodiments, the apparatus further comprises:
the fourth determination module is used for determining the training image to be labeled as a target training image and acquiring the updating operation of the prediction labeling information aiming at the target training image when the training image to be labeled does not meet the labeling condition;
the updating module is used for responding to the updating operation and updating the prediction marking information of the target training image to obtain updated prediction marking information;
and the fourth determining module is used for determining the training images marked artificially, the artificial marking information of the training images marked artificially, the training images to be marked meeting marking conditions, the prediction marking information of the training images to be marked meeting marking conditions, the target training images and the updated prediction marking information of the target training images as training data.
In some embodiments, the apparatus further comprises:
a sixth obtaining module, configured to obtain an original image, and perform face detection on the original image to obtain a detection result;
and the fifth determining module is used for determining that the detection result represents that the original image comprises a face region image and determining the original image as an image to be processed.
An embodiment of the present application provides a computer device, including:
a memory for storing executable instructions;
and the processor is used for realizing the method provided by the embodiment of the application when executing the executable instructions stored in the memory.
Embodiments of the present application provide a computer-readable storage medium, which stores executable instructions for causing a processor to implement the method provided by the embodiments of the present application when the processor executes the executable instructions.
Embodiments of the present application provide a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the computer program or instructions implement the method provided by embodiments of the present application.
The embodiment of the application has the following beneficial effects:
after an image to be processed is obtained, a trained first model is first used to perform prediction processing on the image to be processed to obtain a first key point set; a target area image is then determined from the image to be processed based on the first key point set, and a trained second model is used to perform prediction processing on the target area image to obtain a second key point set. The first key point set is a subset of the second key point set, that is, the first key point set contains sparse key points and the second key point set contains dense key points. Finally, special effect processing is performed on the target area image based on the second key point set to obtain a processed image. Because the key points in the image to be processed are determined by two cascaded models, the data coupling problem caused by determining all key points with a single model can be avoided, the accuracy of the determined key points can be improved, and the precision of image processing is further improved.
Drawings
FIG. 1 is a schematic diagram of sparse key points and dense key points of human faces and eyes in the related art;
FIG. 2 is a network architecture diagram of an image processing system architecture according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a terminal 400 provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of an implementation of an image processing method according to an embodiment of the present application;
fig. 5A is a schematic flow chart of an implementation process of training a first model according to an embodiment of the present application;
fig. 5B is a schematic implementation flow diagram of training a second model according to an embodiment of the present application;
fig. 6 is a schematic flowchart of another implementation of the image processing method according to the embodiment of the present application;
FIG. 7 is a schematic diagram of eye sparse keypoints provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of dense ocular keypoints provided by an embodiment of the present application;
FIG. 9 is a flowchart illustrating a training process for a first model according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating the effect of aligning and enhancing a training image according to an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating the effect of aligning and enhancing an eye image according to an embodiment of the present application;
FIG. 12 is a schematic flow chart illustrating an implementation of using a model to perform keypoint prediction according to an embodiment of the present application;
FIG. 13 is a schematic diagram illustrating the effect of eye makeup processing using an image processing method according to the related art;
fig. 14 is a schematic diagram illustrating the effect of eye makeup processing using the image processing method according to the embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are only used to distinguish similar objects and do not denote a particular order; it should be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
In order to better understand the image processing method provided by the embodiment of the present application, first, the image processing method and the existing disadvantages in the related art will be described.
In the embodiment of the present application, an example of performing special effect processing on an eye region in an image is described. In the related art, 106 key points of a face are directly predicted through a model. In the related art, 240 dense key points of the face can be predicted by using a model, wherein the eye key points among the 240 key points of the face are shown as 102 in fig. 1. After the eye key points are determined, special effect processing is carried out on the eye region based on the eye key points.
In the related art, all key points of the face are regressed directly by one model, which leads to a data coupling problem; the main cause of this problem is insufficient diversity in the data distribution. For example, if most of the face data shows both eyes closed, then when the single-model scheme encounters a face with only one eye closed, it still outputs key points corresponding to both eyes being closed. A single-model scheme usually requires considerable manpower to collect diverse data; although such a scheme is simple, its precision cannot meet the requirements of eye-makeup special effects.
Embodiments of the present application provide an image processing method, an image processing apparatus, a device, and a computer-readable storage medium, which can improve the precision of image processing. An exemplary application of the computer device provided in the embodiments of the present application is described below. The device provided in the embodiments of the present application can be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), and can also be implemented as a server. In the following, an exemplary application is described for the case where the device is implemented as a terminal.
Referring to fig. 2, fig. 2 is a schematic diagram of a network architecture of an image processing system 100 according to an embodiment of the present application, and as shown in fig. 2, the network architecture includes: server 200, network 300 and terminal 400. Wherein, the terminal 400 is connected to the server 200 through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 400 may be installed with various applications such as a photographing application, a video viewing application, an instant messaging application, an image processing application, and the like. The image processing method provided by the embodiment of the application can be used as an additional function in a photographing application program, a live broadcast application program or an image processing application program in a plug-in mode and is used for beautifying the face image.
When the terminal 400 receives an operation instruction for image processing and determines that the image processing is performed by using the image processing method provided by the embodiment of the application based on the operation instruction, after an image to be processed is obtained, a trained first model and a trained second model are obtained, where the trained first model and the trained second model may be obtained from the server 200, then the trained first model is used to perform prediction processing on the image to be processed to obtain a first key point set, and a target area image is determined from the image to be processed based on the first key point set; then, the trained second model is used for carrying out prediction processing on the target area image to obtain a second key point set, and the first key point set is a subset of the second key point set; and finally, carrying out special effect processing on the target area image based on the second key point set to obtain a processed image.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted smart terminal, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a terminal 400 according to an embodiment of the present application, where the terminal 400 shown in fig. 3 includes: at least one processor 410, memory 450, at least one network interface 420, and a user interface 430. The various components in the terminal 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in FIG. 3.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc., wherein the general purpose Processor may be a microprocessor or any conventional Processor, etc.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display screen, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 may include volatile memory, nonvolatile memory, or both. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in the embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 3 illustrates an image processing apparatus 455 stored in the memory 450, which may be software in the form of programs and plug-ins, and the like, and includes the following software modules: the first obtaining module 4551, the first predicting module 4552, the first determining module 4553, the second predicting module 4554, and the special effects processing module 4555, which are logical and thus may be arbitrarily combined or further divided according to the functions implemented. The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present application may be implemented in hardware; as an example, the apparatus provided in the embodiments of the present application may be a processor in the form of a hardware decoding processor, which is programmed to execute the image processing method provided in the embodiments of the present application. For example, the processor in the form of a hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In some embodiments, the terminal or the server may implement the image processing method provided by the embodiment of the present application by running a computer program. For example, the computer program may be a native program or a software module in an operating system; the Application program may be a local (Native) Application program (APP), that is, a program that needs to be installed in an operating system to run, such as a live APP, a video viewing APP, or an instant messaging APP; or may be an applet, i.e. a program that can be run only by downloading it to the browser environment; but also an applet that can be embedded into any APP. In general, the computer programs described above may be any form of application, module or plug-in.
The image processing method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the terminal provided by the embodiment of the present application.
An embodiment of the present application provides an image processing method, which is applied to a computer device, where the computer device may be a terminal or a server, and in the embodiment of the present application, the computer device is taken as an example for description. Fig. 4 is a schematic flow chart of an implementation of the image processing method provided in the embodiment of the present application, and the image processing method provided in the embodiment of the present application will be described below with reference to the steps shown in fig. 4.
Step S101, obtaining an image to be processed, a trained first model and a trained second model.
In the embodiment of the application, the image to be processed may be an image acquired in real time by an image acquisition device of the terminal, or each video frame image in a video captured by the image acquisition device; in this case, the image to be processed is a real face image.
The trained first model and the trained second model may be neural network models, deep learning network models, convolution network models, or the like. The training process of the first model and the second model may be implemented by the server, and when implemented, the terminal may obtain the trained first model and the trained second model from the server.
And S102, performing prediction processing on the image to be processed by using the trained first model to obtain a first key point set.
In the embodiment of the present application, the first model may be a lightweight neural network model, for example a MobileNet-series neural network model such as MobileNetV1, MobileNetV2 or MobileNetV3. Of course, the first model may also be another type of neural network model; the type of the first model is not limited in the embodiments of the present application.
Take MobileNetV2 as an example of the first model. When this step is implemented, the trained first model first expands the input low-dimensional compressed representation to a high-dimensional representation, then filters it with a lightweight depthwise convolution, and finally projects the features back to a low-dimensional representation with a linear convolution, thereby obtaining the first key point set.
In this embodiment of the application, the first keypoint set may be a keypoint corresponding to a certain organ in facial features, for example, a keypoint corresponding to an eye region, a keypoint corresponding to an eyebrow region, or a keypoint corresponding to an ear region. The first keypoint set comprises the first keypoints and the position information of the first keypoints.
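As an illustrative sketch only (the channel widths, block count and number of key points below are assumptions chosen for illustration, not values given in this application), the inverted-residual structure described above and a small sparse-key-point regression model could be written in PyTorch as follows:

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNetV2-style block: expand -> depthwise filtering -> linear projection."""

    def __init__(self, in_ch, out_ch, expand_ratio=6, stride=1):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 convolution expands the low-dimensional input to a high-dimensional representation
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # lightweight depthwise convolution performs the filtering
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # linear 1x1 convolution projects the features back to a low-dimensional representation
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out


class SparseKeypointNet(nn.Module):
    """Toy first model: regresses (x, y) coordinates of a small set of sparse key points."""

    def __init__(self, num_keypoints=11):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU6(inplace=True),
            InvertedResidual(16, 24, stride=2),
            InvertedResidual(24, 32, stride=2),
            InvertedResidual(32, 64, stride=2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, num_keypoints * 2),
        )

    def forward(self, x):                      # x: (N, 3, 128, 128)
        coords = self.head(self.backbone(x))   # (N, num_keypoints * 2)
        return coords.view(coords.shape[0], -1, 2)
```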
Step S103, determining a target area image from the image to be processed based on the first key point set.
The step is realized by connecting adjacent first key points based on the position information of the first key points in the first key point set, so as to determine the target area image from the image to be processed.
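For illustration, one simple way to derive a target area image from the first key point set is to take the padded bounding box of the connected points; the margin value and the bounding-box simplification below are assumptions made for this sketch rather than a rule prescribed by this application.

```python
import numpy as np


def crop_target_region(image, keypoints, margin=0.25):
    """Crop the region enclosing a set of (x, y) key points, padded by a relative margin.

    image:     H x W x C numpy array (the image to be processed)
    keypoints: N x 2 array of pixel coordinates (e.g. the sparse eye key points)
    """
    pts = np.asarray(keypoints, dtype=np.float32)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin
    h, w = image.shape[:2]
    left = max(int(x0 - pad_x), 0)
    top = max(int(y0 - pad_y), 0)
    right = min(int(x1 + pad_x) + 1, w)
    bottom = min(int(y1 + pad_y) + 1, h)
    # the (left, top) offset lets predicted points be mapped back into the full image
    return image[top:bottom, left:right], (left, top)
```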
And step S104, performing prediction processing on the target area image by using the trained second model to obtain a second key point set.
Here, the first set of keypoints is a subset of the second set of keypoints, that is, the first keypoints are sparse keypoints, and the second set of keypoints is dense keypoints. The first set of key points is used to preliminarily specify the position of a portion to be subjected to special effect processing. The second key point set is used for carrying out special effect processing on the target area image.
When the step is realized, the target area image is used as the input of a trained second model, and the trained second model is used for carrying out prediction processing on the target area image to obtain a second key point set of the target area image. In this embodiment of the present application, the second model may also be a lightweight neural network model, and the second model and the first model may be neural network models with the same structure, for example, both are a mobileNetV2 model, or may be neural network models with different structures, for example, the first model is a mobileNetV1 model, and the second model is a mobileNetV2 model, which is not limited in this embodiment of the present application.
Step S105, performing special effect processing on the target area image based on the second key point set to obtain a processed image.
When this step is implemented, special effect processing can be performed on the target area image based on the second key point set according to a preset special effect processing algorithm, so as to obtain the processed image. Take a large-eye special effect algorithm as an example. In implementation, the target area image can be triangulated using the position information of each second key point in the second key point set; the triangular patches obtained by triangulation are then locally scaled, and the updated second key point corresponding to each scaled second key point is determined. The updated target area image is determined according to the updated second key points, and the pixel value of each pixel point in the updated target area image is determined, thereby realizing the special effect processing of the target area image and obtaining the processed image.
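A rough sketch of this triangulate-and-scale idea is given below; the use of Delaunay triangulation, the outward magnification rule and the bounds handling are illustrative assumptions rather than steps prescribed by this application.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay


def magnify_region(img, src_pts, center, strength=0.15):
    """Sketch of a triangulation-based "large eye" style warp on a target area image.

    src_pts: dense key points (N x 2 pixel coordinates) inside the target area image.
    Each point is pushed away from `center`, then every Delaunay triangle is
    affine-warped from its original position to its displaced position.
    """
    src = np.asarray(src_pts, np.float32)
    center = np.asarray(center, np.float32)
    h, w = img.shape[:2]
    # updated key points: locally scale the patch outwards from the centre
    dst = center + (src - center) * (1.0 + strength)
    dst = np.clip(dst, [0, 0], [w - 1.0, h - 1.0]).astype(np.float32)

    out = img.copy()
    for tri in Delaunay(src).simplices:            # one shared triangulation for src and dst
        s, d = src[tri], dst[tri]
        if cv2.contourArea(s) < 1e-2:              # skip degenerate triangles
            continue
        sx, sy, sw, sh = cv2.boundingRect(s)
        dx, dy, dw, dh = cv2.boundingRect(d)
        if min(sw, sh, dw, dh) == 0:
            continue
        # local affine transform between the source and destination triangles
        m = cv2.getAffineTransform(np.float32(s - [sx, sy]), np.float32(d - [dx, dy]))
        patch = cv2.warpAffine(img[sy:sy + sh, sx:sx + sw], m, (dw, dh),
                               flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)
        # copy only the pixels that fall inside the destination triangle
        mask = np.zeros((dh, dw), np.uint8)
        cv2.fillConvexPoly(mask, np.int32(d - [dx, dy]), 1)
        roi = out[dy:dy + dh, dx:dx + dw]
        roi[mask == 1] = patch[mask == 1]
    return out
```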
In the image processing method provided in the embodiment of the present application, after an image to be processed is obtained, a trained first model is first used to perform prediction processing on the image to be processed to obtain a first key point set, and a target area image is determined from the image to be processed based on the first key point set. A trained second model is then used to perform prediction processing on the target area image to obtain a second key point set; the first key point set is a subset of the second key point set, that is, the first key point set contains sparse key points and the second key point set contains dense key points. Finally, special effect processing is performed based on the second key point set to obtain a processed image. Because the key points in the image to be processed are determined by two cascaded models, the data coupling problem caused by determining all key points with a single model can be avoided, the accuracy of the determined key points can be improved, and the precision of image processing is further improved.
In some embodiments, the target region image includes a first target sub-region image and a second target sub-region image that are symmetric, and the step S104 "performs prediction processing on the target region image by using the trained second model to obtain a second keypoint set" may be implemented by:
and S1041, performing prediction processing on the first target sub-region image by using the trained second model to obtain a first prediction point set.
In the embodiment of the present application, the target area image may be an image of the area where the eyes are located, the area where the nose is located, the area where the eyebrows are located, the area where the mouth is located, the area where the ears are located, or the like. When the target area image is an image of the area where the eyes, eyebrows or ears are located, the target area image includes two symmetric sub-images: a first target sub-region image and a second target sub-region image. For example, the first target sub-region image is an image of the region where the left eye is located and the second target sub-region image is an image of the region where the right eye is located; or the first target sub-region image is an image of the region where the left eyebrow is located and the second target sub-region image is an image of the region where the right eyebrow is located. In this case, the trained second model may perform prediction processing only on the first target sub-region image, so as to obtain a first prediction point set corresponding to the first target sub-region image.
In some embodiments, when the image processing method is to perform special effect processing on a target region image including two symmetric target sub-images, then when training the second model, the second model may be trained using only the first target sub-region image in the training image.
Step S1042, performing a flipping process on the second target sub-region image to obtain a processed second target sub-region image.
Since the target region image is a part of the face region image, if the target region image includes a symmetric first target sub-region image and second target sub-region image, the two sub-region images are generally in left-right symmetry; for example, the two eyes are symmetric left and right, and so are the two eyebrows. Therefore, in order to enable the trained second model to accurately predict the key points in the second target sub-region image, the second target sub-region image needs to be flipped left and right, so as to obtain the processed second target sub-region image.
And S1043, performing prediction processing on the processed second target sub-region image by using the trained second model to obtain a second prediction point set.
Step S1044, determining a union of the first predicted point set and the second predicted point set as the second keypoint set.
In the above steps S1041 to S1044, for a target area image that includes two symmetric sub-regions, the trained second model is used to perform prediction processing on the first target sub-region image to obtain the corresponding first prediction point set; the second target sub-region image is then flipped, the trained second model is used to perform prediction processing on the flipped second target sub-region image to obtain the second prediction point set, and the union of the first prediction point set and the second prediction point set is determined as the second key point set.
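A minimal sketch of this flip-predict-flip-back procedure is shown below, assuming the model is a callable that returns an N x 2 array of pixel coordinates; mapping the mirrored x-coordinates back and simply concatenating the two point sets are illustrative details, not requirements of this application.

```python
import numpy as np


def predict_symmetric_keypoints(model, left_img, right_img):
    """Sketch of steps S1041-S1044: the second model is assumed to be trained on the
    left-side sub-region only, so the right-side image is mirrored before prediction
    and the predicted coordinates are mirrored back afterwards."""
    first_set = model(left_img)                         # first prediction point set

    flipped = right_img[:, ::-1].copy()                 # horizontal flip of the right sub-region
    pred = model(flipped)                               # second prediction point set (mirrored)
    pred[:, 0] = flipped.shape[1] - 1 - pred[:, 0]      # map x-coordinates back to the unflipped image

    return np.concatenate([first_set, pred], axis=0)    # union of the two prediction point sets
```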
In some embodiments, the trained first model may be obtained through steps S001 to S005 shown in fig. 5A before step S101, and an implementation flow of training the first model is described below with reference to fig. 5A.
And S001, acquiring a preset first model and training data.
The training data comprises a plurality of training images and the labeled key point set of each training image, and the labeled key point set of a training image comprises labeled key points for the whole face area. The labeled key point set comprises a first labeled key point subset; that is, the number of key points in the first labeled key point subset is smaller than the number of key points in the labeled key point set, so the first labeled key point subset contains labeled sparse key points while the labeled key point set contains labeled dense key points.
And step S002, performing alignment and enhancement processing on the training image to obtain a processed training image.
When this step is implemented, an affine matrix is first calculated using the training image and a reference image, and the training image is aligned using the affine matrix to obtain an aligned training image; an enhancement matrix is then determined using the center coordinates, the rotation angle and the scaling factor of the aligned training image, and the aligned training image is enhanced using the enhancement matrix to obtain the processed training image.
And S003, performing prediction processing on the processed training image by using the preset first model to obtain a prediction key point corresponding to the processed training image.
Here, the processed training image is used as an input of a preset first model, and the prediction processing is performed on the processed training image by using the first model to obtain a prediction key point corresponding to the processed training image. In the embodiment of the present application, the predicted keypoints corresponding to the processed training images obtained in this step are sparse keypoints.
And step S004, training the first model based on the prediction key points of the processed training images and the first labeling key point subset to obtain a trained first model.
Here, a difference value between the predicted key points and the first labeled key point subset may be determined based on the predicted key points of each processed training image, the first labeled key point subset, and a preset loss function (for example, an L1 loss function); the first model is then trained by back propagation based on this difference value, that is, the model parameters of the first model are adjusted, so as to obtain the trained first model.
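A minimal training-loop sketch for this step is given below, assuming a PyTorch model and a data loader that yields images together with the first labeled key point subset; the optimizer, learning rate and epoch count are illustrative assumptions.

```python
import torch
import torch.nn as nn


def train_first_model(model, loader, epochs=10, lr=1e-3):
    """Regress sparse key points against the first labeled key point subset using an
    L1 loss, as one possible choice of loss function."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()
    model.train()
    for _ in range(epochs):
        for images, sparse_gt in loader:      # sparse_gt: (N, K, 2) first labeled key point subset
            pred = model(images)              # (N, K, 2) predicted key points
            loss = criterion(pred, sparse_gt) # difference value between prediction and labels
            optimizer.zero_grad()
            loss.backward()                   # back propagation adjusts the model parameters
            optimizer.step()
    return model
```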
In some embodiments, the step S002 "of performing alignment and enhancement processing on the training image to obtain a processed training image" can be implemented by the following steps:
step S0021, a reference key point set in a reference image is obtained.
The reference image is a standard face image, that is, an image in which the face is facing directly forward and is not tilted. The reference key point set in the reference image includes the reference key points of the reference image (the standard face image).
Step S0022, determining an affine matrix for performing registration processing based on the labeling key point set and the reference key point set of the training image.
When implemented, the affine matrix can be determined by a least squares fit between the labeled key point set of the training image and the reference key point set. The affine matrix is represented by 4 parameters, which can be solved by the least squares method; for example, writing the affine matrix as
M = [[a, -b, t_x], [b, a, t_y]],
the 4 parameters (a, b, t_x, t_y) are obtained by minimizing the sum over all key points of ||M · (x_i, y_i, 1)^T - (u_i, v_i)^T||^2, where (x_i, y_i) is a labeled key point of the training image and (u_i, v_i) is the corresponding reference key point.
And S0023, performing alignment processing on the training image by using the affine matrix to obtain an aligned training image.
When the affine matrix is used to align the training image, the affine transformation is applied to the training image (that is, the training image is warped according to the affine matrix), and the result is the aligned training image.
And step S0024, determining an enhancement matrix based on the center coordinate, the rotation angle and the scaling factor of the aligned training image.
In the embodiment of the present application, the rotation angle may be determined by formula (1-1):
angle = clip(rot_std × randn(), -45, 45)    (1-1);
where rot_std is a hyper-parameter which may generally be set to 10 degrees, randn() generates a random number, and [-45, 45] indicates the allowed range of rotation. The image scaling factor can be calculated from the network input size of the first model, the variable controlling the scaling factor and the maximum of the length and width of the face bounding box, for example as netInputSize / maxhw × sc, where maxhw is the maximum of the length and width of the face bounding box, netInputSize is the network input size, typically 128 × 128, and sc is the variable controlling the scaling factor. sc can be determined by formula (1-2):
sc = 1 + scale_std × randn()    (1-2);
where scale_std is a hyper-parameter which may be set to 0.05.
And step S0025, enhancing the aligned training image by using the enhancement matrix to obtain a processed training image.
Similar to the implementation of step S0023, the aligned training image is enhanced using the enhancement matrix: the enhancement transformation is applied to the aligned training image (that is, the aligned training image is warped according to the enhancement matrix), and the result is the processed training image.
Through the above steps S0021 to S0025, the training image can be aligned and enhanced, so that the face region in the processed training image is an upright, frontal and clear face image, which improves the training efficiency and the prediction accuracy of the trained model during the training process.
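A compact sketch of the alignment and enhancement steps is shown below, assuming OpenCV and NumPy. The 4-parameter least squares solve and the augmentation formulas follow the description above, but the concrete function layout, and the assumption that the reference key points are given in the network-input coordinate frame, are illustrative choices.

```python
import cv2
import numpy as np


def solve_similarity(src_pts, ref_pts):
    """Least squares fit of the 4 parameters (a, b, tx, ty) mapping src_pts to ref_pts."""
    rows = []
    for x, y in src_pts:
        rows.append([x, -y, 1, 0])   # equation for the target x-coordinate
        rows.append([y, x, 0, 1])    # equation for the target y-coordinate
    A = np.asarray(rows, np.float64)
    b = np.asarray(ref_pts, np.float64).reshape(-1)
    a, bb, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([[a, -bb, tx], [bb, a, ty]], np.float64)   # 2x3 affine matrix


def align_and_augment(image, keypoints, ref_keypoints, net_input=128,
                      rot_std=10.0, scale_std=0.05):
    """Alignment to the reference face followed by random rotation / scaling augmentation."""
    M = solve_similarity(keypoints, ref_keypoints)
    aligned = cv2.warpAffine(image, M, (net_input, net_input))

    angle = float(np.clip(rot_std * np.random.randn(), -45, 45))   # rotation angle, cf. (1-1)
    sc = 1.0 + scale_std * np.random.randn()                       # scaling variable, cf. (1-2)
    center = (net_input / 2.0, net_input / 2.0)
    E = cv2.getRotationMatrix2D(center, angle, sc)                 # enhancement matrix
    return cv2.warpAffine(aligned, E, (net_input, net_input))
```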
In some embodiments, the trained second model may be obtained through steps S011 to S015 shown in fig. 5B before step S101, and the implementation flow of training the second model is described below with reference to fig. 5B.
And S011, acquiring a preset second model and training data.
The training data includes a plurality of training images and the labeled key point set of each training image. The labeled key point set of a training image includes labeled key points for each part of the whole face region. The labeled key point set includes a first labeled key point subset, which also contains key points for each part of the face; however, since the first labeled key point subset is a subset of the labeled key point set, the first labeled key point subset contains sparse key points while the labeled key point set contains dense key points.
In step S012, a training area image is specified from each of the training images based on the labeling key point set of each of the training images.
When this is implemented, a first target key point set corresponding to the target part is determined from the labeled key point set of each training image, and adjacent key points in the first target key point set of each training image are then connected to obtain the training area image in each training image. In this case, adjacent dense key points are connected to obtain each training area image, which ensures a higher accuracy of the training area images.
For example, when the target portion is an eye, a first target keypoint set of the eye portion may be determined from the labeling keypoint set of each training image, and the first target keypoint set may include dense keypoints of the eye portion (left eye and right eye) or dense keypoints of the left eye. When the step is implemented, adjacent key points in the first target key point set can be connected, so that the training area image corresponding to the eye part in each training image is determined.
In some embodiments, a second target key point set corresponding to the target part may instead be determined from the first labeled key point subset of each training image, and adjacent key points in the second target key point set are then connected to obtain the training area image of each training image. In this case, adjacent sparse key points are connected to obtain each training area image, which reduces the complexity of determining the training area images and improves processing efficiency.
And S013, performing alignment and enhancement processing on the training region image to obtain a processed training region image.
When the step is realized, a reference image and reference key points corresponding to a target part in the reference image can be obtained, an affine matrix and an enhancement matrix are determined based on a key point set corresponding to the training area image and a reference key point set corresponding to the target part in the reference image, and then the training area image is aligned and enhanced by using the affine matrix and the enhancement matrix to obtain a processed training area image.
In some embodiments, the processed training image may be acquired based on the above steps S0021 to S0025, and then the region image corresponding to the target portion in the processed training image may be determined as the training region image.
Step S014, performing prediction processing on the processed training area image by using the preset second model to obtain a prediction key point corresponding to the processed training area image.
Here, the processed training area image is used as an input of a preset second model, and the second model is used to perform prediction processing on the processed training area image to obtain a prediction key point corresponding to the processed training area image. In the embodiment of the present application, the predicted keypoints corresponding to the processed training region image obtained in the step are dense keypoints.
Step S015, training the second model based on the prediction key points and the labeling key point set of each processed training area image to obtain a trained second model.
During implementation, a difference value between the predicted key points and the labeled key points corresponding to the target part may be determined based on the predicted key points of each processed training area image, the labeled key points corresponding to the target part in the labeled key point set, and a preset loss function (for example, an L1 loss function); the second model is then trained by back propagation based on this difference value, that is, the model parameters of the second model are adjusted, so as to obtain the trained second model.
In some embodiments, prior to training the first model and the second model, training data may be obtained by:
step S201, acquiring a plurality of artificially labeled training images and artificial labeling information of each artificially labeled training image.
In the embodiment of the present application, the artificially labeled training image includes a face image, and the artificially labeled training image includes the key points of the artificially labeled face region image, where the key points may be dense key points.
Step S202, training a preset annotation model by using the training images of the artificial annotations and the artificial annotation information of the training images of the artificial annotations to obtain a trained annotation model.
Here, the preset labeling model is a neural network model, for example a deep learning neural network model or a convolutional neural network model. The preset labeling model is used to perform prediction processing on each artificially labeled training image to obtain the predicted key points of each artificially labeled training image; a difference value between the predicted key points and the artificial labeling information is then determined using the predicted key points, the artificial labeling information of each artificially labeled training image and a preset loss function, and the parameters of the labeling model are adjusted using this difference value until the trained labeling model is obtained.
Step S203, a plurality of training images to be labeled are obtained.
The training image to be labeled comprises a face region image, but key points of the face region image are not labeled.
And step S204, carrying out prediction processing on the training images to be labeled by utilizing the trained labeling model to obtain the prediction labeling information of the training images to be labeled.
Step S205, determining whether the prediction annotation information of the training image to be annotated meets the annotation condition.
When this is implemented, after the predicted labeling information of a training image to be labeled is determined, the training image to be labeled together with its predicted labeling information can be output. Determining whether the predicted labeling information of the training image to be labeled satisfies the labeling condition may be implemented as determining whether a confirmation operation indicating that the labeling condition is satisfied is received. When the confirmation operation is received, it is determined that the predicted labeling information of the training image to be labeled satisfies the labeling condition, and the process proceeds to step S206; when the confirmation operation is not received, it is determined that the training image to be labeled does not satisfy the labeling condition, and the process proceeds to step S207.
Step S206, determining the artificially labeled training images, the artificial labeling information of the artificially labeled training images, the training images to be labeled and the prediction labeling information of the training images to be labeled as training data.
Step S207, determining the training image to be labeled as a target training image, and obtaining an update operation of the prediction labeling information for the target training image.
When the training image to be labeled does not meet the labeling condition, the prediction labeling information of the training image to be labeled contains predicted key points that need to be adjusted; at this time, those predicted key points output by the labeling model can be adjusted manually. In the embodiment of the present application, the update operation on the prediction labeling information of the target training image may be a moving operation on a predicted key point in the target training image.
Step S208, in response to the updating operation, updating the prediction labeling information of the target training image to obtain updated prediction labeling information.
Step S209, determining the artificially labeled training images, the artificial labeling information of the artificially labeled training images, the to-be-labeled training images satisfying the labeling conditions, the predictive labeling information of the to-be-labeled training images satisfying the labeling conditions, and the updated predictive labeling information of the target training images and the target training images as training data.
Through the above steps S201 to S209, when obtaining the training data for training the first model and the second model, only a small number of training images need to be labeled manually. The labeling model for automatically labeling images is trained using these manually labeled training images, and the trained labeling model is then used to automatically label the training images to be labeled; when it is determined that the automatically labeled key points need to be adjusted, manual adjustment is performed, and finally the manually labeled training images and the updated automatically labeled training images are determined as the training data.
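A plain-Python sketch of this semi-automatic labeling workflow is given below; the review_fn callback, which stands in for the manual confirmation or correction step, is an assumption made purely for illustration.

```python
def build_training_data(label_model, manual_images, manual_labels,
                        unlabeled_images, review_fn):
    """The trained labeling model pre-labels the remaining images; a human reviewer
    only corrects the predictions that fail review.

    review_fn(image, pred) is a stand-in for manual inspection: it returns the
    accepted key points, possibly after manual adjustment.
    """
    data = list(zip(manual_images, manual_labels))   # manually labeled training images
    for img in unlabeled_images:
        pred = label_model(img)                       # predicted labeling information
        data.append((img, review_fn(img, pred)))      # accepted or manually corrected labels
    return data
```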
Based on the foregoing embodiments, an embodiment of the present application further provides an image processing method, which is applied to the network architecture shown in fig. 2. Fig. 6 is a schematic diagram of another implementation flow of the image processing method provided in the embodiment of the present application; as shown in fig. 6, the flow includes:
in step S301, the terminal obtains an original image in response to the received image processing instruction.
And step S302, the terminal carries out face detection on the original image to obtain a detection result.
The detection result can represent whether the original image includes a face region image. When the detection result represents that the original image includes a face region image, the image processing method provided by the embodiment of the present application can be used for image processing, and the process proceeds to step S303; when the detection result represents that the original image does not include a face region image, the image processing method provided by the embodiment of the present application cannot be used for image processing, and the flow ends.
Step S303, determining that the detection result represents that the original image comprises a face region image, and determining the original image as an image to be processed.
And step S304, the terminal acquires the trained first model and the trained second model from the server.
The trained first model and the trained second model can be neural network models, deep learning network models, convolutional network models, and the like. The training process of the first model and the second model may be implemented by a server.
And S305, the terminal carries out prediction processing on the image to be processed by using the trained first model to obtain a first key point set.
When this step is implemented, the trained first model first expands the input low-dimensional compressed representation to a higher dimension, then filters it with a lightweight depthwise convolution, and finally projects the features back to a low-dimensional representation with a linear convolution, thereby obtaining the first key point set.
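This expand, filter, project pattern matches the inverted residual block used in lightweight mobile backbones. The model code itself is not given here, so the following is only a minimal PyTorch sketch of such a block under that assumption; the channel sizes and expansion ratio are illustrative.

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand -> depthwise filter -> linear projection, as described above."""
    def __init__(self, in_ch: int, out_ch: int, expand_ratio: int = 6, stride: int = 1):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 convolution: expand the low-dimensional representation to a higher dimension
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # lightweight depthwise 3x3 convolution: filter each channel separately
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear convolution: project the features back to a low dimension (no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out

# Example: a feature map passed through one block
feat = InvertedResidual(24, 24)(torch.randn(1, 24, 64, 64))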
In this embodiment of the application, the first key point set may consist of the key points corresponding to a certain facial organ, for example, the key points corresponding to the eye region, the eyebrow region, or the ear region. The first key point set comprises the first key points and the position information of the first key points.
Step S306, the terminal determines a target area image from the image to be processed based on the first key point set.
And step S307, the terminal carries out prediction processing on the target area image by using the trained second model to obtain a second key point set.
The first set of keypoints is a subset of the second set of keypoints.
And step S308, the terminal performs special effect processing on the target area image based on the second key point set to obtain a processed image.
It should be noted that, for technical terms and steps in the embodiments of the present application that are the same as those in other embodiments, reference may be made to the other embodiments for their implementation.
In the image processing method provided by the embodiment of the application, after receiving an operation instruction for image processing, the terminal first performs face detection on the acquired image and determines the image as the image to be processed only when the image is determined to contain a face region image. This ensures that the image to be processed contains a face region image, which guarantees the accuracy of processing the image to be processed with the method provided by the embodiment of the application. Then, the trained first model is used to perform prediction processing on the image to be processed to obtain a first key point set; a target area image is determined from the image to be processed based on the first key point set; and the trained second model is used to perform prediction processing on the target area image to obtain a second key point set. The first key point set is a subset of the second key point set; that is, the first key point set consists of sparse key points and the second key point set consists of dense key points. Finally, special effect processing is performed on the target area image based on the second key point set to obtain the processed image. When determining the key points in the image to be processed, two cascaded models are used for prediction, which can avoid the coupling problem caused by determining all key points with a single model, improve the accuracy of the determined key points, and further improve the accuracy of image processing.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
An embodiment of the present application provides an image processing method; the embodiment takes the application of the image processing method to the eye region as an example. The image processing method can be an eye makeup special effect scheme based on real-time eye key points on a mobile terminal. The image processing method provided by the embodiment of the application can be realized by the following three steps:
step S501, data definition and labeling.
Step S502, model training.
Step S503, inference logic.
The following describes the implementation of each step.
And S501, defining and labeling data.
In the embodiment of the present application, two types of eye key points, sparse and dense, are defined. The sparse eye points are defined as shown in fig. 7: the left eye and the right eye are each composed of 9 points, of which 8 points are distributed on the eye contour line and 1 point is located at the center of the pupil; the sparse points only provide a basic localization of the eyes. The eye contour curve cannot be expressed well with only 8 contour points, so the precision requirement of eye makeup fitting cannot be met based on the sparse points alone. In the embodiment of the present application, 2 new points are inserted between every two sparse points, which constitutes the definition of dense eye key points shown in fig. 8: the left eye and the right eye are each composed of 27 points, the upper-row points No. 1 to No. 11 of the left eye are located on the eyelash line, and the lower-row points No. 13 to No. 23 are located on the boundary line between the white of the eye and the flesh of the lower eyelid.
Strict semantics for the key point definition are very important: they ensure high consistency of manual labeling, reduce the difficulty of model learning, and improve model accuracy. Well-defined eye key points are the key to improving the eye makeup special effect, and usually require continuous tuning by a makeup artist and a vision algorithm engineer. The definitions of the sparse and dense points provided in the embodiment of the present application are a set of eye key points summarized from many rounds of effect acceptance, polishing, and comparison against competing products. In addition, points 24, 26, 51 and 53 are added for the pupil and preliminarily give the pupil position; these points support the subsequent cosmetic-pupil (colored contact lens) special effect.
In the embodiment of the application, the sparse key points are a subset of the dense key points, which greatly reduces the complexity of data annotation: only the dense key points need to be annotated, and the sparse key points can be obtained from them directly.
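Because the sparse points are a subset of the dense points, an annotation tool only needs to store the dense points, and the sparse points can be recovered by indexing. The exact index convention is not given here, so the following mapping (every third dense contour point is a sparse point, plus the pupil center) is a hypothetical illustration only.

# Hypothetical index mapping: 8 sparse contour points with 2 points inserted between
# each consecutive pair gives 24 dense contour points; every third dense contour
# point is then a sparse point, and the pupil center is appended.
def sparse_from_dense(dense_points, n_contour=24, step=3, pupil_index=-1):
    """dense_points: list of (x, y) for one eye, contour points first."""
    contour_sparse = dense_points[:n_contour:step]        # 8 contour points
    return contour_sparse + [dense_points[pupil_index]]   # + pupil center

dense = [(float(i), 0.0) for i in range(27)]              # dummy annotation for one eye
print(len(sparse_from_dense(dense)))                      # -> 9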
To further accelerate the labeling and save labeling cost, a heat-map-based eye key point model scheme is designed for pre-labeling in the embodiment of the application. The implementation steps are as follows:
in step S5011, approximately several thousand eye key point samples are manually annotated.
Step S5012, training a heat-map-based eye key point model, and predicting unlabeled eye pictures using the model.
Step S5013, the pre-labeled eye pictures are manually repaired and added to the training set.
Step S5014, repeat steps S5012, S5013.
This labeling scheme improves the labeling speed and also effectively improves the quality and consistency of the eye key point labels; the iteration continues until the data volume and diversity basically meet the requirements.
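The iterative pre-labeling loop of steps S5011 to S5014 can be sketched as follows; train_heatmap_model, predict_keypoints and manually_repair are placeholder stand-ins for the heat-map model training, inference, and human correction steps, none of which are specified here.

# Placeholder stand-ins for the unspecified pieces (hypothetical, for illustration only).
def train_heatmap_model(labeled):
    return {"n_train": len(labeled)}                    # pretend "model"

def predict_keypoints(model, img):
    return [(0.0, 0.0)] * 27                            # pretend prediction

def manually_repair(img, kpts):
    return (img, kpts)                                  # pretend human correction

def prelabel_loop(labeled, unlabeled, rounds=3, batch=500):
    """Steps S5011 to S5014: iteratively grow the training set with pre-labeling."""
    for _ in range(rounds):
        model = train_heatmap_model(labeled)            # S5012: train on current labels
        batch_imgs, unlabeled = unlabeled[:batch], unlabeled[batch:]
        pseudo = [(img, predict_keypoints(model, img)) for img in batch_imgs]
        labeled += [manually_repair(img, kpts) for img, kpts in pseudo]   # S5013
        if not unlabeled:                               # stop once data volume suffices
            break
    return labeled

# Tiny dry run with dummy data
data = prelabel_loop([("img0", [(0.0, 0.0)] * 27)], ["new1", "new2", "new3"], rounds=2, batch=2)
print(len(data))  # -> 4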
Step S502, model training.
In the related art, a scheme that uses a single model to regress all dense eye key points directly may suffer from a component coupling problem. To solve this problem, the embodiment of the present application adopts a cascading scheme, that is, two models are used. The first model takes a face picture as input, predicts the sparse eye points, and preliminarily locates the eye positions. The eye region image is then cropped out based on the sparse points and used as the input of the second model, which predicts the dense eye key points. The structure adopted for the first model in this embodiment is mobilenetv2_025, with the 1000-class classification layer replaced by a layer that regresses the key point coordinates; a sketch of such a regression head and its loss is given after the step list below. Fig. 9 is a flowchart of the training process of the first model provided in the embodiment of the present application, and each step is described below with reference to fig. 9.
Step S5021, a training picture is obtained.
The training picture is a picture for labeling.
Step S5022, randomly flipping the training pictures horizontally.
And step S5023, aligning the face blocks.
And step S5024, enhancing the face blocks.
Step S5025, a preset first model is obtained.
Step S5026, predicting the key points using the first model.
Step S5027, calculates the loss.
In implementation, the loss is calculated using the predicted keypoints and the labeled keypoints to train the first model.
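As referenced above, a minimal PyTorch sketch of steps S5025 to S5027 is given here. It uses torchvision's MobileNetV2 with width_mult=0.25 as an assumed stand-in for the mobilenetv2_025 structure, replaces the 1000-class classifier with a key point coordinate regression layer, and computes an L1 loss against the labeled key points; the key point count and input size are illustrative.

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

NUM_KPTS = 18  # 9 sparse eye points per eye x 2 eyes (illustrative assumption)

# Step S5025: backbone with the 1000-class classifier swapped for coordinate regression
model = mobilenet_v2(width_mult=0.25)
model.classifier = nn.Linear(model.last_channel, NUM_KPTS * 2)

# Step S5026: predict key points for a batch of enhanced 128 x 128 face blocks
faces = torch.randn(4, 3, 128, 128)
pred = model(faces).view(-1, NUM_KPTS, 2)

# Step S5027: L1 loss between the predicted and the manually labeled key points
target = torch.rand(4, NUM_KPTS, 2)  # dummy labeled coordinates
loss = nn.L1Loss()(pred, target)
loss.backward()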
Because the image received by the first model during training is the image obtained by enhancing the face block image, the alignment and enhancement operations are described below. The face key points annotated in the training picture are denoted by P, a K x 2 matrix of coordinates, where K is the number of key points; the standard face key points are denoted by Q, also a K x 2 matrix. The alignment operation computes an affine matrix M such that (in homogeneous coordinates):

M * P ≈ Q

The affine matrix is represented by 4 parameters (a, b, tx, ty):

M = [ a  -b  tx
      b   a  ty ]

The 4 parameters can be solved by the least square method, whose inputs are the key points of the face to be aligned and the key points of the standard face. The alignment operation needs to be enhanced in scale, rotation and translation during training, so the forward affine matrix M needs to be multiplied by a scale matrix S, a rotation matrix R and a translation matrix T. Applying the alignment matrix M to the face key points P gives the aligned face key points:

P' = M * P

Then the center c = (cx, cy) of the aligned key points is calculated, and the enhancement matrix A can be calculated as the standard matrix that rotates by rot and scales by s about the center c:

A = [ Rs  (I - Rs) * c ],  with  Rs = s * [  cos(rot)  sin(rot)
                                            -sin(rot)  cos(rot) ]

where (cx, cy) represents the center point of the rotation; rot represents the angle of rotation; s represents the image scaling factor, s = (netInputSize / maxhw) * sc, in which maxhw is the maximum of the length and width of the face bounding box, netInputSize is the network input size, typically 128 x 128, and sc is a variable that controls the scaling factor. The rotation angle rot can be determined by equation (1-1):

rot = clip(randn() * rot_std, -45, 45)    (1-1);

where rot_std is a hyper-parameter, which may generally be set to 10 degrees, and [-45, 45] denotes the rotation range. sc can be determined by formula (1-2):

sc = 1 + randn() * scale_std    (1-2);

where scale_std is a hyper-parameter, taken as 0.05, and randn() is the random-number function.

A x M is still an affine operation, and a translation enhancement term can be added to the translation column to obtain formula (1-3):

B = A x M + [ 0  0  Tx
              0  0  Ty ]    (1-3);

where the translation enhancement parameter sh is drawn as sh = randn(2) * shift_std, and shift_std is a hyper-parameter which can be taken as 0.039. The translation Tx in the x direction and the translation Ty in the y direction can be obtained by the following formulae (1-4) and (1-5):

Tx = sh0 * w    (1-4);

where sh0 is the first dimension of sh, i.e. its x-direction component, and w is the width of netInputSize, typically 128;

Ty = sh1 * h    (1-5);

where sh1 is the second dimension of sh, i.e. its y-direction component, and h is the height of netInputSize, typically 128.
The enhanced face block is input into the model, and an L1 loss function is calculated between the key points predicted by the model and the manually labeled key points. FIG. 10 is a schematic diagram of the effect of the module for aligning and enhancing training pictures: 1001 is the training picture, 1002 is the picture obtained after alignment, and 1003 is the picture after enhancement.
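The alignment and enhancement described by the formulas above can be implemented compactly. The following NumPy/OpenCV sketch follows the reconstruction given in this document (least-squares similarity alignment, rotation and scale enhancement about the aligned center, translation jitter); the hyper-parameter values are the ones quoted above, while the exact matrix conventions and helper names are assumptions rather than the original implementation.

import numpy as np
import cv2

def solve_similarity(src, dst):
    """Least-squares 4-parameter (a, b, tx, ty) similarity mapping src to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A += [[x, -y, 1, 0], [y, x, 0, 1]]
        b += [u, v]
    a_, b_, tx, ty = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    return np.array([[a_, -b_, tx], [b_, a_, ty]], dtype=np.float64)

def enhance_matrix(face_kpts, std_kpts, maxhw, net_input=128,
                   rot_std=10.0, scale_std=0.05, shift_std=0.039):
    face_kpts = np.asarray(face_kpts, dtype=np.float64)
    M = solve_similarity(face_kpts, std_kpts)                    # alignment matrix (2 x 3)
    aligned = face_kpts @ M[:, :2].T + M[:, 2]                   # aligned key points
    cx, cy = aligned.mean(axis=0)                                # rotation center
    rot = float(np.clip(np.random.randn() * rot_std, -45, 45))   # eq. (1-1)
    sc = 1.0 + np.random.randn() * scale_std                     # eq. (1-2)
    s = net_input / maxhw * sc                                   # image scaling factor
    A = cv2.getRotationMatrix2D((float(cx), float(cy)), rot, s)  # rotate/scale about center
    B = (np.vstack([A, [0, 0, 1]]) @ np.vstack([M, [0, 0, 1]]))[:2]  # A x M
    sh = np.random.randn(2) * shift_std                          # translation enhancement
    B[0, 2] += sh[0] * net_input                                 # Tx, eq. (1-4)
    B[1, 2] += sh[1] * net_input                                 # Ty, eq. (1-5)
    return B

# Usage: warp the training picture into an enhanced net-input-sized face block, e.g.
# face_block = cv2.warpAffine(img, enhance_matrix(kpts, std_kpts, maxhw), (128, 128))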
The training of the second model is similar to that of the first model; the only difference is that there are 2 eyes. If the model is allowed to occupy several hundred KB more, separate left-eye and right-eye models can be used; however, the mobile terminal is usually very sensitive to several hundred KB of model size, so only a left-eye model is used in the embodiment of the application, and the key points of the right eye are obtained by flipping the right-eye image into a left-eye image for prediction (a sketch of this flip trick is given after this paragraph). The training of the second model also uses random flipping, alignment to the average left eye, enhancement, and the like, which are similar to the training process of the first model. Fig. 11 is a schematic diagram of the effect of the module for aligning and enhancing the eye images: 1101 is the training picture, 1102 is the picture obtained after alignment, and 1103 is the picture after enhancement.
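A minimal sketch of the right-eye flip trick mentioned above is given below; the left_eye_model callable and the pixel-coordinate output format are assumptions.

import numpy as np
import cv2

def predict_right_eye(left_eye_model, right_eye_crop):
    """Predict right-eye key points by mirroring into the left-eye domain."""
    h, w = right_eye_crop.shape[:2]
    mirrored = cv2.flip(right_eye_crop, 1)           # horizontal flip: looks like a left eye
    kpts = np.asarray(left_eye_model(mirrored), dtype=np.float64)  # (N, 2) pixel coords
    kpts[:, 0] = (w - 1) - kpts[:, 0]                # flip x back into the right-eye crop
    # Depending on the point definition, the semantic index order may also need remapping.
    return kpts

# Dummy usage with a stand-in "model" that returns 27 points
fake_model = lambda img: [[10.0, 20.0]] * 27
pts = predict_right_eye(fake_model, np.zeros((64, 96, 3), np.uint8))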
Fig. 12 is a schematic view of the flow for implementing key point prediction by using the models according to the embodiment of the present application. As shown in fig. 12, the flow includes:
step S5031, a picture to be predicted is obtained.
Step S5032, performing face detection on the picture to be predicted.
If the detection result obtained by the face detection indicates that the picture to be predicted comprises the face region, the step S5033 is executed; and if the detection result represents that the picture to be predicted does not comprise the face area, ending the process.
Step S5033, the first-level sparse point model is used for sparse key point prediction.
Step S5034, matting is carried out based on the determined sparse key points.
And step S5035, performing dense key point prediction by using the secondary dense point model.
In the inference logic, the sparse point model and the dense point model are cascaded: the first-stage model predicts the sparse eye points, the bounding box of the eyes is preliminarily located based on the sparse eye points and used for matting, and the eye region image obtained by the matting is then input into the second-stage model to obtain the more accurate and denser second-stage key points.
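A compact sketch of this two-stage inference logic (steps S5031 to S5035) is given below; the face detector, the two models and the crop margin are represented by placeholder callables and an illustrative parameter, since their interfaces are not specified here.

import numpy as np

def cascade_predict(img, detect_face, sparse_model, dense_model, crop_margin=0.25):
    """Two-stage inference: sparse eye points -> eye crop -> dense eye points."""
    if not detect_face(img):                          # S5032: no face, nothing to do
        return None
    sparse = np.asarray(sparse_model(img), dtype=np.float64)   # S5033: sparse points (N, 2)
    x0, y0 = sparse.min(axis=0)
    x1, y1 = sparse.max(axis=0)
    m = crop_margin * max(x1 - x0, y1 - y0)           # S5034: expand the eye box slightly
    x0, y0 = int(max(x0 - m, 0)), int(max(y0 - m, 0))
    x1, y1 = int(x1 + m), int(y1 + m)
    eye_crop = img[y0:y1, x0:x1]
    dense = np.asarray(dense_model(eye_crop))         # S5035: dense points in crop coords
    return dense + np.array([x0, y0])                 # map back to full-image coordinates

# Dummy usage with stand-in callables
img = np.zeros((256, 256, 3), np.uint8)
out = cascade_predict(img, lambda im: True,
                      lambda im: [[100, 120], [140, 118], [120, 125]],
                      lambda crop: [[5.0, 6.0]] * 27)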
Fig. 13 is a schematic diagram illustrating the effect of eye makeup processing by using the image processing method in the related art, where 1301 is a processing effect diagram in which both eyes are open, and 1302 is a processing effect diagram in which both eyes are closed, and it can be seen by comparing 1301 and 1302 that the image processing method in the related art has a good processing effect when both eyes are open, but has a poor processing effect when both eyes are closed.
Fig. 14 is a schematic view illustrating the effect of the image processing method according to the embodiment of the present application when performing eye makeup processing, where 1401 is a processing effect diagram in which both eyes are open, and 1402 is a processing effect diagram in which both eyes are closed, and it can be seen by comparing 1401 and 1402 that the image processing method according to the embodiment of the present application has a good processing effect when both eyes are open and both eyes are closed.
It is understood that, in the embodiments of the present application, the content related to the user information, for example, the data related to the image to be processed, etc., needs to be approved or agreed by the user when the embodiments of the present application are applied to specific products or technologies, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related countries and regions.
Continuing with the exemplary structure of the image processing apparatus 455 provided by the embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 3, the software modules stored in the image processing apparatus 455 of the memory 450 may include:
a first obtaining module 4551, configured to obtain an image to be processed, a trained first model, and a trained second model;
a first prediction module 4552, configured to perform prediction processing on the image to be processed by using the trained first model to obtain a first key point set;
a first determining module 4553, configured to determine a target area image from the image to be processed based on the first keypoint set;
a second prediction module 4554, configured to perform prediction processing on the target area image by using the trained second model to obtain a second keypoint set, where the first keypoint set is a subset of the second keypoint set;
and a special effect processing module 4555, configured to perform special effect processing on the target area image based on the second keypoint set, so as to obtain a processed image.
In some embodiments, the target region image comprises a first target sub-region image and a second target sub-region image that are symmetric, and the second prediction module is further configured to:
predicting the first target sub-region image by using the trained second model to obtain a first predicted point set;
flipping the second target sub-region image to obtain a processed second target sub-region image;
predicting the processed second target sub-region image by using the trained second model to obtain a second prediction point set;
and determining the union of the first prediction point set and the second prediction point set as the second key point set.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring a preset first model and training data, wherein the training data comprises a plurality of training images and a labeling key point set in the training images, and the labeling key point set comprises a first labeling key point subset;
the first processing module is used for carrying out alignment and enhancement processing on the training image to obtain a processed training image;
the third prediction module is used for performing prediction processing on the processed training image by using the preset first model to obtain a prediction key point corresponding to the processed training image;
and the first training module is used for training the first model based on the prediction key points of the processed training images and the first labeling key point subset to obtain a trained first model.
In some embodiments, the first processing module is further configured to:
acquiring a reference key point set in a reference image;
determining an affine matrix for performing registration processing based on the labeling key point set and the reference key point set of the training image;
performing alignment processing on the training image by using the affine matrix to obtain an aligned training image;
determining an enhancement matrix based on the center coordinates, the rotation angle and the scaling factor of the aligned training images;
and performing enhancement processing on the aligned training image by using the enhancement matrix to obtain a processed training image.
In some embodiments, the apparatus further comprises:
the third acquisition module is used for acquiring a preset second model and training data, wherein the training data comprises a plurality of training images and a labeling key point set in the training images;
the second determining module is used for determining a training area image from each training image based on the labeling key point set of each training image;
the second processing module is used for carrying out alignment and enhancement processing on the training area image to obtain a processed training area image;
the fourth prediction module is used for performing prediction processing on the processed training area image by using the preset second model to obtain a prediction key point corresponding to the processed training area image;
and the second training module is used for training the second model based on the prediction key points and the labeling key point set of each processed training area image to obtain a trained second model.
In some embodiments, the apparatus further comprises:
the fourth acquisition module is used for acquiring a plurality of artificially labeled training images and the artificial labeling information of each artificially labeled training image;
the third training module is used for training a preset annotation model by using the training images of the artificial annotations and the artificial annotation information of the training images of the artificial annotations to obtain a trained annotation model;
the fifth acquisition module is used for acquiring a plurality of training images to be labeled;
the fifth prediction module is used for performing prediction processing on the training images to be labeled by using the trained labeling model to obtain the prediction labeling information of the training images to be labeled;
and the third determining module is used for determining the training images to be labeled, the artificial labeling information of the training images to be labeled, the training images to be labeled and the prediction labeling information of the training images to be labeled as training data when the prediction labeling information of the training images to be labeled meets the labeling condition.
In some embodiments, the apparatus further comprises:
the fourth determining module is used for determining the training image to be labeled as a target training image and acquiring the updating operation of the prediction labeling information aiming at the target training image when the training image to be labeled does not meet the labeling condition;
the updating module is used for responding to the updating operation and updating the prediction marking information of the target training image to obtain updated prediction marking information;
and the fourth determining module is used for determining the training images marked artificially, the artificial marking information of the training images marked artificially, the training images to be marked meeting marking conditions, the prediction marking information of the training images to be marked meeting marking conditions, the target training images and the updated prediction marking information of the target training images as training data.
In some embodiments, the apparatus further comprises:
the sixth acquisition module is used for acquiring an original image and carrying out face detection on the original image to obtain a detection result;
and the fifth determining module is used for determining that the detection result represents that the original image comprises a face region image and determining the original image as an image to be processed.
It should be noted that the above description of the image processing apparatus embodiments is similar to the description of the method embodiments, and the apparatus embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments, reference is made to the description of the method embodiments of the present application.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform an image processing method provided by embodiments of the present application, for example, an image processing method as illustrated in fig. 4 and 6.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, the executable instructions may be in the form of a program, software module, script, or code written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (11)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed, a trained first model and a trained second model;
predicting the image to be processed by using the trained first model to obtain a first key point set, wherein the first key point set is a key point corresponding to a part to be subjected to special effect processing in facial features;
determining a target area image from the image to be processed based on the first key point set;
predicting the target area image by using the trained second model to obtain a second key point set, wherein the first key point set is a subset of the second key point set;
and performing special effect processing on the target area image based on the second key point set to obtain a processed image.
2. The method according to claim 1, wherein the target region image includes a first target sub-region image and a second target sub-region image that are symmetric, and the performing a prediction process on the target region image by using the trained second model to obtain a second keypoint set includes:
predicting the first target sub-region image by using the trained second model to obtain a first predicted point set;
flipping the second target sub-region image to obtain a processed second target sub-region image;
predicting the processed second target sub-region image by using the trained second model to obtain a second prediction point set;
and determining the union of the first prediction point set and the second prediction point set as the second key point set.
3. The method of claim 1, further comprising:
acquiring a preset first model and training data, wherein the training data comprises a plurality of training images and a labeling key point set in the training images, and the labeling key point set comprises a first labeling key point subset;
carrying out alignment and enhancement processing on the training image to obtain a processed training image;
predicting the processed training image by using the preset first model to obtain a prediction key point corresponding to the processed training image;
and training the first model based on the prediction key points of the processed training images and the first labeling key point subset to obtain a trained first model.
4. The method of claim 3, wherein the performing the registration and enhancement processing on the training image to obtain a processed training image comprises:
acquiring a reference key point set in a reference image;
determining an affine matrix for performing registration processing based on the labeling key point set and the reference key point set of the training image;
carrying out alignment processing on the training image by utilizing the affine matrix to obtain an aligned training image;
determining an enhancement matrix based on the center coordinates, the rotation angle and the scaling factor of the aligned training image;
and performing enhancement processing on the aligned training image by using the enhancement matrix to obtain a processed training image.
5. The method of claim 1, further comprising:
acquiring a preset second model and training data, wherein the training data comprises a plurality of training images and a labeling key point set in the training images;
determining a training area image from each training image based on the labeling key point set of each training image;
carrying out alignment and enhancement processing on the training area image to obtain a processed training area image;
predicting the processed training area image by using the preset second model to obtain a prediction key point corresponding to the processed training area image;
and training the second model based on the prediction key points and the labeling key point set of each processed training area image to obtain a trained second model.
6. The method of claim 3 or 5, further comprising:
acquiring a plurality of artificially labeled training images and artificial labeling information of each artificially labeled training image;
training a preset annotation model by using the training images marked manually and the manual annotation information of the training images marked manually to obtain a trained annotation model;
acquiring a plurality of training images to be labeled;
predicting the training images to be labeled by using the trained labeling model to obtain the predicted labeling information of the training images to be labeled;
and when the prediction marking information of the training images to be marked meets the marking condition, determining the training images to be marked, the artificial marking information of the training images to be marked, the training images to be marked and the prediction marking information of the training images to be marked as training data.
7. The method of claim 6, further comprising:
when the training image to be labeled does not meet the labeling condition, determining the training image to be labeled as a target training image, and acquiring the updating operation of the prediction labeling information aiming at the target training image;
responding to the updating operation, updating the prediction marking information of the target training image to obtain updated prediction marking information;
and determining the training images marked artificially, the artificial marking information of the training images marked artificially, the training images to be marked meeting marking conditions, the prediction marking information of the training images to be marked meeting marking conditions, and the updated prediction marking information of the target training images and the target training images as training data.
8. The method according to any one of claims 1 to 4, further comprising:
acquiring an original image, and carrying out face detection on the original image to obtain a detection result;
and determining that the detection result represents that the original image comprises a face region image, and determining the original image as an image to be processed.
9. An image processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring an image to be processed, a trained first model and a trained second model;
the first prediction module is used for performing prediction processing on the image to be processed by utilizing the trained first model to obtain a first key point set, wherein the first key point set is a key point corresponding to a part to be subjected to special effect processing in facial features;
a first determining module, configured to determine a target area image from the image to be processed based on the first keypoint set;
the second prediction module is used for performing prediction processing on the target area image by using the trained second model to obtain a second key point set, and the first key point set is a subset of the second key point set;
and the special effect processing module is used for carrying out special effect processing on the target area image based on the second key point set to obtain a processed image.
10. A computer device, characterized in that the computer device comprises:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 8 when executing executable instructions stored in the memory.
11. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the method of any one of claims 1 to 8.
CN202210781883.9A 2022-07-05 2022-07-05 Image processing method, device, equipment and computer readable storage medium Active CN114881893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210781883.9A CN114881893B (en) 2022-07-05 2022-07-05 Image processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210781883.9A CN114881893B (en) 2022-07-05 2022-07-05 Image processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114881893A CN114881893A (en) 2022-08-09
CN114881893B true CN114881893B (en) 2022-10-21

Family

ID=82683012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210781883.9A Active CN114881893B (en) 2022-07-05 2022-07-05 Image processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114881893B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830196B (en) * 2022-12-09 2024-04-05 支付宝(杭州)信息技术有限公司 Virtual image processing method and device
CN115731375B (en) * 2022-12-09 2024-05-10 支付宝(杭州)信息技术有限公司 Method and device for updating virtual image

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210158023A1 (en) * 2018-05-04 2021-05-27 Northeastern University System and Method for Generating Image Landmarks
CN108810413B (en) * 2018-06-15 2020-12-01 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium
US11222196B2 (en) * 2018-07-11 2022-01-11 Samsung Electronics Co., Ltd. Simultaneous recognition of facial attributes and identity in organizing photo albums
CN109359575B (en) * 2018-09-30 2022-05-10 腾讯科技(深圳)有限公司 Face detection method, service processing method, device, terminal and medium
CN109919048A (en) * 2019-02-21 2019-06-21 北京以萨技术股份有限公司 A method of face critical point detection is realized based on cascade MobileNet-V2
CN110020633B (en) * 2019-04-12 2022-11-04 腾讯科技(深圳)有限公司 Training method of posture recognition model, image recognition method and device
US20220156554A1 (en) * 2019-06-04 2022-05-19 Northeastern University Lightweight Decompositional Convolution Neural Network
CN110517214B (en) * 2019-08-28 2022-04-12 北京百度网讯科技有限公司 Method and apparatus for generating image
CN110956082B (en) * 2019-10-17 2023-03-24 江苏科技大学 Face key point detection method and detection system based on deep learning
CN111325851B (en) * 2020-02-28 2023-05-05 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and computer readable storage medium
AU2021313620A1 (en) * 2020-07-21 2023-03-09 Royal Bank Of Canada Facial recognition tokenization
CN112069992A (en) * 2020-09-04 2020-12-11 西安西图之光智能科技有限公司 Face detection method, system and storage medium based on multi-supervision dense alignment

Also Published As

Publication number Publication date
CN114881893A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN114881893B (en) Image processing method, device, equipment and computer readable storage medium
WO2021047396A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
CN110148102B (en) Image synthesis method, advertisement material synthesis method and device
EP4198814A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
CN111739035B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN111445486B (en) Image processing method, device, equipment and computer readable storage medium
WO2021213067A1 (en) Object display method and apparatus, device and storage medium
CN109472360A (en) Update method, updating device and the electronic equipment of neural network
WO2022143179A1 (en) Virtual character model creation method and apparatus, electronic device, and storage medium
CN112419170A (en) Method for training occlusion detection model and method for beautifying face image
CN110458924B (en) Three-dimensional face model establishing method and device and electronic equipment
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
CN115050064A (en) Face living body detection method, device, equipment and medium
RU2671990C1 (en) Method of displaying three-dimensional face of the object and device for it
CN112162672A (en) Information flow display processing method and device, electronic equipment and storage medium
CN111507259B (en) Face feature extraction method and device and electronic equipment
CN113223137B (en) Generation method and device of perspective projection human face point cloud image and electronic equipment
CN112488054A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN116258800A (en) Expression driving method, device, equipment and medium
CN116546304A (en) Parameter configuration method, device, equipment, storage medium and product
CN114040129A (en) Video generation method, device, equipment and storage medium
CN115731326A (en) Virtual role generation method and device, computer readable medium and electronic device
CN115994944A (en) Three-dimensional key point prediction method, training method and related equipment
US11954779B2 (en) Animation generation method for tracking facial expression and neural network training method thereof
JPWO2019224947A1 (en) Learning device, image generator, learning method, image generation method and program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40072642

Country of ref document: HK