CN111274855A - Image processing method and device, and machine learning model training method and device - Google Patents

Image processing method and device, and machine learning model training method and device

Info

Publication number
CN111274855A
CN111274855A (application CN201811480882.0A)
Authority
CN
China
Prior art keywords
image
glasses
processed
sample
images
Prior art date
Legal status
Granted
Application number
CN201811480882.0A
Other languages
Chinese (zh)
Other versions
CN111274855B (en)
Inventor
张雪
王冲
杜瑶
张彦刚
Current Assignee
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Orion Star Technology Co Ltd filed Critical Beijing Orion Star Technology Co Ltd
Priority to CN201811480882.0A
Publication of CN111274855A
Application granted
Publication of CN111274855B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion: removing elements interfering with the pattern to be recognised
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Abstract

The present specification provides an image processing method and apparatus and a machine learning model training method and apparatus. The image processing method includes: acquiring an image to be processed, wherein the subject in the image to be processed wears glasses; processing the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject have been removed in the processed image; and performing face recognition on the subject by using the processed image.

Description

Image processing method and device, and machine learning model training method and device
Technical Field
The present disclosure relates to the field of face recognition technologies, and in particular, to an image processing method and apparatus, and a machine learning model training method and apparatus.
Background
Face recognition is usually hampered when the face wears glasses, which reduces recognition accuracy. The approaches commonly applied on the market remove glasses based on Principal Component Analysis (PCA) or on Deep Convolutional Neural Networks (DCNN), but both methods are affected by many factors and essentially operate on infrared images, so they do not reach a practically usable effect.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide an image processing method, an image processing apparatus, a machine learning model training method, a machine learning model training apparatus, a computing device, and a storage medium, so as to solve technical defects in the prior art.
According to a first aspect of embodiments herein, there is provided an image processing method including:
acquiring an image to be processed, wherein the subject in the image to be processed wears glasses;
processing the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject have been removed in the processed image;
and performing face recognition on the subject by using the processed image.
Optionally, the acquiring the image to be processed includes:
acquiring an original image to be processed;
screening out, from the original image, an image in which the subject wears glasses by using a first classifier;
and determining the image in which the subject wears glasses as the image to be processed.
According to a second aspect of embodiments herein, there is provided a machine learning model training method, including:
marking the glasses-wearing state of the subject in original sample data by using a trained first classifier;
acquiring a first data set Sn and a second data set Sp from the marked images, wherein the subjects in the images in Sn do not wear glasses and the subjects in the images in Sp wear glasses;
training a conversion network implemented by a machine learning model based on the first and second data sets Sn and Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
Optionally, the first classifier is trained by:
acquiring sample images and sample labels corresponding to the sample images, wherein each sample label identifies whether the subject in the sample image wears glasses, and the sample images include images of subjects not wearing glasses and images of subjects wearing glasses;
training a first classifier based on the sample images and the sample labels, the first classifier associating an image of a subject with its glasses-wearing state.
Optionally, before the acquiring the first data set Sn and the second data set Sp from the marked image, the method further includes:
marking, by using a second classifier, whether the marked images contain illumination, and selecting balanced numbers of images without illumination information and images with illumination information as the basis for random sampling;
and/or
marking the gender of the face in the marked images by using a third classifier, and selecting balanced numbers of face images of different genders as the basis for random sampling.
Optionally, the method further comprises:
selecting at least one sample Is from the first data set Sn and the second data set Sp;
converting each sample Is through the conversion network to generate a converted sample Ig, performing feature extraction on Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculating a distance corresponding to Is from the first and second feature expressions;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
According to a third aspect of embodiments herein, there is provided an image processing apparatus comprising:
an acquisition module: configured to acquire an image to be processed, wherein the subject in the image to be processed wears glasses;
a processing module: configured to process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject have been removed in the processed image;
an identification module: configured to perform face recognition on the subject by using the processed image.
Optionally, the acquisition module is further configured to: acquire an original image to be processed;
screen out, from the original image, an image in which the subject wears glasses by using a first classifier, and determine the image in which the subject wears glasses as the image to be processed.
According to a fourth aspect of embodiments herein, there is provided a machine learning model training apparatus including:
a marking module: configured to mark the glasses-wearing state of the subject in original sample data by using a trained first classifier;
a first acquisition module: configured to acquire a first data set Sn and a second data set Sp from the marked images, wherein the subjects in the images in Sn do not wear glasses and the subjects in the images in Sp wear glasses;
a first training module: configured to train a conversion network implemented by a machine learning model based on the first and second data sets Sn and Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
Optionally, the apparatus further comprises:
a second training module: configured to acquire sample images and sample labels corresponding to the sample images, wherein each sample label identifies whether the subject in the sample image wears glasses, and the sample images include images of subjects not wearing glasses and images of subjects wearing glasses; and to train a first classifier based on the sample images and the sample labels, the first classifier associating an image of a subject with its glasses-wearing state.
Optionally, the marking module is further configured to: mark, by using a second classifier, whether the marked images contain illumination, and select balanced numbers of images without illumination information and images with illumination information as the basis for random sampling; and/or mark the gender of the face in the marked images by using a third classifier, and select balanced numbers of face images of different genders as the basis for random sampling.
Optionally, the apparatus further comprises:
an optimization module configured to: select at least one sample Is from the first data set Sn and the second data set Sp; convert each sample Is through the conversion network to generate a converted sample Ig, perform feature extraction on Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculate a distance corresponding to Is from the first and second feature expressions; determine a first penalty function based on the distance corresponding to the at least one sample Is; and combine the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
According to a fifth aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the first or second aspect when executing the instructions.
According to a sixth aspect of embodiments herein, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the first or second aspect.
In the embodiments of the present specification, an image to be processed in which the subject wears glasses is acquired, the image is processed through a pre-trained conversion network to obtain a processed image in which the glasses worn by the subject have been removed, and face recognition is performed on the subject by using the processed image. Because the glasses worn by the subject in the collected images are removed by the trained conversion network and recognition is performed on the glasses-free images, the interference of glasses with face recognition is reduced and recognition accuracy is improved.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flowchart of an image processing method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a Cycle-GAN network application of the image processing method according to the embodiment of the present application;
FIG. 4 is a flowchart of a machine learning model training method of an image processing method provided by an embodiment of the present application;
FIG. 5 is a flowchart of training and optimizing a machine learning model of an image processing method provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a machine learning model training apparatus according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. The description may, however, be embodied in many forms other than those set forth herein, and those skilled in the art can make similar extensions without departing from its meaning; the description is therefore not limited to the specific embodiments disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second and, similarly, a second may also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may, depending on the context, be interpreted as "when", "upon", or "in response to determining".
In this specification, an image processing method and apparatus, a machine learning model training method and apparatus, a computing device, and a storage medium are provided, and each is described in detail in the following embodiments.
Fig. 1 is a block diagram illustrating a configuration of a computing device 100 according to an embodiment of the present specification. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes access device 140, which enables computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In an embodiment of the present description, the above components of the computing device 100, as well as other components not shown in fig. 1, may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in fig. 1 is for purposes of example only and does not limit the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may perform the steps of the image processing method shown in fig. 2. Fig. 2 is a flowchart illustrating an image processing method according to an embodiment of the present specification, including steps 202 to 206.
Step 202: acquiring an image to be processed, wherein the subject in the image to be processed wears glasses.
In an embodiment of this specification, the acquiring an image to be processed includes:
acquiring an original image to be processed;
screening out, from the original image, an image in which the subject wears glasses by using a first classifier;
and determining the image in which the subject wears glasses as the image to be processed.
In practical applications, if 30 persons need to undergo face recognition, face images of the 30 persons are collected first; some of the subjects may wear glasses and some may not. If a recognized subject wears glasses, recognition accuracy suffers. To improve the accuracy of face recognition, in this embodiment the first classifier screens the collected face images of the 30 persons for images in which the subject wears glasses; if, for example, 10 such images are screened out, those 10 images are the images to be processed.
Step 204: and processing the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the shot person in the processed image are removed.
In an embodiment of the present specification, the conversion network may be a Cycle-GAN network composed of a generative model and a discriminant model. The Cycle-GAN network can realize the conversion of image styles, and the Cycle-GAN network enables the images in the source domain and the images in the target domain to have meaningful association relation.
Referring to fig. 3, a schematic diagram of an application of a Cycle-GAN network is shown, in which a source domain horse image generates a zebra image of a target domain through a generation model in the Cycle-GAN network.
Based on the conversion characteristics of the Cycle-GAN network, the Cycle-GAN network is trained by using images of the shot person wearing glasses and images of the shot person not wearing glasses, so that the images of the shot person wearing glasses are associated with the images of the shot person not wearing glasses. After the Cycle-GAN network training is finished, the images of the shot person wearing the glasses are input through the generation model, and the images of the shot person not wearing the glasses can be output.
In practical application, the 10 images of the subject wearing glasses are processed by a generation model in a conversion network trained in advance, and 10 corresponding images of the subject not wearing glasses are generated.
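As a concrete illustration of this inference step, the following is a minimal PyTorch sketch assuming the trained generative model is available as a torch.nn.Module; the checkpoint name, tensor layout and value range are illustrative assumptions and are not fixed by this specification.

```python
import torch

def remove_glasses(generator: torch.nn.Module,
                   batch: torch.Tensor) -> torch.Tensor:
    """Map a batch of 'wearing glasses' face images (N, 3, H, W, assumed
    scaled to [-1, 1]) through the trained Cycle-GAN generative model and
    return the corresponding 'no glasses' images."""
    generator.eval()
    with torch.no_grad():
        return generator(batch)

# Hypothetical usage; the checkpoint file name is an assumption.
# generator = torch.load("generator_glasses_to_none.pt")
# clean_faces = remove_glasses(generator, face_batch)
```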
Step 206: and performing face recognition on the shot person by using the processed image.
In the processed image, the glasses worn by the subject are removed, and face recognition is performed based on the removed glasses. In practical application, various face recognition methods can be adopted for face recognition, for example, feature extraction is performed on the processed image to obtain a feature expression of the processed image, the obtained feature expression is compared with a feature expression of a reference face image in a registry, and whether the photographed person is a registered user or not is determined according to a comparison result. If the user is registered, the related information of the shot person can be further acquired.
When feature extraction is performed on the processed image, the following steps may be performed: and inputting the processed image into a pre-trained face recognition model, and acquiring the feature expression output by a hidden layer of the face recognition model as the feature expression of the processed image.
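As a hedged sketch of this comparison step, the code below normalizes the hidden-layer feature expression and matches it against a registry by cosine similarity; the model interface, the registry layout and the 0.5 threshold are assumptions made for the example, not requirements of this specification.

```python
import torch
import torch.nn.functional as F

def feature_expression(face_model: torch.nn.Module,
                       image: torch.Tensor) -> torch.Tensor:
    """Feature expression of a processed image; the face recognition model
    is assumed to return its hidden-layer output directly."""
    face_model.eval()
    with torch.no_grad():
        feat = face_model(image.unsqueeze(0))  # (1, D)
    return F.normalize(feat, dim=1)

def recognize(query: torch.Tensor,
              registry: dict[str, torch.Tensor],
              threshold: float = 0.5) -> str | None:
    """Return the registered user whose reference expression is most similar
    to the query, or None if no similarity exceeds the threshold (the
    subject is then treated as unregistered)."""
    best_user, best_sim = None, threshold
    for user, reference in registry.items():
        similarity = F.cosine_similarity(query, reference).item()
        if similarity > best_sim:
            best_user, best_sim = user, similarity
    return best_user
```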
The present application does not limit the face recognition method.
In the embodiments of the present specification, an image to be processed in which the subject wears glasses is acquired, and the image is processed by the generative model of a pre-trained conversion network composed of a generative model and a discriminative model, yielding a processed image in which the glasses worn by the subject have been removed. Removing the glasses with the trained conversion network and performing face recognition on the glasses-free images helps improve the accuracy of face recognition in the subsequent recognition process.
Referring to fig. 4, a flowchart of a machine learning model training process according to an embodiment of the present specification includes steps 402 to 406.
Step 402: marking the glasses-wearing state of the subject in the original sample data by using the trained first classifier.
In an embodiment of the present specification, the first classifier is trained by the following steps:
acquiring sample images and sample labels corresponding to the sample images, wherein each sample label identifies whether the subject in the sample image wears glasses, and the sample images include images of subjects not wearing glasses and images of subjects wearing glasses;
training a first classifier based on the sample images and the sample labels, the first classifier associating an image of a subject with its glasses-wearing state.
In practical applications, the first classifier adds a glasses-wearing-state label to each image in the original sample data: images in which the subject wears glasses are labeled as wearing glasses, and images in which the subject does not are labeled as not wearing glasses, so that the original sample data are classified by whether the subject wears glasses.
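A minimal sketch of this labelling pass, assuming the first classifier is a two-class network whose output index 1 means "wearing glasses" (the class order and label strings are assumptions):

```python
import torch

def mark_glasses_state(classifier: torch.nn.Module,
                       images: torch.Tensor) -> list[str]:
    """Add a glasses-wearing-state label to each image in a batch of
    original sample data with shape (N, 3, H, W)."""
    classifier.eval()
    with torch.no_grad():
        predictions = classifier(images).argmax(dim=1)  # (N,)
    return ["wearing_glasses" if p == 1 else "not_wearing_glasses"
            for p in predictions.tolist()]
```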
Step 404: and acquiring a first data set Sn and a second data set Sp from the marked images, wherein the images in Sn are that the shot person does not wear glasses, and the images in Sp are that the shot person wears glasses.
In an embodiment of the present specification, before acquiring the first data set Sn and the second data set Sp from the marked image, the method further includes:
marking, by using a second classifier, whether the marked images contain illumination, and selecting balanced numbers of images without illumination information and images with illumination information as the basis for random sampling;
and/or
marking the gender of the face in the marked images by using a third classifier, and selecting balanced numbers of face images of different genders as the basis for random sampling.
In practical applications, the second classifier marks whether a marked image contains illumination and the third classifier marks the gender of the face in it; a fourth classifier may further mark the skin color of the face, and the marked images may be labeled by still more classifiers, which is not limited here.
Marking the marked images further with the second and third classifiers keeps the objective conditions of the images obtained in sampling close to identical.
Optionally, the marked images are randomly sampled to obtain the first data set Sn and the second data set Sp. The numbers of images in Sn and Sp may be equal or different; this embodiment does not limit how many images the two data sets contain. With random sampling, every image has the same chance of being drawn, which removes subjective bias from the sampling process and makes the sampling result more reliable. The sampling manner itself is not limited either; sequential sampling, for example, may also be adopted. A balanced draw of this kind is sketched below.
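The balanced draw could look like the following sketch; the group names and the dictionary-of-paths layout are illustrative assumptions.

```python
import random

def balanced_random_sample(groups: dict[str, list[str]],
                           per_group: int,
                           seed: int = 0) -> list[str]:
    """Randomly draw the same number of images from each attribute group
    (e.g. {'with_illumination': [...], 'without_illumination': [...]} from
    the second classifier, or {'male': [...], 'female': [...]} from the
    third), so the pool from which Sn and Sp are built is balanced."""
    rng = random.Random(seed)
    pool = []
    for paths in groups.values():
        pool.extend(rng.sample(paths, per_group))
    rng.shuffle(pool)  # remove any ordering bias between groups
    return pool
```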
Step 406: training a conversion network implemented by a machine learning model based on the first and second data sets Sn, Sp, the conversion network associating images of the same subject wearing glasses with images of the subject not wearing glasses.
The conversion network may be a Cycle-GAN network, and the description of the Cycle-GAN network specifically refers to the description corresponding to fig. 3, which is not described herein again.
In this embodiment of the present specification, the same number of images are randomly selected from the images of the first data set Sn in which the subject does not wear glasses and the images of the second data set Sp in which the subject wears glasses, so that the conversion network can associate the images of the subject wearing glasses with the images of the subject not wearing glasses, and the conversion network is trained by this method, so that the images of the face of the subject with glasses removed by the conversion network are closer to the face images of the subject in reality, thereby improving the conversion effect.
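The sketch below shows one generator update for such a two-domain Cycle-GAN, assuming least-squares adversarial losses and an L1 cycle term; discriminator updates, identity losses and image buffers are omitted, and all module and parameter names are assumptions rather than the specification's fixed design.

```python
import torch
import torch.nn.functional as F

def generator_step(G_n2p, G_p2n, D_n, D_p, opt_g,
                   real_n: torch.Tensor, real_p: torch.Tensor,
                   lam: float = 10.0) -> float:
    """One optimization step linking Sn (no glasses) and Sp (glasses):
    G_p2n removes glasses, G_n2p adds them, and D_n / D_p judge realism
    in their respective domains."""
    fake_n = G_p2n(real_p)  # glasses -> no glasses
    fake_p = G_n2p(real_n)  # no glasses -> glasses
    # Adversarial terms: generated images should look real to the critics.
    pred_n, pred_p = D_n(fake_n), D_p(fake_p)
    adv = F.mse_loss(pred_n, torch.ones_like(pred_n)) \
        + F.mse_loss(pred_p, torch.ones_like(pred_p))
    # Cycle consistency: converting across domains and back should
    # reconstruct the original face.
    cyc = F.l1_loss(G_n2p(fake_n), real_p) + F.l1_loss(G_p2n(fake_p), real_n)
    loss = adv + lam * cyc
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()
```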
Referring to fig. 5, a flowchart of a machine learning model training and optimization process according to an embodiment of the present specification includes steps 502 to 516.
Step 502: marking the glasses-wearing state of the subject in the original sample data by using the trained first classifier.
Optionally, the first classifier adds a sample label for the glasses-wearing state to the original sample data, so that each image carries a label indicating whether glasses are worn, and the sample images are thereby classified by whether the subject wears glasses.
Step 504: and acquiring a first data set Sn and a second data set Sp from the marked images, wherein the images in Sn are that the shot person does not wear glasses, and the images in Sp are that the shot person wears glasses.
Step 506: training a conversion network implemented by a machine learning model based on the first and second data sets Sn, Sp, the conversion network associating images of the same subject wearing glasses with images of the subject not wearing glasses.
Step 508: at least one sample Is selected from the first data set Sn and the second data set Sp.
Step 510: and converting each sample Is through a conversion network to generate a converted sample Ig, extracting the characteristics of the Is and the Ig to respectively obtain a first characteristic expression of the Is and a second characteristic expression of the Ig, and calculating the distance corresponding to the Is according to the first characteristic expression and the second characteristic expression.
Step 512: and determining a first penalty function based on the distance corresponding to the at least one sample Is.
In practical application, when a sample Is selected from the first data set Sn and the second data set Sp, the sample may be Is, the sample Is converted by a conversion network to generate a converted sample as Ig, feature extraction Is performed on the Is and the Ig to obtain a first feature expression of the Is and a second feature expression of the Ig, a first distance corresponding to the Is calculated according to the first feature expression and the second feature expression, and the first distance Is determined as a first penalty function.
In practical applications, a pre-trained face recognition model can be used to extract features from an input face image: after the image is input, the feature expression output by a hidden layer of the model is taken as the feature expression of the input image. For example, Is may be input into the face recognition model and the hidden-layer output taken as the first feature expression of Is, and Ig may be input into the model likewise to obtain the second feature expression of Ig.
In the case of selecting a plurality of samples (i.e., two or more) from the first data set Sn and the second data set Sp, three samples are taken as an example; other numbers of samples are handled similarly and are not illustrated here. Suppose the selected samples are Ix, Iy and Iz. Each sample is converted through the conversion network to generate a corresponding converted sample: Ix generates Im, Iy generates In, and Iz generates Io. Feature extraction is then performed on Ix, Im, Iy, In, Iz and Io, yielding feature expression 1 of Ix, feature expression 2 of Im, feature expression 3 of Iy, feature expression 4 of In, feature expression 5 of Iz and feature expression 6 of Io.
Further, a first distance between feature expression 1 and feature expression 2, a second distance between feature expression 3 and feature expression 4, and a third distance between feature expression 5 and feature expression 6 are calculated, and the first penalty function is determined from the first, second and third distances.
In implementation, the first penalty function may be determined from the first, second and third distances by taking, for example, their average or their maximum, which is not limited here.
Step 514: and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
In an embodiment of the present specification, the conversion network has a second penalty function, and the conversion network is optimized; by adding the first penalty function and combining the first penalty function with the second penalty function in the conversion network, the optimization effect of the conversion network is improved.
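The combined objective of steps 508 to 514 might then be assembled as in the sketch below, where a frozen face recognition model supplies the feature expressions and the averaged embedding distance serves as the first penalty; the weight w and the choice of averaging (rather than taking the maximum) are assumptions.

```python
import torch
import torch.nn.functional as F

def first_penalty(face_model: torch.nn.Module,
                  generator: torch.nn.Module,
                  samples: torch.Tensor) -> torch.Tensor:
    """For each sample Is in the batch, generate Ig with the conversion
    network, take the feature expressions of both from the (frozen) face
    recognition model, and average the cosine distances of the pairs."""
    converted = generator(samples)                   # Ig for each Is
    f_s = F.normalize(face_model(samples), dim=1)    # first feature expressions
    f_g = F.normalize(face_model(converted), dim=1)  # second feature expressions
    return (1.0 - F.cosine_similarity(f_s, f_g)).mean()

# Hypothetical combination with the conversion network's own (second)
# penalty function; w is an illustrative weight:
# total_loss = second_penalty + w * first_penalty(face_model, G_p2n, batch)
```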
In an embodiment of the present specification, the conversion network is trained by randomly selecting equal numbers of images from the glasses-free images of the first data set Sn and the glasses-wearing images of the second data set Sp, so that it associates images of a subject wearing glasses with images of that subject not wearing glasses. Adding the first penalty function and combining it with the second penalty function of the conversion network improves its optimization, so that the images obtained through the network are of better quality. Using the network to remove a subject's glasses therefore yields images closer to the real face; the processing is simple, the generated images are of good quality, the interference of glasses with face recognition accuracy is reduced, and the accuracy of face recognition in the subsequent recognition process is improved.
Corresponding to the above method embodiment, the present specification also provides an image processing apparatus embodiment, and fig. 6 shows a schematic structural diagram of an image processing apparatus according to an embodiment of the present specification. As shown in fig. 6, the apparatus 600 includes:
the acquisition module 602: configured to acquire an image to be processed, wherein the subject in the image to be processed wears glasses;
the processing module 604: configured to process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject have been removed in the processed image;
the identification module 606: configured to perform face recognition on the subject by using the processed image.
In an optional embodiment, the acquisition module is further configured to: acquire an original image to be processed;
screen out, from the original image, an image in which the subject wears glasses by using a first classifier, and determine the image in which the subject wears glasses as the image to be processed.
In an optional embodiment, an image to be processed in which the subject wears glasses is acquired, and the image is processed by the generative model of a pre-trained conversion network composed of a generative model and a discriminative model to obtain a processed image in which the glasses worn by the subject have been removed. Processing the image with the trained conversion network is simple and produces images of good quality, reducing the interference of glasses with face recognition accuracy and helping improve the accuracy of face recognition in the subsequent recognition process.
Corresponding to the above method embodiments, the present specification further provides an embodiment of a machine learning model training apparatus; fig. 7 shows a schematic structural diagram of the machine learning model training apparatus according to an embodiment of the present specification. As shown in fig. 7, the apparatus 700 includes:
the marking module 702: configured to mark the glasses-wearing state of the subject in original sample data by using the trained first classifier;
the first acquisition module 704: configured to acquire a first data set Sn and a second data set Sp from the marked images, wherein the subjects in the images in Sn do not wear glasses and the subjects in the images in Sp wear glasses;
the first training module 706: configured to train a conversion network implemented by a machine learning model based on the first and second data sets Sn and Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
In an alternative embodiment, the machine learning model training apparatus 700 further comprises:
a second training module: configured to acquire sample images and sample labels corresponding to the sample images, wherein each sample label identifies whether the subject in the sample image wears glasses, and the sample images include images of subjects not wearing glasses and images of subjects wearing glasses; and to train a first classifier based on the sample images and the sample labels, the first classifier associating an image of a subject with its glasses-wearing state.
In an optional embodiment, the marking module is further configured to: mark, by using a second classifier, whether the marked images contain illumination, and select balanced numbers of images without illumination information and images with illumination information as the basis for random sampling; and/or mark the gender of the face in the marked images by using a third classifier, and select balanced numbers of face images of different genders as the basis for random sampling.
In an alternative embodiment, the machine learning model training apparatus 700 further comprises:
an optimization module configured to: select at least one sample Is from the first data set Sn and the second data set Sp; convert each sample Is through the conversion network to generate a converted sample Ig, perform feature extraction on Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculate a distance corresponding to Is from the first and second feature expressions; determine a first penalty function based on the distance corresponding to the at least one sample Is; and combine the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
In an optional embodiment, the trained first classifier marks the glasses-wearing state of the subject in the original sample data, and a first data set Sn and a second data set Sp are acquired from the marked images, the subjects in the images in Sn wearing no glasses and the subjects in the images in Sp wearing glasses. A conversion network implemented by a machine learning model is trained based on Sn and Sp so that it associates images of a subject wearing glasses with images of that subject not wearing glasses; the face images from which the network removes glasses are thus closer to the subject's real face, improving the conversion effect.
There is also provided in an embodiment of the present specification a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the image processing method when executing the instructions.
An embodiment of the present specification also provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image processing method and the machine learning model training method.
The above is an illustrative scheme of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solutions of the image processing method and the machine learning model training method; for details not described in the storage medium solution, refer to the descriptions of those method solutions.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed, wherein the subject in the image to be processed wears glasses;
processing the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject have been removed in the processed image;
and performing face recognition on the subject by using the processed image.
2. The method of claim 1, wherein the acquiring the image to be processed comprises:
acquiring an original image to be processed;
screening out, from the original image, an image in which the subject wears glasses by using a first classifier;
and determining the image in which the subject wears glasses as the image to be processed.
3. A machine learning model training method, comprising:
marking the glasses-wearing state of the subject in original sample data by using a trained first classifier;
acquiring a first data set Sn and a second data set Sp from the marked images, wherein the subjects in the images in Sn do not wear glasses and the subjects in the images in Sp wear glasses;
training a conversion network implemented by a machine learning model based on the first and second data sets Sn and Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
4. The method of claim 3, wherein the first classifier is trained by:
acquiring sample images and sample labels corresponding to the sample images, wherein each sample label identifies whether the subject in the sample image wears glasses, and the sample images include images of subjects not wearing glasses and images of subjects wearing glasses;
training a first classifier based on the sample images and the sample labels, the first classifier associating an image of a subject with its glasses-wearing state.
5. The method of claim 3, wherein before the acquiring of the first data set Sn and the second data set Sp from the marked images, the method further comprises:
marking, by using a second classifier, whether the marked images contain illumination, and selecting balanced numbers of images without illumination information and images with illumination information as the basis for random sampling;
and/or
marking the gender of the face in the marked images by using a third classifier, and selecting balanced numbers of face images of different genders as the basis for random sampling.
6. The method of claim 3, further comprising:
selecting at least one sample Is from the first data set Sn and the second data set Sp;
converting each sample Is through the conversion network to generate a converted sample Ig, performing feature extraction on Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculating a distance corresponding to Is from the first and second feature expressions;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
7. An image processing apparatus characterized by comprising:
an acquisition module: configured to acquire an image to be processed, wherein the subject in the image to be processed wears glasses;
a processing module: configured to process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject have been removed in the processed image;
an identification module: configured to perform face recognition on the subject by using the processed image.
8. The apparatus of claim 7, wherein the acquisition module is further configured to:
acquiring an original image to be processed; screening out, from the original image, an image in which the subject wears glasses by using a first classifier, and determining the image in which the subject wears glasses as the image to be processed.
9. A machine learning model training device, comprising:
a marking module: configured to mark the glasses-wearing state of the subject in original sample data by using a trained first classifier;
a first acquisition module: configured to acquire a first data set Sn and a second data set Sp from the marked images, wherein the subjects in the images in Sn do not wear glasses and the subjects in the images in Sp wear glasses;
a first training module: configured to train a conversion network implemented by a machine learning model based on the first and second data sets Sn and Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
10. The apparatus of claim 9, further comprising:
a second training module: configured to acquire sample images and sample labels corresponding to the sample images, wherein each sample label identifies whether the subject in the sample image wears glasses, and the sample images include images of subjects not wearing glasses and images of subjects wearing glasses; and to train a first classifier based on the sample images and the sample labels, the first classifier associating an image of a subject with its glasses-wearing state.
CN201811480882.0A, filed 2018-12-05, priority date 2018-12-05: Image processing method, image processing device, machine learning model training method and machine learning model training device. Status: Active. Granted as CN111274855B (en).

Priority Applications (1)

Application CN201811480882.0A, priority date 2018-12-05, filing date 2018-12-05: Image processing method, image processing device, machine learning model training method and machine learning model training device

Publications (2)

Publication Number Publication Date
CN111274855A (en) 2020-06-12
CN111274855B (en) 2024-03-26

Family

Family ID: 71003186

Family Applications (1)

Application CN201811480882.0A, priority date 2018-12-05, filing date 2018-12-05, granted: Image processing method, image processing device, machine learning model training method and machine learning model training device

Country Status (1)

Country Link
CN (1) CN111274855B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866421A (en) * 2010-01-08 2010-10-20 苏州市职业大学 Method for extracting characteristic of natural image based on dispersion-constrained non-negative sparse coding
US20170053423A1 (en) * 2015-08-20 2017-02-23 General Electric Company Systems and methods for emission tomography quantitation
WO2018072102A1 (en) * 2016-10-18 2018-04-26 华为技术有限公司 Method and apparatus for removing spectacles in human face image
CN108280413A (en) * 2018-01-17 2018-07-13 百度在线网络技术(北京)有限公司 Face identification method and device
CN108846355A (en) * 2018-06-11 2018-11-20 腾讯科技(深圳)有限公司 Image processing method, face identification method, device and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Feng: "Face detection algorithm based on SVM and HOG" (基于SVM和HOG的人脸检测算法), Information Technology and Informatization (信息技术与信息化), no. 06, 15 December 2013 (2013-12-15) *

Also Published As

Publication number Publication date
CN111274855B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN109858392B (en) Automatic face image identification method before and after makeup
WO2020207079A1 (en) Image recognition-based desensitization processing method and device
CN110163250B (en) Image desensitization processing system, method and device based on distributed scheduling
CN110555441A (en) character recognition method and device
CN110969139A (en) Face recognition model training method and related device, face recognition method and related device
TWI731919B (en) Image recognition method and device and metric learning method and device
CN112949456B (en) Video feature extraction model training and video feature extraction method and device
CN112765354B (en) Model training method, model training device, computer apparatus, and storage medium
CN111274855B (en) Image processing method, image processing device, machine learning model training method and machine learning model training device
CN111382410A (en) Face brushing verification method and system
CN116168274A (en) Object detection method and object detection model training method
CN111079013B (en) Information recommendation method and device based on recommendation model
CN114356860A (en) Dialog generation method and device
CN114187632A (en) Facial expression recognition method and device based on graph convolution neural network
CN112686156A (en) Emotion monitoring method and device, computer equipment and readable storage medium
US10762607B2 (en) Method and device for sensitive data masking based on image recognition
US20210158082A1 (en) Duplicate image detection based on image content
Venkateswarlu et al. AI-based Gender Identification using Facial Features
Bhanumathi et al. Underwater Fish Species Classification Using Alexnet
Tiwari et al. FRLL-Beautified: A Dataset of Fun Selfie Filters with Facial Attributes
Virata et al. A Raspberry Pi-Based Identity Verification Through Face Recognition Using Constrained Images
Sikder et al. Person Adapted Emotion Recognition using Deep Learning
CN115222982A (en) Image classification model processing method
CN113239675A (en) Text processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant