WO2021078157A1 - Image processing method, device, electronic device, and storage medium - Google Patents

Image processing method, device, electronic device, and storage medium

Info

Publication number: WO2021078157A1
Authority: WO (WIPO PCT)
Prior art keywords: image data, attribute, network, specific, sample image
Application number: PCT/CN2020/122506
Other languages: English (en), French (fr)
Inventor: 孙莹莹
Original assignee: Oppo广东移动通信有限公司 (Guangdong OPPO Mobile Telecommunications Corp., Ltd.)
Application filed by Oppo广东移动通信有限公司
Publication of WO2021078157A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • G06V 40/172: Classification, e.g. identification
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the field of image processing technology, and more specifically, to an image processing method, device, electronic device, and storage medium.
  • Existing image attribute recognition solutions are mainly attribute recognition solutions based on traditional machine learning and attribute recognition solutions based on convolutional neural network models. At present, the most commonly used image attribute recognition technology is based on a single model that makes a single attribute judgment, so the efficiency of multi-attribute recognition is not high.
  • This application proposes an image processing method, device, electronic device, and storage medium to remedy the above-mentioned defects.
  • An embodiment of the present application provides an image processing method, including: obtaining image data to be processed; inputting the image data to be processed into a plurality of pre-trained specific networks to obtain attribute tags corresponding to the image data, where each specific network is used to determine attribute tags corresponding to the image data and the attribute tags determined by each specific network are different from each other; inputting the attribute tags determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network is used to determine the image recognition result according to each attribute tag and the correlations among the attribute tags; and outputting the image recognition result.
  • An embodiment of the present application also provides an image processing method, including: acquiring multiple sample image data, each of which corresponds to multiple attribute tags; setting a shared network and multiple specific networks, where each specific network can recognize at least one attribute tag and the attribute tags that each specific network can recognize are different from each other; inputting the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks; and obtaining image data to be processed and processing it according to the trained shared network and multiple specific networks to obtain an image recognition result.
  • an embodiment of the present application also provides an image processing device, including: a data acquisition unit, an attribute determination unit, a result acquisition unit, and an output unit.
  • the data acquisition unit is used to acquire the image data to be processed.
  • The attribute determination unit is configured to input the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by each specific network are different from each other.
  • The result acquisition unit is used to input the attribute labels determined by each specific network into a pre-trained shared network to obtain the image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations between the attribute labels.
  • the output unit is used to output the image recognition result.
  • an embodiment of the present application also provides an image processing device, including: a sample acquisition unit, a setting unit, a network training unit, and a recognition unit.
  • the sample acquisition unit is configured to acquire a plurality of sample image data, and each of the sample image data corresponds to a plurality of attribute tags.
  • the setting unit is configured to set a shared network and a plurality of specific networks, each of the specific networks can recognize at least one attribute tag, and the attribute tags that can be recognized by each of the specific networks are different from each other.
  • The network training unit is configured to input the multiple sample image data into the shared network and the multiple specific networks for training, so as to obtain a trained shared network and multiple trained specific networks.
  • the recognition unit is configured to obtain image data to be processed, and process the image data to be processed according to the trained shared network and multiple specific networks to obtain an image recognition result.
  • An embodiment of the present application also provides an electronic device, including: one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to execute the foregoing methods.
  • An embodiment of the present application also provides a computer-readable storage medium that stores program code executable by a processor; when instructions in the program code are executed by the processor, the processor is caused to execute the above-mentioned methods.
  • FIG. 1 shows a method flowchart of an image processing method provided by an embodiment of the present application
  • FIG. 2 shows a method flowchart of an image processing method provided by another embodiment of the present application
  • FIG. 3 shows a method flowchart of an image processing method provided by another embodiment of the present application.
  • FIG. 4 shows a method flowchart of S310 in the image processing method shown in FIG. 3 according to an embodiment of the present application
  • FIG. 5 shows a method flowchart of S310 in the image processing method shown in FIG. 3 according to another embodiment of the present application
  • FIG. 6 shows a schematic diagram of a measurement area provided by an embodiment of the present application.
  • FIG. 7 shows a schematic diagram of the connection between a specific network and a shared network provided by an embodiment of the present application
  • FIG. 8 shows a schematic diagram of sub-image data provided by an embodiment of the present application.
  • FIG. 9 shows a schematic diagram of a face orientation provided by an embodiment of the present application.
  • FIG. 10 shows a method flowchart of an image processing method provided by still another embodiment of the present application.
  • FIG. 11 shows a block diagram of a module of an image processing device provided by an embodiment of the present application.
  • FIG. 12 shows a block diagram of a module of an image processing device provided by another embodiment of the present application.
  • FIG. 13 shows a block diagram of a module of an image processing device provided by another embodiment of the present application.
  • FIG. 14 shows a block diagram of a module of an electronic device provided by an embodiment of the present application.
  • Fig. 15 shows a storage unit provided by an embodiment of the present application for storing or carrying program code for implementing the image processing method according to the embodiments of the present application.
  • Face recognition is a technology for identifying different people's identities based on the appearance of human faces. It has a wide range of application scenarios, and related research and applications have existed for decades. With the development of related technologies such as big data and deep learning in recent years, the performance of face recognition has improved by leaps and bounds, and it is used more and more widely in scenarios such as identity authentication, video surveillance, and beauty and entertainment. Among them, person ID comparison is the face recognition problem between standard ID photos and everyday photos; since identifying a target person only requires deploying his or her ID photo in the database, it avoids the trouble of having the target person collect everyday photos and register in the system, and is therefore receiving more and more attention.
  • Existing face attribute recognition solutions are mainly attribute recognition solutions based on traditional machine learning and attribute recognition solutions based on convolutional neural network (CNN) models.
  • Some solutions borrow the concept of multi-task learning. For example, a convolutional neural network can be used to learn the convolutional layers of preset analysis tasks on a face database to obtain a face analysis model and complete the prediction of facial emotions. Multi-task cascaded learning can also be achieved by adding auxiliary information, such as gender, whether the subject is smiling, whether glasses are worn, and posture, to the training process; these attributes are used as tags, and face alignment is realized by cascading. In this way, the multi-task learning method not only recognizes a single face attribute but also realizes multi-attribute prediction.
  • Multi-task learning methods can also be introduced into race and gender recognition of face images, with different semantics treated as different tasks, and semantic-based multi-task feature selection has been proposed and applied to race and gender recognition. However, race and gender are still solved separately as two tasks, so the model has a lot of redundancy and cannot make predictions in real time.
  • At present, the most commonly used face attribute recognition technology is based on a single model and makes a single attribute judgment; that is, only one task is learned at a time under a unified model, and a complex problem is first decomposed into theoretically independent sub-problems, so the samples in the training set only reflect the information of a single task. In fact, face images contain various attribute information such as race, gender, and age, there are correlations between the recognition tasks corresponding to different pieces of information, and certain relevant information is shared between the various tasks during learning.
  • In view of this, an embodiment of the present application provides an image processing method; as shown in FIG. 1, the method includes S101 to S104.
  • The image data may be an offline image file downloaded to the electronic device in advance, or online image data, for example, an image acquired in real time. The online image data corresponds to a certain frame or multiple frames of a video file, namely the data of the video file that has already been sent to the electronic device. For example, if the video file is a certain movie and what the electronic device has received is the data for playing time 0 to 10 minutes of the movie, then the online image data corresponding to the movie is the data for playing time 0 to 10 minutes.
  • The client can decode each piece of online image data separately, obtain its corresponding layer to be rendered, and then merge and display the layers, so that multiple video images can be displayed on the screen. The electronic device may include multiple clients that can play video files. When a client of the electronic device plays a video, the electronic device can obtain the video file to be played and then decode it; specifically, soft decoding or hard decoding can be used to decode the video file. After decoding, the to-be-rendered multi-frame image data corresponding to the video file is obtained, and the multi-frame image data then needs to be rendered before it can be displayed on the display screen.
  • The image data may also be an image collected by a designated application in the electronic device through the camera of the electronic device. When the designated application performs a certain function, the camera is called to capture the image, and the electronic device is requested to determine the image recognition result through the method of this application; the image recognition result is then sent to the designated application, and the designated application executes a corresponding operation according to the image recognition result.
  • S102 Input the image data to be processed into multiple pre-trained specific networks to obtain attribute tags corresponding to the image data.
  • each of the specific networks is used to determine the attribute label corresponding to the image data, and the attribute labels determined by each of the specific networks are different from each other.
  • The sample image data input into the specific networks includes multiple attribute tags, for example, the hair color in a face image being black, or the car in a vehicle image being white. The value of each attribute label is 0 or 1, where 0 means the attribute is not possessed and 1 means the attribute is possessed. These attribute labels are feature values of the image preset in order to obtain the image recognition result, and the role of a specific network is to determine whether the image includes the preset attribute tags.
  • Each specific network can determine at least one attribute label corresponding to the image data. For example, suppose the multiple specific networks include a first specific network and a second specific network, and the attribute labels include label 1, label 2, and label 3. The first specific network is used to identify the recognition result of label 1 for the image data, that is, whether the image data includes label 1: if the image data identified by the first specific network corresponds to label 1, the recognition result given is that label 1 is 1, and if there is no label 1, the recognition result given is that label 1 is 0. The second specific network is used to determine label 2 and label 3. Because label 1 on the one hand and labels 2 and 3 on the other are identified by different specific networks, recognition efficiency is improved, and the excessive amount of calculation that would result from one specific network identifying label 1, label 2, and label 3 together is avoided; moreover, the first specific network is only used to recognize label 1, so it does not need to learn to recognize labels 2 and 3, which also reduces training costs.
  • In addition, multiple specific networks can be executed at the same time, that is, they can run simultaneously under multiple threads rather than in a cascading relationship; in other words, the output of one specific network is not required as the input of another specific network.
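  • A minimal sketch of this parallel, non-cascaded execution; the thread pool, model callables, and dictionary layout are illustrative assumptions, not part of the embodiment:

```python
from concurrent.futures import ThreadPoolExecutor

def run_specific_networks(networks, image):
    """Run all specific networks at the same time under multiple threads.

    `networks` maps a network name to a callable model. Because the
    networks are not cascaded, no network's output is needed as another
    network's input, so every submission is independent.
    """
    with ThreadPoolExecutor(max_workers=len(networks)) as pool:
        futures = {name: pool.submit(net, image)
                   for name, net in networks.items()}
        return {name: f.result() for name, f in futures.items()}
```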
  • the structure of the specific network is introduced in the subsequent embodiments.
  • The main function of a specific network is to segment the target object from the image and to recognize the target object; that is, the specific network can also be a target detection network in which segmentation and recognition are combined into one. Commonly used target detection networks include the GOTURN network, the MobileNet-SSD deep convolutional neural network, the Faster R-CNN neural network, the YOLO neural network, and the SPP-Net (Spatial Pyramid Pooling) neural network. Among them, the GOTURN neural network is a target detection algorithm that uses a convolutional neural network for offline training; it uses a CNN classification network pre-trained on existing large-scale classification data sets to extract features and recognize them.
  • The shared network is used to determine the image recognition result according to each attribute tag and the correlations between the attribute tags. That is, the shared network focuses on learning the information shared by all attribute tags. For example, when the attribute tag of raised mouth corners and the attribute tag of upward-rolled eyes appear at the same time, the emotion expressed is thinking; this correlation between the two attribute tags is identified through the shared network, and the recognition result is obtained based on that correlation. In other words, after pre-training, the shared network can recognize the correlations between attribute tags and obtain the image recognition result based on them.
  • The image recognition result may be output by displaying it on the screen or by sending it to the request end that asked for the image recognition result. The request end may be a server communicating with the electronic device, another electronic device, or an application installed in the electronic device, and the execution subject of the above method may be an application capable of image recognition in the electronic device or the operating system of the electronic device. After the image recognition result is obtained, it is sent to the requester, and the requester performs an operation based on the image recognition result, such as transaction payment or screen unlocking.
  • In the image processing method, device, electronic device, and storage medium provided by the embodiments of this application, a shared network and multiple specific networks are pre-trained, where each specific network is used to determine attribute labels corresponding to image data and the attribute labels determined by each specific network are different from each other. After the image data to be processed is obtained, it is input into the multiple specific networks, and each specific network identifies the attributes it is able to recognize, so that the multiple attribute tags corresponding to the image data can be recognized separately by the multiple specific networks, which improves the efficiency of recognizing the multiple attribute tags of the image data as a whole. The attribute tags corresponding to the image data are then input into the shared network, which determines the image recognition result according to each attribute tag and the correlations between the attribute tags, and the image recognition result is output. Multiple specific networks thus jointly analyze the image data and obtain multiple attribute tags, which increases the speed of obtaining attribute tags, while the shared network combines the correlations of the attribute tags to obtain the image recognition result, which improves the accuracy of the recognition result and the overall performance.
  • FIG. 2 shows an image processing method provided by an embodiment of the present application, and the method includes: S201 to S207.
  • The original image data may be the gray values corresponding to the image, that is, the value of each pixel in the image is a value in the interval [0, 255], namely the gray value. The image may also be a color image that is converted to a grayscale image, in which case the gray value of each pixel in the grayscale image constitutes the original image data.
  • The original image data may be data collected by a camera of an electronic device, for example, when the image processing method is applied to real-time analysis of the image data collected by the camera, the analysis being face attribute recognition.
  • The image data may also be an image collected by a designated application in the electronic device through the camera of the electronic device. When the designated application performs a certain function, the camera is called to capture the image, the electronic device is requested to determine the image recognition result through the method of this application, the image recognition result is sent to the designated application, and the designated application executes the corresponding operation according to the image recognition result. For example, the designated application program may be a screen unlocking APP or a payment APP in the electronic device.
  • The screen unlocking APP uses the face image collected by the camera to perform face recognition to determine identity information; it determines whether the face image matches a preset face image, and if it matches, unlocking succeeds, while if it does not match, unlocking fails. The preset face image may be a face image preset by the user, which may be stored in the mobile terminal or in a certain server or memory from which the mobile terminal can obtain it. The preset face image may be represented by preset feature information: if the face image is a two-dimensional image, the preset feature information is the facial feature point information of the face image pre-entered by the user, and if the face image is a three-dimensional image, the preset feature information is the three-dimensional face information of the face image pre-entered by the user. The way to determine whether the face image meets the preset condition is to obtain the feature point information of the collected face image and compare it with the preset feature information entered by the user in advance: if they match, it is determined that the face image meets the preset condition and has the authority to unlock the screen of the mobile terminal, and if they do not match, it is determined that the face image does not meet the preset condition and has no authority to unlock the screen.
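  • A minimal sketch of that match test, assuming the face images have already been reduced to feature vectors and that a cosine-similarity threshold stands in for the comparison (both are assumptions, since the embodiment does not fix a comparison metric):

```python
import numpy as np

def is_unlock_allowed(collected_feat: np.ndarray,
                      preset_feat: np.ndarray,
                      threshold: float = 0.6) -> bool:
    """Compare collected face features with the user's preset features.

    Returns True (authority to unlock the screen) when the cosine
    similarity reaches the threshold; the threshold value is illustrative.
    """
    a = collected_feat / np.linalg.norm(collected_feat)
    b = preset_feat / np.linalg.norm(preset_feat)
    return float(np.dot(a, b)) >= threshold
```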
  • Here, the image data includes a face image, and the recognition of the image data is face recognition. If the image collected by the camera is a two-dimensional image, whether a face image has been collected can be determined by searching for facial features in the image; if so, the collected face image is sent to the processor of the mobile terminal so that the processor can analyze the face image and perform the screen unlocking operation. If the camera includes structured light, whether three-dimensional face information exists is determined according to the three-dimensional information collected by the structured light, and if so, the collected image is sent to the processor of the mobile terminal.
  • the face collection reminder information may be displayed on the current interface of the electronic device.
  • The original image data may be normalized by means of mean-variance normalization or grayscale transformation normalization in order to remove redundant information, where the redundant information refers to the gaps between the distributions that are compressed by normalization. The original image data after normalization is used as the image data to be processed.
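  • A minimal sketch of the two normalization options named above; the exact definition of grayscale transformation normalization is an assumption (read here as a linear stretch to [0, 1]):

```python
import numpy as np

def normalize_mean_variance(image: np.ndarray) -> np.ndarray:
    """Mean-variance normalization: zero mean and unit variance per image."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + 1e-8)

def normalize_gray_transform(image: np.ndarray) -> np.ndarray:
    """Grayscale transformation normalization, taken here as a linear
    stretch of the gray values to the interval [0, 1]."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo + 1e-8)
```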
  • The attribute tags that can be recognized by a specific network are the attribute tags corresponding to that specific network, and they are already set when the specific network is trained; for details, refer to the subsequent embodiments.
  • Each attribute tag in the image data corresponds to a position in the image; for example, the position corresponding to the attribute tag of hair color is the hair position, and the position corresponding to the attribute tag of eye color is the eye position, so the position corresponding to each attribute tag can be determined in advance. As one implementation, an area identifier corresponding to each attribute tag can be set. Taking a face image as an example, the area identifiers can be different regions of the face, such as the eyes, nose, and hair. Different attribute tags may correspond to different area identifiers or to the same area identifier; for example, the attribute tags of hair color and hair length both correspond to the hair area identifier, while the attribute tag of pupil color corresponds to the eye area identifier. Since each pixel value in the image data corresponds to a pixel coordinate, after the area identifier corresponding to each attribute tag is obtained, the position of that area identifier in pixel coordinates can be determined, and thereby the position in the image of the attribute tags that each specific network can recognize.
  • The image data can then be divided into multiple sub-image data, where each sub-image data corresponds to an area in the image and the attribute tags in that area correspond to the same specific network; that is, the attribute tags that each specific network can identify are located in the area corresponding to its sub-image data. In this way, the image sub-region corresponding to each specific network, that is, the sub-image data corresponding to each specific network, is determined. For example, the image is divided into a first area, a second area, and a third area: the attribute labels that the first specific network can recognize are distributed in the first area, the attribute labels that the second specific network can recognize are distributed in the second area, and the attribute tags that the third specific network can recognize are distributed in the third area. The image data is accordingly divided into three sub-image data, namely first sub-image data corresponding to the first area, second sub-image data corresponding to the second area, and third sub-image data corresponding to the third area.
  • In addition, the images can be uniformly adjusted to one orientation, so that the designated area is located in the same position in every image.
  • S205 Input the sub-image data into a specific network corresponding to the sub-image data.
  • After the area in the image where the attribute tags corresponding to each specific network are located has been determined, the image can be divided into multiple areas, each corresponding to one specific network, and the image is divided into multiple sub-images according to the determined areas; the pixel data corresponding to each sub-image, that is, the pixel data after the processing of S202, is used as the sub-image data. Taking the above-mentioned first, second, and third areas as an example, the image is divided into three sub-images, namely a first sub-image, a second sub-image, and a third sub-image; the pixel data corresponding to each pixel value in the first sub-image is input into the first specific network, the pixel data corresponding to each pixel value in the second sub-image is input into the second specific network, and the pixel data corresponding to each pixel value in the third sub-image is input into the third specific network.
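  • A sketch of this routing; the region boxes and network names are hypothetical placeholders for the areas derived from the attribute tags' area identifiers:

```python
# Hypothetical region boxes (x0, y0, x1, y1) in pixel coordinates.
REGIONS = {
    "first_specific":  (0, 0, 112, 40),
    "second_specific": (0, 40, 112, 80),
    "third_specific":  (0, 80, 112, 112),
}

def route_sub_images(image, networks):
    """Crop one sub-image per specific network and run each independently.

    `image` is a (H, W) or (H, W, C) pixel array after the normalization
    of S202; `networks` maps a region name to a callable specific network.
    """
    results = {}
    for name, (x0, y0, x1, y1) in REGIONS.items():
        sub_image = image[y0:y1, x0:x1]   # pixel data of this sub-image
        results[name] = networks[name](sub_image)
    return results
```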
  • The specific networks and the shared network may be trained before the image recognition method is executed; the training process can take place after S101 and before S102, after S201 and before S204, or before S101 and S201.
  • FIG. 3 shows the training process of the specific networks and the shared network in the image processing method provided by an embodiment of the present application. Specifically, as shown in FIG. 3, the method includes S310 to S370.
  • S310 Acquire multiple sample image data, each of which corresponds to multiple attribute tags.
  • The sample image data is image data that has been annotated, which may be annotated manually after the images are obtained in advance, with each annotated point corresponding to an attribute label. For example, the CelebA face attribute data set may be used as the experimental data set. This data set contains about 200,000 face images, and each image provides 40 face attribute annotations and the position information of 5 face key points. In this embodiment, about 100,000 face images are taken for network model training, about 10,000 images for validation, and 10,000 images for testing the network model. For this public face attribute data set, there are 40 attribute tags corresponding to each face picture, and the value of each tag is 0 or 1, where 0 means the image does not have the attribute and 1 means it has the attribute.
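  • A hedged sketch of loading such labels, assuming the attribute file layout of the public CelebA release (one header line with the image count, one line of 40 attribute names, then one row per image with values -1/1):

```python
import pandas as pd

# pandas infers the image filename column as the index because the header
# row has one fewer field than the data rows.
attrs = pd.read_csv("list_attr_celeba.txt", sep=r"\s+", skiprows=1)
attrs = attrs.replace(-1, 0)        # map -1/1 to the 0/1 convention above

labels = attrs.iloc[0].to_dict()    # e.g. {"Black_Hair": 1, "Smiling": 0, ...}
```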
  • Both the sample image data and the image data to be processed include face images; that is to say, the image processing method provided in the embodiments of the present application is applied to face attribute recognition, and the specific networks and the shared network are likewise trained for face attribute recognition.
  • S310 may include: S311 to S314.
  • S311 Acquire multiple sample image data.
  • this step can be referred to the above description, which will not be repeated here.
  • S312 Identify the position information, in the sample image, of the face key points in each sample image data. The face key points may be the facial features in the face image; for example, the key points may be the eyes, nose, mouth, and so on. The specific recognition method can be a facial image recognition method, for example, PCA (principal component analysis), which determines the facial features in the face image and thereby the position information, that is, the pixel coordinates, of the face key points in the face image. After the face area image is obtained, the face area is cropped, face correction is performed on the sample data to be trained, and the position information of the face key points, such as the eyes, nose, and mouth, is determined.
  • The preset orientation may be the face facing straight ahead, which means that the forehead part of the face is in the upper part of the image and the chin part of the face is in the lower part of the image. The orientation of the face in the image can be determined from the position information of the face key points. Specifically, the same pixel coordinate system is set for each sample image; for example, the pixel coordinate system can be established with the top-left vertex of the sample image as the origin, so that the pixel coordinates of the face key points in the face image can be obtained, and the positional relationship between the forehead and the chin can be determined from the position information of the key points. If the orientation does not conform, the face orientation in the sample image data can be made to conform to the preset orientation by rotating the image, for example by 90° clockwise.
  • In other embodiments, the preset orientation may be that the face is within 15 degrees of the frontal direction.
  • The size of the sample image can also be adjusted. Specifically, through the positioning of the face key points, such as the eyes, nose, and mouth, the orientation of each object to be predicted is adjusted according to the preset orientation standard to ensure that each object to be predicted faces within 15 degrees of the front, thereby achieving face alignment, and a predetermined percentage of margin is added around the facial area. At the same time, in order to reduce the amount of calculation, the image size is set to a specified size; for example, the specified size can be 112*112. When adjusting the size, the entire image can be compressed to the specified size, or the sample image can be cropped with a window of the specified size: the center point of the sample image is used as the center point of the window, and the image in the area covered by the window is used as the resized image. The size of the window may be 112*112.
  • S314 Use the adjusted sample image data as the sample image data used this time to train the shared network and the multiple initial specific networks. Specifically, if the face orientation in each sample image data is adjusted to meet the preset orientation, the sample image data after the face orientation adjustment is used as the sample image data for training the shared network and the multiple initial specific networks. If the above adjustment includes both adjusting, according to the position information of the face key points of each sample image data, the face orientation in each sample image data to conform to the preset orientation and adjusting the size of the sample image to the specified size, then the sample image data after the face orientation adjustment and the size adjustment is used as the sample image data for training the shared network and the multiple initial specific networks this time.
  • In other embodiments, S310 may include S311, S315, S316, and S317.
  • S311 Acquire multiple sample image data.
  • this step can be referred to the above description, which will not be repeated here.
  • S315 Perform data enhancement processing on the multiple sample image data, so that the light intensity and contrast of each sample image data are randomly distributed within preset intervals. Specifically, the light intensity of the sample image data is transformed according to a preset light intensity interval to obtain data in which the light intensity of each sample image data is randomly distributed in the preset light intensity interval, and the contrast of the sample image data is transformed according to a preset contrast interval to obtain data in which the contrast of each sample image data is randomly distributed in the preset contrast interval.
  • As one implementation, the preset light intensity interval may be a preset range of light intensity; after the light intensity of each pixel in the sample image is obtained, the light intensity of the pixel can be adjusted to be within the preset light intensity interval. Specifically, the distribution of the light intensity of the pixels in the sample image can be counted, so that pixels with higher light intensity are located at higher light intensity values within the preset interval and pixels with lower light intensity are located at lower values. The continuity of the light intensity distribution of the pixels within the preset interval can also be increased; that is, across the distribution sub-regions of light intensity covered by the pixel values, the intensity difference between two adjacent sub-regions is not greater than a specified value, so that the light intensity is randomly distributed within the preset interval, which increases the diversity of the data. The light intensity of the pixels in each sample image data can thus be randomly distributed within its corresponding preset light intensity interval, and the preset light intensity intervals corresponding to different sample image data need not all be the same, which further increases data diversity and thereby improves the generalization of later model training. Transforming the contrast of the object to be trained according to the preset contrast interval, so that the contrast of each object to be trained is randomly distributed in the preset contrast interval, can be done by analogy with the above adjustment process for light intensity. Therefore, the contrast of the pixels in each sample image data can be randomly distributed within its corresponding preset contrast interval, and the preset contrast intervals corresponding to different sample image data need not all be the same, which further increases the diversity of the data and improves the generalization of later model training.
  • S316 Crop the object to be trained according to a preset random cropping ratio and adjust it to a preset size, where the preset size may be 112*112, and flip the object to be trained horizontally. The cropping according to a preset random cropping ratio and adjustment to the preset size can refer to the aforementioned cropping method for adjusting to the specified size, the preset size being the same as the specified size.
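  • A hedged augmentation sketch combining S315 and S316; the interval bounds and cropping ratios are illustrative assumptions, since the embodiment only requires values randomly distributed within preset intervals:

```python
import random
import numpy as np
import cv2  # assumed available for resizing and flipping

def augment(image: np.ndarray,
            intensity_range=(0.7, 1.3),
            contrast_range=(0.8, 1.2),
            out_size=(112, 112)) -> np.ndarray:
    """Random light intensity and contrast within preset intervals,
    a random crop resized to the preset size, and a horizontal flip."""
    img = image.astype(np.float32)
    # light intensity: scale pixel values by a factor from the interval
    img *= random.uniform(*intensity_range)
    # contrast: stretch around the mean by a factor from the interval
    mean = img.mean()
    img = np.clip((img - mean) * random.uniform(*contrast_range) + mean, 0, 255)
    # random crop according to a cropping ratio, then resize to 112*112
    h, w = img.shape[:2]
    ratio = random.uniform(0.8, 1.0)
    ch, cw = int(h * ratio), int(w * ratio)
    y0, x0 = random.randint(0, h - ch), random.randint(0, w - cw)
    img = cv2.resize(img[y0:y0 + ch, x0:x0 + cw], out_size)
    if random.random() < 0.5:
        img = cv2.flip(img, 1)  # horizontal flip
    return img
```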
  • S317 Use the trimmed sample image data as the sample image data used to train the shared network and multiple initial specific networks this time.
  • It should be noted that steps S311 to S314 can replace S310, that is, S320 is executed after S311, S312, S313, and S314; or S310 can be replaced with steps S311, S315, S316, and S317, that is, S320 is executed after S311, S315, S316, and S317; or S310 can be replaced with S311 to S317, that is, S311, S312, S313, S314, S315, S316, and S317 are all executed before S320.
  • S320 Set up a shared network and multiple specific networks.
  • Each specific network can recognize at least one attribute tag, and the attribute tags that each specific network can recognize are different from each other. If a separate specific network were configured for each attribute tag, there would be a huge computational overhead. Taking face image recognition as an example, assuming there are 40 attribute tags in total, directly treating the 40 face attributes as 40 independent tasks incurs a huge computational overhead and ignores the spatial position correlation between the face attributes. Therefore, multiple regions can be divided for the human face, with each region corresponding to one specific network.
  • As one implementation, setting a shared network and multiple specific networks may be implemented by dividing multiple measurement regions, each corresponding to a different area of the face, and setting up multiple specific networks according to the multiple measurement areas, where each specific network corresponds to one measurement area and is used to confirm the attribute labels in its corresponding measurement area.
  • For example, the attribute labels are divided into four groups, namely an upper group, a middle group, a lower group, and a full-face group. Each group corresponds to its own attribute labels, and the attribute labels of the groups are different from each other; that is, each specific network is able to identify the attribute tags of its corresponding group, and the attribute classification of each group can be regarded as a separate attribute learning task.
  • As shown in FIG. 6, the sample image is divided into four regions, namely an upper region m1, a middle region m2, a lower region m3, and a full-face region m4. The upper region m1 is the area between the top side of the image and the horizontal line through the position of the lower of the two eyes: a straight line parallel to the horizontal axis is set at the position of the lower of the two eyes and denoted the first straight line, and the area between the first straight line and the top side of the image is taken as the upper area. Below it, another straight line parallel to the horizontal axis is set and denoted the second straight line; the area between the first straight line and the second straight line is taken as the middle area, and the area between the second straight line and the bottom side of the image is taken as the lower area. The area between the end of the chin of the face and the top of the hair is taken as the full-face area, which can enclose the face and the hair. The above areas are the measurement areas; that is, the upper region m1, the middle region m2, the lower region m3, and the full-face region m4 are the four measurement areas.
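  • A sketch of deriving the four measurement areas from the key points; the second straight line, chin, and hair-top coordinates are assumed to come from the face key-point detection:

```python
def measurement_regions(img_h, img_w, left_eye, right_eye,
                        second_line_y, chin_y, hair_top_y):
    """Return the measurement areas m1..m4 as (x0, y0, x1, y1) boxes.

    The first straight line passes through the lower of the two eyes
    (larger y in image coordinates, origin at the top-left corner).
    """
    first_line_y = max(left_eye[1], right_eye[1])
    return {
        "m1_upper":     (0, 0, img_w, first_line_y),
        "m2_middle":    (0, first_line_y, img_w, second_line_y),
        "m3_lower":     (0, second_line_y, img_w, img_h),
        "m4_full_face": (0, hair_top_y, img_w, chin_y),
    }
```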
  • Each task has an independent specific network, and the parameters of the specific networks are not shared between tasks. The shared network does not correspond to any specific learning task; instead, it extracts complementary information between tasks in order to learn the correlations between tasks.
  • A simple connection unit can be used to connect the specific networks and the shared network so as to maximize the information flow between the two. The connection relationship between the specific networks and the shared network is shown in FIG. 7: the input of each layer of the shared network includes, in addition to the output features of its own previous layer, the output features of the previous layer of all the specific networks. It should be noted that FIG. 7 only shows the connection relationship between two specific networks and the shared network; the connection relationship for four specific networks can be obtained analogously with reference to FIG. 7.
  • The multi-task attribute recognition model constructed in the embodiment of the present application includes 4 specific networks and 1 shared network. The specific networks focus on learning the specific features of each feature group, while the shared network focuses on learning the information shared by all feature groups. The specific networks and the shared network are connected and exchange information through the local sharing unit, thereby forming the entire locally shared multi-task face multi-attribute classification network. Each specific network and the shared network have the same network structure, including 5 convolutional layers and 2 fully connected layers, and each convolutional layer and fully connected layer is followed by a normalization layer and a ReLU (Rectified Linear Unit) activation layer. The number of output channels of each layer is the same across the specific networks, while the number of output channels of the shared network differs from that of the specific networks.
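  • A hedged PyTorch sketch of this structure for two specific networks (the embodiment uses four); channel counts and strides are assumptions, plain concatenation stands in for the local sharing unit, and the two fully connected layers per network are omitted for brevity:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # each convolutional layer is followed by normalization and ReLU
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class LocallySharedNet(nn.Module):
    """Two specific networks plus one shared network, 5 conv layers each.

    Every shared layer sees its own previous output concatenated with the
    previous outputs of all specific networks; all inputs are assumed to
    be resized to the same spatial size (e.g. 112*112 grayscale crops).
    """
    def __init__(self, spec_ch=32, shared_ch=64, num_layers=5):
        super().__init__()
        self.spec_a = nn.ModuleList()
        self.spec_b = nn.ModuleList()
        self.shared = nn.ModuleList()
        cin_spec, cin_shared = 1, 1
        for _ in range(num_layers):
            self.spec_a.append(conv_block(cin_spec, spec_ch))
            self.spec_b.append(conv_block(cin_spec, spec_ch))
            # shared input = previous shared + both previous specific outputs
            self.shared.append(conv_block(cin_shared + 2 * cin_spec, shared_ch))
            cin_spec, cin_shared = spec_ch, shared_ch

    def forward(self, xa, xb, xs):
        for la, lb, ls in zip(self.spec_a, self.spec_b, self.shared):
            xs = ls(torch.cat([xs, xa, xb], dim=1))  # local sharing unit
            xa, xb = la(xa), lb(xb)
        return xa, xb, xs
```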
  • S330 Input the multiple sample image data into the shared network and multiple specific networks for training, so as to obtain a trained shared network and multiple specific networks.
  • Each specific network can only identify its corresponding attribute labels; for example, the upper specific network can identify the attribute tags of the upper group and does not recognize other attribute tags. Therefore, sample image data can be input for each specific network separately. Specifically, according to the attribute labels corresponding to each specific network, the sample image data is divided into multiple sub-sample image data, and each sub-sample image data is input into the specific network corresponding to it. As shown in FIG. 8, the same sample image is divided into four sub-sample images; for example, the upper-left image is the sub-sample image data corresponding to the upper region m1 of the sample image shown in FIG. 6. The sub-sample image data corresponding to the upper region m1 is input into the upper specific network to train the upper specific network, the sub-sample image data corresponding to the middle region m2 is input into the middle specific network to train the middle specific network, the sub-sample image data corresponding to the lower region m3 is input into the lower specific network to train the lower specific network, and the sub-sample image data corresponding to the full-face region m4 is input into the full-face specific network to train the full-face specific network.
  • Part of the sample image data obtained in step S311 is used to train the above network model, and the other part is used to test it. For example, the samples of the objects to be trained are randomly divided in proportion into two sets, a training set and a test set, with a division ratio of 8:2; the training set is used to train the face multi-attribute recognition model, and the test set is used to test it, while it is ensured that the data of the same person appears in only one of the sets. The test data set is sent to the trained specific networks and shared network for testing to verify the accuracy of the network model; the misjudged samples in the test data set are then obtained and sent to the network model for training again, so as to fine-tune the network model and improve its generalization.
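  • A sketch of collecting the misjudged test samples for fine-tuning, assuming a single-input wrapper model with one sigmoid output per attribute:

```python
import torch

@torch.no_grad()
def collect_misjudged(model, test_loader, threshold=0.5):
    """Keep the test samples whose predicted attribute vector is wrong;
    they are then sent back into training to fine-tune the model."""
    model.eval()
    misjudged = []
    for images, labels in test_loader:
        preds = (torch.sigmoid(model(images)) > threshold).float()
        wrong = (preds != labels).any(dim=1)   # any attribute misjudged
        if wrong.any():
            misjudged.append((images[wrong], labels[wrong]))
    return misjudged
```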
  • During training, the Adam gradient descent algorithm is adopted; Adam is an efficient optimization method that can improve the convergence speed of gradient descent. The training set is input into the convolutional neural network model and iterated for a preset number of epochs; this method sets 90 epochs. The Adam gradient descent algorithm is used to optimize the objective function, and the batch_size is set to 64, that is, 64 input images are sent to each round of training. Both the specific networks and the shared network are trained based on the convolutional neural network model.
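  • A minimal training-loop sketch with the stated settings (Adam, 90 epochs, batches of 64); the learning rate is an assumption, as the embodiment does not specify one:

```python
import torch

def train(model, train_loader, criterion, epochs=90, lr=1e-3):
    """train_loader is assumed to be built with batch_size=64, so 64
    input images are sent to each round of training."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```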
  • This method uses cross-entropy as the loss function for training; this function serves as a standard for measuring the cross-entropy between the target and the output.
  • The formula, reconstructed here as the standard binary cross-entropy averaged over attributes and samples, is as follows:

$$L=-\frac{1}{m}\sum_{i=1}^{m}\frac{1}{n_i}\sum_{j=1}^{n_i}\left[y_{ij}\log\hat{y}_{ij}+\left(1-y_{ij}\right)\log\left(1-\hat{y}_{ij}\right)\right]$$

  • Here m represents the total number of attributes, n_i represents the total number of samples for the i-th attribute, y_ij represents the label value of the j-th sample of the i-th attribute, and ŷ_ij represents the predicted value of the j-th sample of the i-th attribute.
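  • A sketch of this loss in code, assuming `pred` holds predicted probabilities in (0, 1) and every attribute has the same number of samples within a batch:

```python
import torch

def multi_attribute_cross_entropy(pred, target, eps=1e-8):
    """Binary cross-entropy averaged over the n samples of each attribute
    and then over the m attributes, matching the formula above.

    pred, target: tensors of shape (n, m), with 0/1 labels in `target`.
    """
    per_entry = -(target * torch.log(pred + eps)
                  + (1 - target) * torch.log(1 - pred + eps))
    return per_entry.mean(dim=0).mean()
```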
  • Before S350, detection of the face orientation can be added when obtaining the image data to be processed. Specifically, it can first be determined that the current image contains a face image, and it is then determined whether the face orientation in the currently collected face image meets the preset orientation. Taking a camera in an electronic device collecting a face image as an example, the electronic device calls the camera to collect the face image in response to a face recognition request and recognizes the position information of the key points of the user's face in the face image, so as to determine whether the orientation of the face is the preset orientation.
  • As shown in FIG. 9, the face in the image on the left is turned toward the right side of the user whose face image is collected. Several key points in the image can be determined, namely the left eye a1, the right eye a2, the nose a3, and the lips a4; using the vertical symmetry line of the image, it can be seen that the right eye a2, the nose a3, and the lips a4 are all located on the left side of the symmetry line, so it can be determined that the user's face is turned to the right. As one implementation, a collection frame is displayed on the screen, and the user needs to place his or her face within the collection frame; if the user is not facing the screen or the face orientation does not meet the preset orientation, reminder information can be issued so that the user adjusts the face orientation toward the camera. In the image on the right, the left eye b1, the right eye b2, the nose b3, and the lips b4 are located near the symmetry line, with the left eye b1 and the right eye b2 on either side of it; the face is facing the screen and conforms to the preset orientation, so the face image can be collected normally and subsequent recognition can be performed.
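  • A sketch of the orientation check from FIG. 9, using the key points' x-coordinates relative to the vertical symmetry line; the tolerance is an assumption:

```python
def face_orientation(landmarks, image_width):
    """Classify the face orientation as 'front', 'left', or 'right'.

    `landmarks` maps 'left_eye', 'right_eye', 'nose', 'lips' to (x, y).
    """
    mid_x = image_width / 2
    tolerance = 0.05 * image_width
    offsets = [landmarks[k][0] - mid_x for k in ("nose", "lips")]
    eyes_straddle = landmarks["left_eye"][0] < mid_x < landmarks["right_eye"][0]
    if eyes_straddle and all(abs(o) <= tolerance for o in offsets):
        return "front"   # conforms to the preset orientation
    # key points left of the symmetry line mean the face is turned right
    return "right" if sum(offsets) < 0 else "left"
```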
  • S350 Input the image data to be processed into multiple pre-trained specific networks to obtain attribute tags corresponding to the image data.
  • The face image not only includes facial attribute information such as the facial features, race, gender, age, and expression, but can also express the person's identity information. Therefore, face attribute recognition has broad application prospects in fields such as access control, face retrieval based on face attributes, security, and human-computer interaction.
  • FIG. 10 shows an image processing method provided by an embodiment of the present application.
  • the method includes: S1001 to S1004.
  • S1001 Acquire a plurality of sample image data, each of the sample image data corresponds to a plurality of attribute tags.
  • S1002 Set up a shared network and multiple specific networks, where each specific network can identify at least one attribute tag, and the attribute tags that each specific network can recognize are different from each other.
  • S1003 Input the multiple sample image data into the shared network and multiple specific networks for training, so as to obtain a trained shared network and multiple specific networks.
  • S1001 to S1003 are the training process of the shared network and multiple specific networks, and the specific implementation can refer to the aforementioned S310 to S330, which will not be repeated here.
  • S1004 Obtain image data to be processed, and process the image data to be processed according to the trained shared network and multiple specific networks to obtain an image recognition result.
  • FIG. 11 shows a structural block diagram of an image processing apparatus 1100 provided by an embodiment of the present application.
  • the apparatus may include: a data acquisition unit 1110, an attribute determination unit 1120, a result acquisition unit 1130, and an output unit 1140.
  • the data acquisition unit 1110 is used to acquire image data to be processed.
  • the attribute determining unit 1120 is configured to input the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used to determine the The attribute labels corresponding to the image data, and the attribute labels determined by each of the specific networks are different from each other.
  • The result obtaining unit 1130 is configured to input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations between the attribute labels.
  • the output unit 1140 is configured to output the image recognition result.
  • FIG. 12 shows a structural block diagram of an image processing apparatus 1200 provided by an embodiment of the present application.
  • the apparatus may include: a training unit 1210, a data acquisition unit 1220, an attribute determination unit 1230, a result acquisition unit 1240, and an output unit 1250.
  • the training unit 1210 is used to train the shared network and multiple specific networks.
  • the training unit 1210 includes a sample acquisition subunit 1211, a setting subunit 1212, and a training subunit 1213.
  • The sample acquisition subunit 1211 is configured to acquire multiple sample image data, each of which corresponds to multiple attribute tags.
  • the setting subunit 1212 is used to set a shared network and multiple specific networks, each of the specific networks can recognize at least one attribute tag, and the attribute tags that can be recognized by each specific network are different from each other.
  • the training subunit 1213 is configured to input the multiple sample image data into the shared network and multiple specific networks for training, so as to obtain a trained shared network and multiple specific networks.
  • Further, the sample image data and the image data to be processed both include a face image. The sample acquisition subunit 1211 is also used to: acquire multiple sample image data; identify the position information, in the sample image, of the face key points in each sample image data; adjust, according to the position information of the face key points of each sample image data, the face orientation in each sample image data to meet the preset orientation; and use the adjusted sample image data as the sample image data used this time to train the shared network and the multiple initial specific networks.
  • Further, the sample acquisition subunit 1211 is also used to: acquire multiple sample image data; perform data enhancement processing on the multiple sample image data so that the light intensity and contrast of each sample image data are randomly distributed within preset intervals; crop each enhanced sample image data according to a preset random cropping ratio, the size of each cropped sample image data being a preset size; and use the cropped sample image data as the sample image data used this time to train the shared network and the multiple initial specific networks.
  • Further, the sample image data and the image data to be processed both include face images, and the multiple attribute tags corresponding to each sample image data correspond to different positions of the face. The setting subunit 1212 is also used to divide multiple measurement areas, each corresponding to a different area of the face, and to set up multiple specific networks according to the multiple measurement areas, where each specific network corresponds to one measurement area and is used to confirm the attribute labels in its corresponding measurement area.
  • the data acquisition unit 1220 is used to acquire image data to be processed.
  • Further, the data acquisition unit 1220 is also used to acquire original image data and normalize the original image data to obtain the image data to be processed.
  • the attribute determining unit 1230 is configured to input the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used to determine the The attribute labels corresponding to the image data, and the attribute labels determined by each of the specific networks are different from each other.
  • Further, the attribute determining unit 1230 is also configured to: determine the attribute labels corresponding to each specific network, where the attribute labels that a specific network can recognize are the attribute labels corresponding to that specific network; divide the image data into multiple sub-image data according to the attribute tags corresponding to each specific network; and input each sub-image data into the specific network corresponding to it.
  • The result obtaining unit 1240 is configured to input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations between the attribute labels.
  • the output unit 1250 is configured to output the image recognition result.
  • FIG. 13 shows a structural block diagram of an image processing apparatus 1300 provided by an embodiment of the present application.
  • the apparatus may include: a sample acquisition unit 1310, a setting unit 1320, a network training unit 1330, and a recognition unit 1340.
  • the sample obtaining unit 1310 is configured to obtain multiple sample image data, and each of the sample image data corresponds to multiple attribute tags.
  • the setting unit 1320 is configured to set a shared network and multiple specific networks, each of the specific networks can recognize at least one attribute tag, and the attribute tags that can be recognized by each specific network are different from each other.
  • the network training unit 1330 is configured to input the multiple sample image data into the shared network and multiple specific networks for training, so as to obtain a trained shared network and multiple specific networks.
  • the sample acquisition unit 1310, the setting unit 1320, and the network training unit 1330 correspond to the above-mentioned training unit 1210: the sample acquisition unit 1310 corresponds to the acquisition subunit 1211, the setting unit 1320 corresponds to the setting subunit 1212, and the network training unit 1330 corresponds to the training subunit 1213; the specific implementation of each unit can refer to its corresponding subunit.
  • the recognition unit 1340 is configured to obtain image data to be processed, and process the image data to be processed according to the trained shared network and the multiple specific networks to obtain an image recognition result.
  • specifically, the recognition unit 1340 is used to obtain image data to be processed; input the image data to be processed into the multiple pre-trained specific networks to obtain the attribute tags corresponding to the image data, wherein each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks differ from one another; input the attribute labels determined by each specific network into the pre-trained shared network to obtain the image recognition result, wherein the shared network is used to determine the image recognition result according to each attribute tag and the correlations among the attribute tags; and output the image recognition result.
  • the recognition unit 1340 corresponds to the data acquisition unit, attribute determination unit, result acquisition unit, and output unit described above, and its specific implementation can refer to those units.
  • the coupling between the modules may be electrical, mechanical, or in other forms.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules.
  • the electronic device 100 may be an electronic device capable of running application programs, such as a smartphone, a tablet computer, or an e-book reader.
  • the electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, where the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more programs are configured to perform the methods described in the foregoing method embodiments.
  • the processor 110 may include one or more processing cores.
  • the processor 110 uses various interfaces and lines to connect the various parts of the entire electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and calling the data stored in the memory 120; optionally, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA).
  • the processor 110 may integrate one of, or a combination of, a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface, and application programs; the GPU is responsible for rendering and drawing the display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
  • the memory 120 may include random access memory (RAM) or read-only memory (ROM).
  • the memory 120 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like.
  • the data storage area may also store data created by the electronic device 100 during use (such as a phone book, audio and video data, and chat record data).
  • FIG. 15 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • the computer readable medium 1500 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
  • the computer-readable storage medium 1500 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 1500 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 1500 has storage space for program code 1510 that performs any of the method steps in the above methods. The program code can be read from or written into one or more computer program products. The program code 1510 may, for example, be compressed in an appropriate form.
  • with the image processing method, device, electronic equipment, and storage medium provided by this application, a shared network and multiple specific networks are trained in advance. Each specific network is used to determine attribute tags corresponding to the image data, and the attribute tags determined by the specific networks differ from one another. When the image data to be processed is obtained, it is input into the multiple specific networks, and each specific network identifies the attributes it is able to identify, so the multiple attribute tags corresponding to the image data to be processed are identified by the specific networks separately, improving the recognition of the multiple attribute tags of the image data as a whole. The attribute tags corresponding to the image data are then input into the shared network, which determines the image recognition result according to each attribute tag and the correlations among the attribute tags and outputs the result. Multiple specific networks thus analyze the image data jointly and obtain multiple attribute tags, which increases the speed of obtaining attribute tags, while the shared network combines the correlations of the attribute tags to obtain the image recognition result, improving the accuracy of the recognition result and the overall performance.
  • This application proposes a face multi-attribute recognition algorithm and system based on deep learning.
  • the method divides the 40 face attributes into 4 face attribute groups according to the image positions corresponding to the attributes, taking into account the displayed positional correlation between face attributes; it regards the attribute classification problem of each attribute group as a subtask and constructs a model comprising 4 specific networks and 1 shared network.
  • the specific networks are designed to learn the specificity between tasks, so each attribute group is configured with a separate specific network, while the shared network is designed to learn the complementary information between tasks and promote interaction between them.
  • the rich connections between the specific networks and the shared network promote mutual information exchange, help mine the correlations between tasks, and improve overall performance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses an image processing method and apparatus, an electronic device, and a storage medium, relating to the field of image processing technology. The method includes: acquiring image data to be processed; inputting the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the specific networks differ from one another; inputting the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result; and outputting the image recognition result. Multiple specific networks thus analyze the image data jointly and produce multiple attribute labels, which speeds up attribute-label acquisition, while the shared network combines the correlations among the attribute labels to produce the image recognition result, improving the accuracy of the recognition result and the overall performance.

Description

Image processing method and apparatus, electronic device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. CN201911007790.5, entitled "Image processing method and apparatus, electronic device, and storage medium" and filed with the China Patent Office on October 22, 2019, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of image processing technology, and more specifically, to an image processing method and apparatus, an electronic device, and a storage medium.
BACKGROUND
Existing image attribute recognition schemes are mainly attribute recognition schemes based on traditional machine learning and attribute recognition schemes based on convolutional neural network models. However, the most common existing image attribute recognition technique is based on a single model that performs a single attribute judgment, which is inefficient for multi-attribute recognition.
SUMMARY
This application proposes an image processing method and apparatus, an electronic device, and a storage medium to remedy the above defects.
In a first aspect, an embodiment of this application provides an image processing method, including: acquiring image data to be processed; inputting the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the specific networks differ from one another; inputting the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels; and outputting the image recognition result.
In a second aspect, an embodiment of this application further provides an image processing method, including: acquiring multiple sample image data, each sample image data corresponding to multiple attribute labels; setting up a shared network and multiple specific networks, where each specific network can recognize at least one attribute label and the attribute labels recognizable by the specific networks differ from one another; inputting the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks; and acquiring image data to be processed and processing it according to the trained shared network and the multiple trained specific networks to obtain an image recognition result.
In a third aspect, an embodiment of this application further provides an image processing apparatus, including a data acquisition unit, an attribute determination unit, a result acquisition unit, and an output unit. The data acquisition unit is used to acquire image data to be processed. The attribute determination unit is used to input the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the specific networks differ from one another. The result acquisition unit is used to input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels. The output unit is used to output the image recognition result.
In a fourth aspect, an embodiment of this application further provides an image processing apparatus, including a sample acquisition unit, a setting unit, a network training unit, and a recognition unit. The sample acquisition unit is used to acquire multiple sample image data, each sample image data corresponding to multiple attribute labels. The setting unit is used to set up a shared network and multiple specific networks, where each specific network can recognize at least one attribute label and the attribute labels recognizable by the specific networks differ from one another. The training unit is used to input the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks. The recognition unit is used to acquire image data to be processed and process it according to the trained shared network and the multiple trained specific networks to obtain an image recognition result.
In a fifth aspect, an embodiment of this application further provides an electronic device, including: one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above method.
In a sixth aspect, an embodiment of this application further provides a computer-readable medium storing processor-executable program code, where instructions in the program code, when executed by the processor, cause the processor to perform the above method.
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1示出了本申请一实施例提供的图像处理方法的方法流程图;
图2示出了本申请另一实施例提供的图像处理方法的方法流程图;
图3示出了本申请又一实施例提供的图像处理方法的方法流程图;
图4示出了本申请一实施例提供的图3所示的图像处理方法的中S310的方法流程图;
图5示出了本申请另一实施例提供的图3所示的图像处理方法的中S310的方法流程图;
图6示出了本申请实施例提供的测量区域的示意图;
图7示出了本申请实施例提供的特异性网络和共享网络的连接示意图;
图8示出了本申请实施例提供的子图像数据的示意图;
图9示出了本申请实施例提供的人脸朝向的示意图;
图10示出了本申请再一实施例提供的图像处理方法的方法流程图;
图11示出了本申请一实施例提供的图像处理装置的模块框图;
图12示出了本申请另一实施例提供的图像处理装置的模块框图;
图13示出了本申请又一实施例提供的图像处理装置的模块框图;
图14示出了本申请实施例提供的电子设备的模块框图;
图15出了本申请实施例提供的用于保存或者携带实现根据本申请实施例的图形处理方法的程序代码的存储单元。
DETAILED DESCRIPTION
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application.
Face recognition is a technique for distinguishing the identities of different people based on facial appearance features. It has wide application scenarios, and related research and applications go back decades. With the development of big data, deep learning, and related technologies in recent years, face recognition performance has improved by leaps and bounds and is increasingly applied in identity authentication, video surveillance, beauty and entertainment, and other scenarios. Among these, the ID-photo comparison problem, i.e., face recognition between a standard ID photo and an everyday photo, is receiving more and more attention, because only the target person's ID photo needs to be deployed in the database, sparing the target person the trouble of registering everyday photos in the system.
Existing face attribute recognition schemes are mainly attribute recognition schemes based on traditional machine learning and attribute recognition schemes based on convolutional neural network (CNN) models.
Some face recognition techniques borrow the concept of multi-task learning: after a face detection algorithm extracts the face region from an image or video, a convolutional neural network is used to learn the convolutional layers of analysis tasks preset in a face database, yielding a face analysis model and completing the prediction of facial emotion.
Other face recognition techniques, based on the idea of multi-task learning, achieve multi-task cascade learning by adding auxiliary information such as gender, whether the person is smiling, whether glasses are worn, and pose during training; however, these techniques use the multiple face attributes as labels and achieve face alignment through cascading.
Multi-task learning methods are not limited to single face attribute recognition and can also achieve multi-attribute prediction. For example, multi-task learning can be introduced into race and gender recognition of face images by treating different semantics as different tasks, and semantics-based multi-task feature selection can be proposed and applied to race and gender recognition. However, when the network structure is built, race and gender are still solved separately as two tasks, leaving the model with a great deal of redundancy and unable to predict in real time.
Therefore, the most common existing face attribute recognition technique is based on a single model that performs a single attribute judgment, i.e., learning only one task at a time under a unified model, decomposing the complex problem into theoretically independent subproblems; in each subproblem, the samples in the training set reflect the information of a single task only. However, a face image carries a variety of attribute information such as race, gender, and age, the recognition tasks corresponding to different information are correlated, and the tasks share a certain amount of related information during learning. Introducing multi-task learning into race, gender, and age recognition of face images, treating different semantics as different tasks, and proposing semantics-based multi-task feature selection applied to multi-attribute recognition can significantly improve the generalization ability and recognition performance of the learning system.
Although race and gender recognition methods for face images based on multi-task learning have appeared in response to the above problems, introducing multi-task learning into race and gender recognition, treating different semantics as different tasks, and proposing semantics-based multi-task feature selection, and although they significantly improve the generalization ability and recognition performance of the learning system, they adopt the traditional machine learning approach, so their efficiency suffers greatly.
Existing patents have also proposed combining deep learning with attribute recognition tasks, but they only adopt the multi-task learning approach, proposing a three-stage training procedure that learns three features on convolutional networks (face parts, facial action units, and emotion-space values) to complete the task of facial emotion analysis, without achieving multi-attribute output results.
Therefore, to remedy the above defects, an embodiment of this application provides an image processing method. As shown in FIG. 1, the method includes S101 to S104.
S101: Acquire image data to be processed.
The image data may be an offline image file already downloaded to the electronic device, or an online image file. In the embodiments of this application, the image data may be online image data, for example, an image acquired in real time.
The online image data corresponds to one or more frames of a video file, and the online image data is the data of the video file that has already been sent to the electronic device. For example, if the video file is a certain movie and the electronic device has received the data for playback time 0 to 10 minutes of that movie, the online image data corresponding to the movie is the data for playback time 0 to 10 minutes. After acquiring each piece of online image data, the client can decode each piece separately to obtain the corresponding layers to be rendered, and then merge and display them, so that multiple video pictures can be shown on the screen.
As one implementation, the electronic device includes multiple clients capable of playing video files. When a client on the electronic device plays a video, the electronic device can acquire the video file to be played and then decode it; specifically, the above-mentioned soft decoding or hard decoding can be used to decode the video file. After decoding, the multiple frames of image data to be rendered corresponding to the video file can be obtained, and these frames must then be rendered before they can be shown on the display.
As another implementation, the image data may also be an image captured through the camera of the electronic device by a designated application in the device. Specifically, when the designated application performs a certain function, it calls the camera to capture an image, requests the electronic device to determine an image recognition result using the method of this application, and receives the image recognition result, based on which the designated application performs a corresponding operation.
S102: Input the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data.
Each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks differ from one another.
Specifically, when the specific networks are trained in advance, the sample image data input into a specific network includes multiple attribute labels, for example, black hair in a face image or a white car in a vehicle image. Each attribute label takes the value 0 or 1, where 0 indicates that the image does not have the attribute and 1 indicates that it does. These attribute labels are image feature values preset for obtaining the image recognition result, and the role of a specific network is to determine whether the image contains the preset attribute labels.
Specifically, each specific network can determine attribute labels corresponding to the image data. In some embodiments, each specific network can determine at least one attribute label corresponding to the image data. For example, suppose the multiple specific networks include a first specific network and a second specific network, and the attribute labels include label 1, label 2, and label 3. The first specific network is used to recognize label 1 for the image data: if the image data includes label 1, the first specific network recognizes that the image data corresponds to label 1, or gives a recognition result for label 1 of 1; if label 1 is absent, the recognition result given is label 1 equal to 0. The second specific network is used to determine label 2 and label 3. Because label 1 is recognized by a different specific network from labels 2 and 3, recognition efficiency is improved and the excessive computation that would result from one specific network recognizing labels 1, 2, and 3 together is avoided. Moreover, since the first specific network is used only to recognize label 1, it does not need to learn to recognize labels 2 and 3, which also reduces the training cost.
In addition, it should be noted that the multiple specific networks can run simultaneously, i.e., operate under multiple threads at the same time, rather than in a cascade: the output of one specific network does not require the input of another. The structure of the specific networks is described in later embodiments.
In the embodiments of this application, the main role of a specific network is to segment a target object from the image and recognize it; that is, a specific network can also be a target detection network. Clearly, a specific network combines target segmentation and recognition in one. Commonly used target detection networks include the GOTURN network, the MobileNet-SSD deep convolutional neural network, the Faster R-CNN neural network, the YOLO neural network, and the SPP-Net (Spatial Pyramid Pooling) neural network. The GOTURN neural network is a target detection algorithm trained offline with a convolutional neural network; it uses a CNN classification network pre-trained on an existing large-scale classification dataset to extract features and recognize them.
S103: Input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result.
The shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels. Specifically, the shared network focuses on learning the shared information of all attribute labels. For example, when the attribute label for raised mouth corners appears together with the attribute label for eye-rolling, the expressed emotion is thinking; the correlation between these two attribute labels is recognized by the shared network, and the recognition result is obtained based on that correlation. In other words, after pre-training, the shared network can recognize the correlations among the attribute labels and obtain the image recognition result from them.
S104: Output the image recognition result.
The image recognition result may be output by displaying it on a screen or by sending it to the requesting side that asked for it. The requesting side may be a server communicating with the electronic device, another electronic device, or an application installed on the electronic device. The executing body of the above method may be an application on the electronic device capable of image recognition, or the operating system of the electronic device. After the image recognition result is obtained, it is sent to the requesting side, which performs a certain operation based on it, such as transaction payment or screen unlocking.
With the image processing method, apparatus, electronic device, and storage medium provided by the embodiments of this application, a shared network and multiple specific networks are trained in advance; each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks differ from one another. When the image data to be processed is acquired, it is input into the multiple specific networks, and each specific network recognizes the attributes it is able to recognize, so the multiple attribute labels corresponding to the image data to be processed are recognized by the specific networks separately, improving the recognition of the multiple attribute labels of the image data as a whole. The attribute labels corresponding to the image data are then input into the shared network, which determines the image recognition result according to each attribute label and the correlations among the attribute labels and outputs the result. Multiple specific networks thus analyze the image data jointly and produce multiple attribute labels, which speeds up attribute-label acquisition, while the shared network combines the correlations among the attribute labels to produce the image recognition result, improving the accuracy of the recognition result and the overall performance.
Referring to FIG. 2, an image processing method provided by an embodiment of this application is shown. The method includes S201 to S207.
S201: Acquire original image data.
The original image data may be the grayscale values corresponding to an image; that is, the value of each pixel in the image is a value in the interval [0, 255], i.e., a grayscale value. As one implementation, when the electronic device acquires an image, the image may be a color image; the color image is converted to obtain a grayscale map, and the grayscale values of the pixels in the grayscale map constitute the data of the original image.
In addition, it should be noted that the original image data may be data captured by the camera of the electronic device; for example, the image processing method is applied to real-time analysis of the image data captured by the camera, and the analysis targets face recognition attributes.
Specifically, the image data may also be an image captured through the camera of the electronic device by a designated application in the device. When the designated application performs a certain function, it calls the camera to capture an image, requests the electronic device to determine an image recognition result using the method of this application, and receives the image recognition result, based on which the designated application performs a corresponding operation. The designated application may be a screen-unlocking app or a payment app on the electronic device. For example, the screen-unlocking app performs face recognition on the face image captured by the camera to determine identity information and judges whether the face image matches a preset face image; if it matches, unlocking is judged successful, and if not, unlocking is judged unsuccessful.
The preset face image may be a face image preset by the user, stored in the mobile terminal or in a server or memory from which the mobile terminal can obtain it. Specifically, it may be preset feature information of the preset face image: if the face image is a two-dimensional image, the preset feature information is the facial feature point information of the face image entered by the user in advance; if the face image is a three-dimensional image, the preset feature information is the three-dimensional face information entered by the user in advance. Whether the face image meets the preset condition is then judged by acquiring the feature point information of the face image and comparing the feature information of the captured face image with the preset feature information entered by the user in advance. If they match, the face image is judged to meet the preset condition and is determined to have permission to unlock the screen of the mobile terminal; if they do not match, the face image is judged not to meet the preset condition and has no permission to unlock the screen.
In the embodiments of this application, the image data contains a face image, and the recognition performed on the image data is face recognition. When acquiring the image data to be processed, it can first be judged whether the image data includes a face; if so, subsequent operations are performed. Specifically, the image captured by the camera is a two-dimensional image; by searching for facial feature points in the image, it can be determined whether a face image has been captured. If so, the captured face image is sent to the processor of the mobile terminal so that the processor can analyze the face image and perform the screen-unlocking operation. As another implementation, the camera includes structured light; based on the three-dimensional information captured by the structured light, it is determined whether three-dimensional face information exists, and if so, the captured image is sent to the processor of the mobile terminal.
In addition, if the image captured by the camera does not include a face image, the operation of judging whether the image captured by the camera includes a face image is performed again; alternatively, a face-capture reminder can be issued to remind the user to capture a face image with the camera. Specifically, the face-capture reminder may be displayed on the current interface of the electronic device.
S202: Normalize the original image data to obtain the image data to be processed.
Each pixel value in the original image is normalized, i.e., the original values of 0-255 become values in the interval 0-1, which increases the computation speed of the subsequent specific networks and shared network and the speed of the overall image processing. Specifically, mean-variance normalization or grayscale-transform normalization can be used to normalize the original image data.
In addition, after the original image data is normalized, redundant information can be removed; the redundant information refers to the compressed differences between distributions.
The normalized original image data then serves as the image data to be processed.
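A minimal sketch of the two normalization options named above (mean-variance normalization and grayscale-transform normalization); the function names and the use of per-image statistics are illustrative assumptions, not part of the application:

```python
import numpy as np

def normalize_mean_var(gray: np.ndarray) -> np.ndarray:
    # Mean-variance normalization: zero mean, unit variance per image.
    gray = gray.astype(np.float32)
    return (gray - gray.mean()) / (gray.std() + 1e-8)

def normalize_gray_transform(gray: np.ndarray) -> np.ndarray:
    # Grayscale-transform normalization: map [0, 255] linearly into [0, 1].
    return gray.astype(np.float32) / 255.0
```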
S203: Determine the attribute labels corresponding to each specific network.
Specifically, the attribute labels that a specific network can recognize are the attribute labels corresponding to that specific network. The attribute labels a specific network can recognize are set when that network is trained; for details, refer to the later embodiments.
S204: Divide the image data into multiple sub-image data according to the attribute labels corresponding to each specific network.
Each attribute label in the image data corresponds to a position in the image. Taking a face image as an example, the position in the image corresponding to the hair-color attribute label is the hair, and the position in the image data corresponding to the eye-color attribute label is the eyes, so the position corresponding to each attribute label can be determined in advance. In some implementations, a region identifier corresponding to each attribute label can be set. For a face image, the region identifiers can be different regions of the face, such as the eyes, nose, and hair. Specifically, different attribute labels may correspond to different region identifiers or to the same region identifier; for example, the hair-color and hair-length attribute labels both correspond to the hair region identifier, while the pupil-color attribute label corresponds to the eye region identifier. Since each pixel value in the image data corresponds to a pixel coordinate, once the region identifier corresponding to each attribute label is obtained, the position of that region identifier in pixel coordinates can be determined, and thus the position in the image of the attribute labels each specific network can recognize can be determined. The image data can then be divided into multiple sub-image data, each sub-image data corresponding to a region of the image whose attribute labels all correspond to the same specific network; that is, the attribute labels each specific network can recognize lie within the region corresponding to its sub-image data.
In this way, the image sub-region corresponding to each specific network, i.e., the sub-image data corresponding to each specific network, can be obtained. For example, the image is divided into a first region, a second region, and a third region; the attribute labels recognizable by the first specific network are distributed in the first region, those recognizable by the second specific network in the second region, and those recognizable by the third specific network in the third region. The image data is then divided into three sub-image data, namely first, second, and third sub-image data, corresponding to the first, second, and third regions respectively.
In addition, to better divide the regions according to pixel coordinates, the images can be uniformly adjusted to one direction so that the designated regions lie in the same positions. Taking a face image as an example, when the original image is acquired, the face region in the original image can be cropped and the image rotation adjusted in a fixed direction, so that the face always faces a certain position, for example, with the forehead always at the top of the image and the chin at the bottom.
S205: Input each sub-image data into the specific network corresponding to that sub-image data.
As one implementation, after the regions in the image where the attribute labels corresponding to each specific network are located have been determined, the image can be divided into multiple regions, each corresponding to one specific network; the image is divided into multiple sub-images according to the determined regions, and the pixel data corresponding to each sub-image, i.e., the pixel data processed in S202, serves as the sub-image data.
For example, with the first, second, and third regions above, the image is divided into three sub-images, namely a first, a second, and a third sub-image; the pixel data corresponding to the pixel values in the first sub-image is input into the first specific network, that of the second sub-image into the second specific network, and that of the third sub-image into the third specific network.
Thus, the whole image data need not be input into every specific network; only the sub-image data corresponding to the attribute labels a specific network can recognize is input into that network, which reduces the computation of the specific networks and improves the overall recognition speed.
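A minimal sketch of this grouping-and-dispatch step; the concrete tag names in the mapping are hypothetical, and only the mechanism of cropping per-region sub-image data and feeding each crop to its own specific network follows the text:

```python
# Hypothetical tag-to-region grouping for illustration; only the
# grouping-and-dispatch mechanism itself follows the description.
TAG_TO_REGION = {
    "hair_color": "upper", "eye_color": "upper",
    "nose_shape": "middle",
    "smiling": "lower",
    "gender": "full",
}

def split_and_dispatch(image, regions, specific_nets):
    """Crop one sub-image per region and feed each crop only to the
    specific network responsible for that region's attribute tags.

    `regions` maps a region name to a (y0, y1, x0, x1) pixel box and
    `specific_nets` maps the same names to callable networks.
    """
    outputs = {}
    for name, (y0, y1, x0, x1) in regions.items():
        sub = image[y0:y1, x0:x1]            # sub-image data for this region
        outputs[name] = specific_nets[name](sub)
    return outputs
```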
S206: Input the attribute labels determined by each specific network into the pre-trained shared network to obtain the image recognition result.
S207: Output the image recognition result.
It should be noted that, for the parts not described in detail in the above steps, reference can be made to the foregoing embodiments, and details are not repeated here.
In addition, before S102 or S204 is performed, the specific networks and the shared network must be trained. Specifically, the training process may take place after S101 and before S102, or after S201 and before S204, or before S101 and S201. In the embodiments of this application, the specific networks and the shared network may be trained before the image recognition method is executed.
Specifically, referring to FIG. 3, the training process of the specific networks and the shared network in the image processing method provided by an embodiment of this application is shown. As shown in FIG. 3, the method includes S310 to S370.
S310: Acquire multiple sample image data, each sample image data corresponding to multiple attribute labels.
Specifically, the sample image data is image data that has already been annotated; images acquired in advance may be annotated manually, with each annotation point corresponding to an attribute label. For example, the sample image data may be the CelebA face attribute dataset used as the experimental dataset. The dataset contains about 200,000 face images, each provided with 40 face attribute annotations and the position information of 5 facial key points. Following the official CelebA standard, about 100,000 of the face images are used for training the network model, about 10,000 for validation, and 10,000 for testing the network model.
For this public face attribute dataset, each face image corresponds to 40 attribute labels, each taking the value 0 or 1, where 0 indicates that the image does not have the attribute and 1 indicates that it does.
It should be noted that both the sample image data and the image data to be processed contain face images; that is, the image processing method provided by the embodiments of this application is applied to face attribute recognition, and the specific networks and the shared network are also trained for face attribute recognition.
Further, to improve the accuracy of image recognition, the faces can be aligned. Specifically, referring to FIG. 4, S310 may include S311 to S314.
S311: Acquire multiple sample image data.
Specifically, for this step, reference can be made to the above description, and details are not repeated here.
S312: Identify the position information, within each sample image, of the facial key points in that sample image data.
Specifically, the facial key points may be the facial features in the face image, for example, the eyes, nose, and mouth. The specific recognition method may be a facial-feature recognition method for face images, for example, the PCA (principal component analysis) method, which determines the facial features in the face image and thereby the position information, i.e., the pixel coordinates, of the facial key points in the face image.
Specifically, after the sample image is acquired, the face region image can be obtained, the facial area cropped, face rectification performed on the sample data to be trained, and the position information of the facial key points, such as the eyes, nose, and mouth, determined.
S313: According to the position information of the facial key points of each sample image data, adjust the face orientation in each sample image data to conform to a preset orientation.
Specifically, the preset orientation may be the face facing straight ahead, meaning that the forehead of the face is in the upper part of the image and the chin in the lower part. The face orientation in the image can be determined from the position information of the facial key points. Specifically, the same pixel coordinate system is set for every sample image, for example with the top-left vertex of the sample image as the origin, so that the pixel coordinates of the facial key points in the face image can be obtained, and the positional relationship between the forehead and the chin can be determined from the position information of the key points. For example, if the eyes are determined to be on the left side of the image and the mouth on the right side, and the difference between their vertical coordinates is less than a specified value, the eyes and mouth can be determined to be on the same horizontal line, and the face in the sample image data can be made to conform to the preset orientation by rotating 90 degrees clockwise. As one implementation, the preset orientation may be the face facing forward within 15 degrees.
As one implementation, after the face orientation in each sample image data has been adjusted to conform to the preset orientation according to the position information of its facial key points, the size of the sample image can also be adjusted to reduce computation. Specifically, through the localization of facial key points such as the eyes, nose, and mouth, the direction of the object to be predicted is adjusted according to the preset direction standard, ensuring that the face of every object to be predicted faces forward within 15 degrees, achieving face alignment, and a margin of a predetermined ratio is added to the facial area. At the same time, to reduce computation, the image size is set to a specified size, for example 112x112. Specifically, the whole image can be compressed to the specified size, or the sample image can be cropped with a window of the specified size: taking the center point of the sample image as the center of the window, the image within the region corresponding to the window is taken as the resized image. As one implementation, the window size may be 112x112.
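A minimal sketch of the orientation fix, margin, and center-crop described above for a grayscale image; the sideways-face test, the margin ratio, and the assumption that the padded image is at least 112x112 are illustrative simplifications:

```python
import numpy as np

def align_and_resize(img, eye, mouth, size=112, margin_ratio=0.1):
    """Rotate the face upright, pad a margin, and center-crop to size x size.

    `eye` and `mouth` are (x, y) landmark coordinates with y growing
    downward. If the eye and mouth sit on roughly the same horizontal line
    with the eye on the left (face lying sideways), rotate 90 degrees
    clockwise as the text describes. Assumes a 2-D grayscale image whose
    padded form is at least size x size.
    """
    if abs(eye[1] - mouth[1]) < 0.05 * img.shape[0] and eye[0] < mouth[0]:
        img = np.rot90(img, k=-1)                 # clockwise 90-degree rotation
    pad = int(margin_ratio * min(img.shape[:2]))  # predetermined margin ratio
    img = np.pad(img, ((pad, pad), (pad, pad)), mode="edge")
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    half = size // 2
    return img[cy - half: cy + half, cx - half: cx + half]
```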
S314: Use the adjusted sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
Specifically, if the above adjustment consists of adjusting the face orientation in each sample image data to conform to the preset orientation according to the position information of its facial key points, the sample image data after the orientation adjustment is used as the sample image data for this round of training the shared network and the multiple initial specific networks; if the above adjustment includes both adjusting the face orientation to conform to the preset orientation and resizing the sample images to the specified size, the sample image data after both the orientation adjustment and the resizing is used as the sample image data for this round of training the shared network and the multiple initial specific networks.
Further, to augment the data samples and improve the generalization of the trained image processing model, i.e., the face recognition model consisting of the shared network and the multiple specific networks, referring to FIG. 5, S310 may include S311, S315, S316, and S317.
S311: Acquire multiple sample image data.
Specifically, for this step, reference can be made to the above description, and details are not repeated here.
S315: Perform data enhancement on the multiple sample image data so that the light intensity and contrast of each sample image data are randomly distributed within preset intervals.
As one implementation, the light intensity of the sample image data is transformed according to a preset light-intensity interval, yielding data in which the light intensity of each sample image data is randomly distributed within the preset light-intensity interval; the contrast of the sample image data is transformed according to a preset contrast interval, yielding data in which the contrast of each sample image data is randomly distributed within the preset contrast interval.
Specifically, the preset light-intensity interval may be a preset light-intensity range; after the light intensity of each pixel in the sample image is obtained, the pixel's light intensity can be adjusted into the preset light-intensity interval. As one implementation, the distribution of light intensity over the pixels in the sample image can be collected, so that pixels with higher light intensity also take higher values within the preset interval and pixels with lower light intensity take lower values. In addition, the continuity of the light-intensity distribution of the pixels within the preset interval can be increased, i.e., among the sub-intervals over which the light intensities of multiple pixel values are distributed, the intensity difference between two adjacent sub-intervals is no greater than a specified value, so that the light intensity is randomly distributed within the preset interval, increasing data diversity. Specifically, the light intensity of the pixels in each sample image data can be randomly distributed within its corresponding preset light-intensity interval, and the preset light-intensity intervals corresponding to different sample image data need not all be the same, further increasing data diversity and improving the generalization of later model training.
Similarly, transforming the contrast of the objects to be trained according to the preset contrast interval, so that the contrast of each object to be trained is randomly distributed within the preset contrast interval, can follow the light-intensity adjustment process above. The contrast of the pixels in each sample image data can thus be randomly distributed within its corresponding preset contrast interval, and the preset contrast intervals corresponding to different sample image data need not all be the same, further increasing data diversity and improving the generalization of later model training.
In addition, before the above enhancement is performed, the processed sample data can first be normalized, mapping its pixel values from [0, 255] to [0, 1] and removing the redundant information contained in the sample data.
S316: Crop each enhanced sample image data according to a preset random cropping ratio, with each cropped sample image data having a preset size.
The object to be trained is cropped according to the preset random cropping ratio and adjusted to the preset size, where the preset size may be 112x112; the object to be trained is also flipped horizontally.
It should be noted that cropping the object to be trained according to the preset random cropping ratio and adjusting it to the preset size can follow the cropping method for adjusting to the specified size described above, with the preset size equal to the specified size.
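A minimal sketch of this augmentation pipeline for a grayscale image already normalized to [0, 1]; the jitter intervals and the crop-ratio range stand in for the unspecified preset intervals and preset random cropping ratio, and a dependency-free nearest-neighbour resize is used for illustration:

```python
import numpy as np

def augment(img, rng, crop_ratio=(0.8, 1.0), out_size=112):
    """Random brightness/contrast jitter, random crop, resize, random flip.

    `img` is a 2-D grayscale array in [0, 1]; `rng` is a
    numpy.random.Generator. All interval bounds are illustrative.
    """
    img = img.astype(np.float32)
    img = img * rng.uniform(0.7, 1.3)              # contrast-like scaling
    img = img + rng.uniform(-0.2, 0.2)             # brightness shift
    img = np.clip(img, 0.0, 1.0)
    h, w = img.shape[:2]
    s = int(min(h, w) * rng.uniform(*crop_ratio))  # random crop side length
    y0 = rng.integers(0, h - s + 1)
    x0 = rng.integers(0, w - s + 1)
    crop = img[y0:y0 + s, x0:x0 + s]
    # Nearest-neighbour resize to the preset size (kept dependency-free).
    idx = (np.arange(out_size) * s / out_size).astype(int)
    crop = crop[idx][:, idx]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]                       # horizontal flip
    return crop
```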
S317: Use the cropped sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
It should be noted that steps S311 to S314 above can replace S310, i.e., S320 is performed after S311, S312, S313, and S314; alternatively, steps S311, S315, S316, and S317 can replace S310, i.e., S320 is performed after S311, S315, S316, and S317; alternatively, S311 to S317 together can replace S310, i.e., S320 is performed after S311, S312, S313, S314, S315, S316, and S317.
S320: Set up the shared network and the multiple specific networks.
Each specific network can recognize at least one attribute label, and the attribute labels recognizable by the specific networks differ from one another. Configuring one specific network for every attribute label would incur enormous computational overhead. For example, in face image recognition, suppose there are 40 attribute labels in total; treating the 40 face attributes directly as 40 independent tasks incurs enormous computational overhead and ignores the displayed positional correlation between face attributes. Therefore, the face can be divided into multiple regions, each region corresponding to one specific network. Specifically, setting up the shared network and the multiple specific networks may be implemented by dividing multiple measurement areas, each measurement area corresponding to a different area of the face, and setting up multiple specific networks according to the multiple measurement areas, with each specific network corresponding to one measurement area and used to determine the attribute labels within its corresponding measurement area.
Specifically, four specific networks can be set up: an upper specific network, a middle specific network, a lower specific network, and a whole-face specific network. Correspondingly, the attribute labels are divided into four groups: an upper group, a middle group, a lower group, and a whole-face group. Each group corresponds to attribute labels, and the attribute labels of the groups differ; that is, each specific network can recognize the attribute labels of its corresponding group. According to their respective positions, the attribute classification of each group can be regarded as a separate attribute learning task. After the attributes are divided into 4 groups, the attribute classification problem of each attribute group is regarded as a subtask. Specifically, the attribute labels are as shown in the following table:
[Table: the 40 face attribute labels assigned to the upper, middle, lower, and whole-face attribute groups; the original table image is not recoverable from this record.]
As shown in FIG. 6, in some implementations the sample image is divided into four regions: an upper region m1, a middle region m2, a lower region m3, and a whole-face region m4. As one implementation, the upper region m1 is the region between the top edge of the image and the horizontal coordinate of the lower of the two eyes. Specifically, as shown in FIG. 6, a straight line parallel to the horizontal axis, denoted the first line, is set at the position of the lower of the two eyes, and the region between the first line and the top edge of the image serves as the upper region. A position point, which may be the midpoint, is selected in the region between the nose and the upper lip, and a straight line parallel to the horizontal axis, denoted the second line, is set there; the interval between the first line and the second line serves as the middle region, the region between the second line and the bottom edge of the image serves as the lower region, and the region from the tip of the chin to the top of the hair serves as the whole-face region, which encloses the face and the hair. Here, the above regions are the measurement areas, i.e., the upper region m1, middle region m2, lower region m3, and whole-face region m4 are the four measurement areas.
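The region boundaries can be derived directly from detected landmarks. The sketch below is one reading of the construction above, assuming landmark coordinates are given as (x, y) pixels with y increasing downward, so the lower of the two eyes has the larger y value; the chin and hair-top coordinates are assumed inputs:

```python
def measurement_regions(img_h, img_w, left_eye, right_eye, nose, upper_lip,
                        chin_y, hair_top_y):
    """Return the four measurement areas m1..m4 as (y0, y1, x0, x1) boxes.

    Landmarks are (x, y) pixel coordinates with y growing downward, so the
    "lowest" eye is the one with the larger y value.
    """
    line1 = max(left_eye[1], right_eye[1])     # first line: lower of the eyes
    line2 = (nose[1] + upper_lip[1]) // 2      # second line: nose-lip midpoint
    return {
        "upper":  (0,          line1, 0, img_w),   # m1
        "middle": (line1,      line2, 0, img_w),   # m2
        "lower":  (line2,      img_h, 0, img_w),   # m3
        "full":   (hair_top_y, chin_y, 0, img_w),  # m4: hair top to chin tip
    }
```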
In the embodiments of this application, there are four specific networks and one shared network in total, so each task has an independent specific network. Unlike a branch structure, the parameters of the specific networks are not shared between tasks, in order to better preserve each task's specificity. The shared network, as an independent network, does not correspond to any particular learning task; instead, it serves to learn the correlations between tasks and extract the complementary information between them. In some implementations, the specific networks and the shared network can be connected through a simple connection unit, so as to maximize the information flow between the two. As one implementation, the connections between the specific networks and the shared network are as shown in FIG. 7: the input to each layer of the shared network includes, besides the output features of its previous layer, the output features of all the specific networks at the previous layer; these features are concatenated to form the input of each layer of the shared network. At the same time, the input to each layer of a specific network includes, besides the output features of its own previous layer, the output features of the shared network at the previous layer; the two are concatenated to form the final input. In addition, FIG. 7 shows the connections between only two specific networks and the shared network; the connections for four specific networks can be reasonably derived by analogy with FIG. 7.
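A minimal PyTorch sketch of one layer of this connection pattern; the channel sizes are illustrative assumptions, and all feature maps are assumed to share the same spatial size so they can be concatenated along the channel dimension:

```python
import torch
import torch.nn as nn

class LocallySharedLayer(nn.Module):
    """One layer of the locally shared structure sketched in FIG. 7.

    Each specific branch consumes its own previous features concatenated
    with the shared branch's previous features; the shared branch consumes
    its own previous features concatenated with all specific branches'
    previous features.
    """
    def __init__(self, n_tasks=4, spec_ch=32, shared_ch=64):
        super().__init__()
        self.spec = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(spec_ch + shared_ch, spec_ch, 3, padding=1),
                nn.BatchNorm2d(spec_ch), nn.ReLU(inplace=True))
            for _ in range(n_tasks)])
        self.shared = nn.Sequential(
            nn.Conv2d(shared_ch + n_tasks * spec_ch, shared_ch, 3, padding=1),
            nn.BatchNorm2d(shared_ch), nn.ReLU(inplace=True))

    def forward(self, spec_feats, shared_feat):
        new_spec = [branch(torch.cat([f, shared_feat], dim=1))
                    for branch, f in zip(self.spec, spec_feats)]
        new_shared = self.shared(torch.cat([shared_feat] + spec_feats, dim=1))
        return new_spec, new_shared
```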
The multi-task attribute recognition model constructed in the embodiments of this application includes 4 specific networks and 1 shared network. The specific networks focus on learning the specific features of each feature group, while the shared network focuses on learning the shared information of all feature groups. The specific networks and the shared network are connected and exchange information through local sharing units, together forming the whole locally shared multi-task face multi-attribute classification network. Each specific network and the shared network have the same network structure, each containing 5 convolutional layers and 2 fully connected layers. Each convolutional layer and fully connected layer is followed by a normalization layer and a ReLU (Rectified Linear Unit) activation layer. The number of output channels of each layer is the same across the specific networks, while the number of output channels of the shared network differs from that of the specific networks.
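A minimal PyTorch sketch of one such branch under the stated structure (five convolutional layers and two fully connected layers, convolutions followed by batch normalization and ReLU); the channel widths, the 112x112 input, and the label count are illustrative assumptions, and, as a deliberate simplification of the text, the final linear layer is left without BN/ReLU so its logits can feed a sigmoid or cross-entropy head:

```python
import torch.nn as nn

def make_branch(in_ch=1, widths=(32, 64, 128, 128, 256), num_labels=10):
    """One branch: five stride-2 conv layers plus two fully connected layers."""
    layers, c = [], in_ch
    for w in widths:
        layers += [nn.Conv2d(c, w, kernel_size=3, stride=2, padding=1),
                   nn.BatchNorm2d(w), nn.ReLU(inplace=True)]
        c = w
    # With a 112x112 input, five stride-2 convs leave 4x4 feature maps.
    layers += [nn.Flatten(),
               nn.Linear(c * 4 * 4, 512), nn.BatchNorm1d(512),
               nn.ReLU(inplace=True),
               nn.Linear(512, num_labels)]
    return nn.Sequential(*layers)
```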
S330: Input the multiple sample image data into the shared network and the multiple specific networks for training, to obtain the trained shared network and multiple trained specific networks.
As one implementation, each whole sample image can be input into every specific network for training; however, each specific network can recognize only its corresponding attribute labels. For example, the upper specific network can recognize the attribute labels of the upper group, while the other attribute labels cannot be recognized by it.
As another implementation, to reduce computation and speed up training, different parts of the sample image data can be input to each specific network. Specifically, the sample image data is divided into multiple sub-sample image data according to the attribute labels corresponding to each specific network, and each sub-sample image data is input into the specific network corresponding to that sub-image data.
As shown in FIG. 8, the same sample image is divided into four sub-sample images: the top-left image is the sub-sample image data corresponding to the upper region m1 of the sample image in FIG. 6, the top-right image corresponds to the middle region m2, the bottom-left image corresponds to the lower region m3, and the bottom-right image corresponds to the whole-face region m4. The sub-sample image data of the upper region m1 is input into the upper specific network to train it, that of the middle region m2 into the middle specific network to train it, that of the lower region m3 into the lower specific network to train it, and that of the whole-face region m4 into the whole-face specific network to train it.
In some implementations, part of the sample image data acquired in step S311 is used to train the above network model and the other part to test it. Specifically, the two classes of the object samples to be trained are randomly divided proportionally into a training set and a test set at a ratio of 8:2, where the training set is used to train the face multi-attribute recognition model and the test set to test it, ensuring that the data of any one person appears in only one set.
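One way to realize the 8:2 identity-disjoint split described above is to partition by person rather than by image; the sample tuple layout below is an assumption for illustration:

```python
import random
from collections import defaultdict

def split_by_identity(samples, ratio=0.8, seed=0):
    """8:2 train/test split keeping all images of one person in one set.

    `samples` is assumed to be a list of (image_path, person_id, labels).
    """
    by_person = defaultdict(list)
    for s in samples:
        by_person[s[1]].append(s)
    ids = sorted(by_person)
    random.Random(seed).shuffle(ids)       # shuffle people, not images
    cut = int(len(ids) * ratio)
    train = [s for pid in ids[:cut] for s in by_person[pid]]
    test = [s for pid in ids[cut:] for s in by_person[pid]]
    return train, test
```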
As one implementation, the test dataset is fed into the trained specific networks and shared network for testing, to verify the accuracy of the network model; the sample data judged incorrectly in the test dataset is obtained, and the incorrectly judged samples are fed into the network model again for training, to fine-tune the network model and improve its generalization.
In the embodiments of this application, the Adam gradient descent algorithm is used; Adam is an efficient computation method that can accelerate the convergence of gradient descent. During training, the training set is input into the convolutional neural network model and iterated for a preset number of epochs; this method sets the number of epochs to 90. In each iteration, the Adam gradient descent algorithm is used to optimize the objective function; this method sets batch_size to 64, i.e., 64 input images are fed in per training round. Both the specific networks and the shared network are trained based on convolutional neural network models.
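A hypothetical training loop matching the stated hyper-parameters (Adam, 90 epochs, batch_size of 64); `model`, `train_set`, and `multi_attribute_loss` are assumed to be defined elsewhere, and the learning rate is an illustrative choice:

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, multi_attribute_loss, device="cuda"):
    loader = DataLoader(train_set, batch_size=64, shuffle=True)  # 64 images per round
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # Adam optimizer
    model.to(device).train()
    for epoch in range(90):                                      # preset epoch count
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = multi_attribute_loss(model(images), labels)
            loss.backward()
            optimizer.step()
```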
For the multi-attribute problem, this method uses cross entropy as the loss function for training; the function serves as the measure of the cross entropy between the target and the output. Reconstructed from the variable definitions that follow (the original formula image is not recoverable from this record, so this is the standard per-attribute binary cross entropy consistent with those definitions):

$$\mathrm{Loss} = -\sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} \left[ y_j^i \log \hat{y}_j^i + \left(1 - y_j^i\right) \log\left(1 - \hat{y}_j^i\right) \right]$$

where $m$ denotes the total number of attributes, $n_i$ denotes the total number of samples of the $i$-th attribute, $y_j^i$ denotes the label value of the $j$-th sample of the $i$-th attribute, and $\hat{y}_j^i$ denotes the predicted value of the $j$-th sample of the $i$-th attribute.
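A minimal sketch of this loss in code, assuming each network head outputs one logit per attribute and every attribute shares the same batch (so the per-attribute 1/n_i normalization reduces to a batch mean up to a constant factor); the function name is illustrative:

```python
import torch.nn.functional as F

def multi_attribute_loss(logits, labels):
    """Binary cross entropy over the m attribute labels.

    `logits` and `labels` both have shape (batch, m); labels are 0/1.
    """
    return F.binary_cross_entropy_with_logits(logits, labels.float())
```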
S340: Acquire the image data to be processed.
It should be noted that, besides following the steps above, acquiring the image data to be processed can additionally include detection of the face orientation. Specifically, after it is determined that the current image contains a face image, it is determined whether the face orientation in the currently captured face image meets the preset orientation. Taking capture of a face image by the camera in the electronic device as an example, the electronic device calls the camera in response to a face recognition request to capture a face image and identifies the position information of the user's facial key points in the face image, from which it can be determined whether the face orientation is the preset orientation.
As shown in FIG. 9, the face in the left image is turned toward the right side of the user whose face is being captured, and several key points in the image can be determined, namely the left eye a1, right eye a2, nose a3, and lips a4. Relative to the vertical symmetry line of the image, the right eye a2, nose a3, and lips a4 all lie on the left of the symmetry line, so it can be determined that the user's face is turned to the right. Specifically, a capture frame can be displayed on the screen, and the user needs to move so that the face lies within the capture frame; if the user is not facing the screen, or the face orientation does not meet the preset orientation, a reminder can be issued prompting the user to adjust the face orientation toward the camera. In the right face image of FIG. 9, the left eye b1, right eye b2, nose b3, and lips b4 lie near the symmetry line, with the left eye b1 and right eye b2 on opposite sides of it, so it can be determined that the face is oriented straight toward the screen, which meets the preset orientation; the face image can then be captured normally and recognized later.
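A minimal sketch of this frontal-orientation check; the landmark naming and the tolerance ratio are illustrative assumptions, and only the symmetry-line reasoning follows FIG. 9:

```python
def is_facing_forward(landmarks, img_w, tol_ratio=0.08):
    """Heuristic frontal-pose check following FIG. 9.

    The nose and lips should sit near the vertical symmetry line and the
    two eyes on opposite sides of it. `landmarks` maps names to (x, y)
    pixel coordinates.
    """
    mid = img_w / 2.0
    tol = tol_ratio * img_w
    near_mid = all(abs(landmarks[k][0] - mid) < tol for k in ("nose", "lips"))
    eyes_split = ((landmarks["left_eye"][0] - mid)
                  * (landmarks["right_eye"][0] - mid) < 0)
    return near_mid and eyes_split
```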
S350: Input the image data to be processed into the multiple pre-trained specific networks to obtain the attribute labels corresponding to the image data.
S360: Input the attribute labels determined by each specific network into the pre-trained shared network to obtain the image recognition result.
S370: Output the image recognition result.
It should be noted that a face image contains not only face attribute information such as facial features, race, gender, age, and expression, but can also express a person's identity information. Face attribute recognition therefore has broad application prospects in fields such as age-based access control, face retrieval by face attributes, security, and human-computer interaction.
Referring to FIG. 10, an image processing method provided by an embodiment of this application is shown. The method includes S1001 to S1004.
S1001: Acquire multiple sample image data, each sample image data corresponding to multiple attribute labels.
S1002: Set up a shared network and multiple specific networks, where each specific network can recognize at least one attribute label and the attribute labels recognizable by the specific networks differ from one another.
S1003: Input the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks.
S1001 to S1003 constitute the training process of the shared network and the multiple specific networks; for their specific implementation, reference can be made to the foregoing S310 to S330, and details are not repeated here.
S1004: Acquire the image data to be processed, and process the image data to be processed according to the trained shared network and the multiple specific networks to obtain the image recognition result.
For processing the image data to be processed according to the trained shared network and the multiple specific networks to obtain the image recognition result, reference can be made to the foregoing embodiments, and details are not repeated here.
Referring to FIG. 11, a structural block diagram of an image processing apparatus 1100 provided by an embodiment of this application is shown. The apparatus may include a data acquisition unit 1110, an attribute determination unit 1120, a result acquisition unit 1130, and an output unit 1140.
The data acquisition unit 1110 is used to acquire the image data to be processed.
The attribute determination unit 1120 is used to input the image data to be processed into the multiple pre-trained specific networks to obtain the attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the specific networks differ from one another.
The result acquisition unit 1130 is used to input the attribute labels determined by each specific network into the pre-trained shared network to obtain the image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels.
The output unit 1140 is used to output the image recognition result.
Referring to FIG. 12, a structural block diagram of an image processing apparatus 1200 provided by an embodiment of this application is shown. The apparatus may include a training unit 1210, a data acquisition unit 1220, an attribute determination unit 1230, a result acquisition unit 1240, and an output unit 1250.
The training unit 1210 is used to train the shared network and the multiple specific networks.
Specifically, the training unit 1210 includes a sample acquisition subunit 1211, a setting subunit 1212, and a training subunit 1213.
The acquisition subunit 1211 is used to acquire multiple sample image data, each sample image data corresponding to multiple attribute labels.
The setting subunit 1212 is used to set up the shared network and the multiple specific networks, where each specific network can recognize at least one attribute label and the attribute labels recognizable by the specific networks differ from one another.
The training subunit 1213 is used to input the multiple sample image data into the shared network and the multiple specific networks for training, to obtain the trained shared network and multiple trained specific networks.
Further, both the sample image data and the image data to be processed contain face images. The acquisition subunit 1211 is also used to acquire multiple sample image data; identify the position information, within each sample image, of the facial key points in that sample image data; adjust the face orientation in each sample image data to conform to the preset orientation according to the position information of its facial key points; and use the adjusted sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
Further, the acquisition subunit 1211 is also used to acquire multiple sample image data; perform data enhancement on the multiple sample image data so that the light intensity and contrast of each sample image data are randomly distributed within preset intervals; crop each enhanced sample image data according to the preset random cropping ratio, with each cropped sample image data having the preset size; and use the cropped sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
Further, both the sample image data and the image data to be processed contain face images, and the multiple attribute labels corresponding to each sample image data correspond to different positions of the face. The setting subunit 1212 is also used to divide multiple measurement areas, each measurement area corresponding to a different area of the face, and to set up multiple specific networks according to the multiple measurement areas, each specific network corresponding to one measurement area and used to determine the attribute labels within its corresponding measurement area.
The data acquisition unit 1220 is used to acquire the image data to be processed.
Further, the data acquisition unit 1220 is also used to acquire original image data, and normalize the original image data to obtain the image data to be processed.
The attribute determination unit 1230 is used to input the image data to be processed into the multiple pre-trained specific networks to obtain the attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the specific networks differ from one another.
Further, the attribute determination unit 1230 is also used to determine the attribute label corresponding to each specific network, where the attribute labels that a specific network can recognize are the attribute labels corresponding to that specific network; divide the image data into multiple sub-image data according to the attribute labels corresponding to each specific network; and input each sub-image data into the specific network corresponding to that sub-image data.
The result acquisition unit 1240 is used to input the attribute labels determined by each specific network into the pre-trained shared network to obtain the image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels.
The output unit 1250 is used to output the image recognition result.
Referring to FIG. 13, a structural block diagram of an image processing apparatus 1300 provided by an embodiment of this application is shown. The apparatus may include a sample acquisition unit 1310, a setting unit 1320, a network training unit 1330, and a recognition unit 1340.
The sample acquisition unit 1310 is used to acquire multiple sample image data, each sample image data corresponding to multiple attribute labels.
The setting unit 1320 is used to set up a shared network and multiple specific networks, where each specific network can recognize at least one attribute label and the attribute labels recognizable by the specific networks differ from one another.
The network training unit 1330 is used to input the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks.
The sample acquisition unit 1310, the setting unit 1320, and the network training unit 1330 correspond to the above-mentioned training unit 1210. As one implementation, the sample acquisition unit 1310 corresponds to the acquisition subunit and its specific implementation can refer to the acquisition subunit; the setting unit 1320 corresponds to the setting subunit and its specific implementation can refer to the setting subunit; the network training unit 1330 corresponds to the training subunit and its specific implementation can refer to the training subunit.
The recognition unit 1340 is used to acquire the image data to be processed, and process the image data to be processed according to the trained shared network and the multiple specific networks to obtain the image recognition result.
Specifically, the recognition unit 1340 is used to acquire the image data to be processed; input the image data to be processed into the multiple pre-trained specific networks to obtain the attribute labels corresponding to the image data, where each specific network is used to determine attribute labels corresponding to the image data and the attribute labels determined by the specific networks differ from one another; input the attribute labels determined by each specific network into the pre-trained shared network to obtain the image recognition result, where the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels; and output the image recognition result.
As one implementation, the recognition unit 1340 corresponds to the data acquisition unit, attribute determination unit, result acquisition unit, and output unit; for its specific implementation, reference can be made to the data acquisition unit, attribute determination unit, result acquisition unit, and output unit.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus and modules described above, reference can be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, the coupling between the modules may be electrical, mechanical, or in other forms.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to FIG. 14, a structural block diagram of an electronic device provided by an embodiment of this application is shown. The electronic device 100 may be an electronic device capable of running application programs, such as a smartphone, a tablet computer, or an e-book reader. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, where the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more programs are configured to perform the methods described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the various parts of the entire electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and calling the data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 110 may integrate one of, or a combination of, a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, and application programs; the GPU is responsible for rendering and drawing the display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include random access memory (RAM) or read-only memory (ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the electronic device 100 during use (such as a phone book, audio and video data, and chat record data).
Referring to FIG. 15, a structural block diagram of a computer-readable storage medium provided by an embodiment of this application is shown. The computer-readable medium 1500 stores program code, and the program code can be invoked by a processor to perform the method described in the foregoing method embodiments.
The computer-readable storage medium 1500 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 1500 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1500 has storage space for program code 1510 that performs any of the method steps in the above methods. The program code can be read from or written into one or more computer program products. The program code 1510 may, for example, be compressed in an appropriate form.
In summary, with the image processing method, apparatus, electronic device, and storage medium provided by this application, a shared network and multiple specific networks are trained in advance; each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks differ from one another. When the image data to be processed is acquired, it is input into the multiple specific networks, and each specific network recognizes the attributes it is able to recognize, so the multiple attribute labels corresponding to the image data to be processed are recognized by the specific networks separately, improving the recognition of the multiple attribute labels of the image data as a whole. The attribute labels corresponding to the image data are then input into the shared network, which determines the image recognition result according to each attribute label and the correlations among the attribute labels and outputs the result. Multiple specific networks thus analyze the image data jointly and produce multiple attribute labels, which speeds up attribute-label acquisition, while the shared network combines the correlations among the attribute labels to produce the image recognition result, improving the accuracy of the recognition result and the overall performance.
This application proposes a deep-learning-based face multi-attribute recognition algorithm and system. The method divides the 40 face attributes into 4 face attribute groups according to the image positions corresponding to the attributes, taking into account the displayed positional correlation between face attributes; it regards the attribute classification problem of each attribute group as a subtask and constructs a model comprising 4 specific networks and 1 shared network.
The specific networks are designed to learn the specificity between tasks, so each attribute group is configured with a separate specific network, while the shared network is designed to learn the complementary information between tasks and promote interaction between them. The rich connections between the specific networks and the shared network promote mutual information exchange, help mine the correlations between tasks, and improve overall performance.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their technical features, and such modifications or substitutions do not drive the essence of the corresponding technical solutions away from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. An image processing method, comprising:
    acquiring image data to be processed;
    inputting the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks differ from one another;
    inputting the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, wherein the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels; and
    outputting the image recognition result.
  2. The method according to claim 1, wherein inputting the image data to be processed into the multiple pre-trained specific networks to obtain the attribute labels corresponding to the image data comprises:
    determining the attribute labels corresponding to each specific network, wherein the attribute labels that a specific network can recognize are the attribute labels corresponding to that specific network;
    dividing the image data into multiple sub-image data according to the attribute labels corresponding to each specific network; and
    inputting each sub-image data into the specific network corresponding to that sub-image data.
  3. The method according to claim 2, wherein dividing the image data into the multiple sub-image data according to the attribute labels corresponding to each specific network comprises:
    obtaining the region identifier corresponding to each attribute label;
    determining, in the pixel coordinates of the image data, the position of the region identifier corresponding to each attribute label; and
    dividing the image data into multiple sub-image data according to the attribute labels corresponding to each specific network and the positions of the region identifiers corresponding to the attribute labels, wherein the attribute labels in each sub-image data correspond to the same specific network.
  4. The method according to claim 1, wherein acquiring the image data to be processed comprises:
    acquiring original image data; and
    normalizing the original image data to obtain the image data to be processed.
  5. The method according to claim 1, wherein before inputting the image data to be processed into the multiple pre-trained specific networks to obtain the attribute labels corresponding to the image data, the method further comprises:
    acquiring multiple sample image data, each sample image data corresponding to multiple attribute labels;
    setting up a shared network and multiple specific networks, wherein each specific network can recognize at least one attribute label, and the attribute labels recognizable by the specific networks differ from one another; and
    inputting the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks.
  6. The method according to claim 5, wherein both the sample image data and the image data to be processed contain face images, and acquiring the multiple sample image data comprises:
    acquiring multiple sample image data;
    identifying the position information, within each sample image, of the facial key points in that sample image data;
    adjusting the face orientation in each sample image data to conform to a preset orientation according to the position information of the facial key points of that sample image data; and
    using the adjusted sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
  7. The method according to claim 6, wherein after adjusting the face orientation in each sample image data to conform to the preset orientation according to the position information of the facial key points of each sample image data, the method further comprises:
    performing face alignment on each adjusted sample image; and
    adjusting each face-aligned sample image to a specified size.
  8. The method according to claim 5, wherein acquiring the multiple sample image data comprises:
    acquiring multiple sample image data;
    performing data enhancement on the multiple sample image data so that the light intensity and contrast of each sample image data are randomly distributed within preset intervals;
    cropping each enhanced sample image data according to a preset random cropping ratio, each cropped sample image data having a preset size; and
    using the cropped sample image data as the sample image data for training the shared network and the multiple initial specific networks this time.
  9. The method according to claim 8, wherein performing data enhancement on the multiple sample image data so that the light intensity and contrast of each sample image data are randomly distributed within the preset intervals comprises:
    transforming the light intensity of each sample image data according to a preset light-intensity interval, so that the light intensity of each sample image data is randomly distributed within the preset light-intensity interval; and
    transforming the contrast of each sample image data according to a preset contrast interval, so that the contrast of each sample image data is randomly distributed within the preset contrast interval.
  10. The method according to claim 5, wherein both the sample image data and the image data to be processed contain face images, the multiple attribute labels corresponding to each sample image data correspond to different positions of the face, and setting up the shared network and the multiple specific networks comprises:
    dividing multiple measurement areas, each measurement area corresponding to a different area of the face; and
    setting up multiple specific networks according to the multiple measurement areas, each specific network corresponding to one measurement area and used to determine the attribute labels within its corresponding measurement area.
  11. The method according to claim 10, wherein the attribute labels are divided into an upper group, a middle group, a lower group, and a whole-face group, with different attribute labels corresponding to each group, and dividing the multiple measurement areas, each corresponding to a different area of the face, comprises:
    dividing the sample image data into four measurement areas according to the attribute labels corresponding to each group, the measurement areas comprising an upper region, a middle region, a lower region, and a whole-face region of the face.
  12. The method according to claim 5, wherein setting up the shared network and the multiple specific networks comprises:
    connecting the shared network and the multiple specific networks.
  13. The method according to claim 12, wherein connecting the shared network and the multiple specific networks comprises:
    concatenating the output features of the previous layer of the shared network with the output features of the previous layer of the multiple specific networks as the input of the current layer of the shared network; and
    concatenating the output features of the previous layer of a specific network with the output features of the previous layer of the shared network as the input of the current layer of that specific network.
  14. The method according to claim 5, wherein inputting the multiple sample image data into the shared network and the multiple specific networks for training, to obtain the trained shared network and the multiple trained specific networks, comprises:
    randomly dividing the multiple sample image data into a training dataset and a test dataset according to a preset ratio;
    inputting the training dataset into the shared network and the multiple specific networks for training, to obtain the trained shared network and the multiple trained specific networks; and
    inputting the test dataset into the trained shared network and the multiple trained specific networks for testing.
  15. The method according to claim 14, wherein after inputting the test dataset into the trained shared network and the multiple trained specific networks for testing, the method further comprises:
    obtaining the sample data judged incorrectly in the test dataset; and
    inputting the incorrectly judged sample data into the trained shared network and the multiple trained specific networks for training.
  16. An image processing method, comprising:
    acquiring multiple sample image data, each sample image data corresponding to multiple attribute labels;
    setting up a shared network and multiple specific networks, wherein each specific network can recognize at least one attribute label, and the attribute labels recognizable by the specific networks differ from one another;
    inputting the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks; and
    acquiring image data to be processed, and processing the image data to be processed according to the trained shared network and the multiple trained specific networks to obtain an image recognition result.
  17. An image processing apparatus, comprising:
    a data acquisition unit, configured to acquire image data to be processed;
    an attribute determination unit, configured to input the image data to be processed into multiple pre-trained specific networks to obtain attribute labels corresponding to the image data, wherein each specific network is used to determine attribute labels corresponding to the image data, and the attribute labels determined by the specific networks differ from one another;
    a result acquisition unit, configured to input the attribute labels determined by each specific network into a pre-trained shared network to obtain an image recognition result, wherein the shared network is used to determine the image recognition result according to each attribute label and the correlations among the attribute labels; and
    an output unit, configured to output the image recognition result.
  18. An image processing apparatus, comprising:
    a sample acquisition unit, configured to acquire multiple sample image data, each sample image data corresponding to multiple attribute labels;
    a setting unit, configured to set up a shared network and multiple specific networks, wherein each specific network can recognize at least one attribute label, and the attribute labels recognizable by the specific networks differ from one another;
    a network training unit, configured to input the multiple sample image data into the shared network and the multiple specific networks for training, to obtain a trained shared network and multiple trained specific networks; and
    a recognition unit, configured to acquire image data to be processed, and process the image data to be processed according to the trained shared network and the multiple trained specific networks to obtain an image recognition result.
  19. An electronic device, comprising:
    one or more processors;
    a memory; and
    one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the method according to any one of claims 1-15.
  20. A computer-readable medium, wherein the readable storage medium stores processor-executable program code, and instructions in the program code, when executed by the processor, cause the processor to perform the method according to any one of claims 1-15.
PCT/CN2020/122506 2019-10-22 2020-10-21 Image processing method and apparatus, electronic device, and storage medium WO2021078157A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911007790.5A CN110728255B (zh) 2019-10-22 2019-10-22 Image processing method and apparatus, electronic device, and storage medium
CN201911007790.5 2019-10-22

Publications (1)

Publication Number Publication Date
WO2021078157A1 true WO2021078157A1 (zh) 2021-04-29

Family

ID=69222737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122506 WO2021078157A1 (zh) 2019-10-22 2020-10-21 Image processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN110728255B (zh)
WO (1) WO2021078157A1 (zh)


Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728255B (zh) * 2019-10-22 2022-12-16 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111400534B (zh) * 2020-03-05 2023-09-19 杭州海康威视系统技术有限公司 Cover determination method and apparatus for image data, and computer storage medium
CN111539452B (zh) * 2020-03-26 2024-03-26 深圳云天励飞技术有限公司 Image recognition method and apparatus for multi-task attributes, electronic device, and storage medium
CN111428671A (zh) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information recognition method, system, apparatus, and storage medium
CN111507263B (zh) * 2020-04-17 2022-08-05 电子科技大学 Face multi-attribute recognition method based on multi-source data
CN111611805B (zh) * 2020-04-24 2023-04-07 平安科技(深圳)有限公司 Image-based assisted writing method, apparatus, medium, and device
CN111738325B (zh) * 2020-06-16 2024-05-17 北京百度网讯科技有限公司 Image recognition method, apparatus, device, and storage medium
CN112861926B (zh) * 2021-01-18 2023-10-31 平安科技(深圳)有限公司 Coupled multi-task feature extraction method and apparatus, electronic device, and storage medium
CN113407564A (zh) * 2021-06-18 2021-09-17 浙江非线数联科技股份有限公司 Data processing method and system
CN114170484B (zh) * 2022-02-11 2022-05-27 中科视语(北京)科技有限公司 Picture attribute prediction method and apparatus, electronic device, and storage medium
CN114581706B (zh) * 2022-03-02 2024-03-08 平安科技(深圳)有限公司 Configuration method and apparatus for certificate recognition model, electronic device, and storage medium


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6063217B2 (ja) * 2012-11-16 2017-01-18 任天堂株式会社 Program, information processing apparatus, information processing system, and information processing method
CN105825191B (zh) * 2016-03-23 2020-05-15 厦门美图之家科技有限公司 Gender recognition method and system based on face multi-attribute information, and photographing terminal
CN106503669B (zh) * 2016-11-02 2019-12-10 重庆中科云丛科技有限公司 Training and recognition method and system based on multi-task deep learning network
JP2019079135A (ja) * 2017-10-20 2019-05-23 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Information processing method and information processing apparatus
CN108596839A (zh) * 2018-03-22 2018-09-28 中山大学 Face caricature generation method and apparatus based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426404A (zh) * 2015-10-28 2016-03-23 广东欧珀移动通信有限公司 Music information recommendation method, apparatus, and terminal
US20170344881A1 (en) * 2016-05-25 2017-11-30 Canon Kabushiki Kaisha Information processing apparatus using multi-layer neural network and method therefor
CN110728255A (zh) * 2019-10-22 2020-01-24 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAO JIAJIONG; LI YINGMING; ZHANG ZHONGFEI: "Partially Shared Multi-task Convolutional Neural Network with Local Constraint for Face Attribute Learning", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 4290 - 4299, XP033476402, DOI: 10.1109/CVPR.2018.00451 *
CAO, JIAJIONG: "Face Attribute Learning Based on Multi-task Learning and Metric Learning", INFORMATION & TECHNOLOGY, CHINA MASTER'S THESES FULL-TEXT DATABASE, no. 6, 15 June 2018 (2018-06-15), pages 1 - 67, XP055804127, ISSN: 1674-0246 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744161A (zh) * 2021-09-16 2021-12-03 北京顺势兄弟科技有限公司 Method and apparatus for acquiring augmented data, data augmentation method, and electronic device
CN113744161B (zh) * 2021-09-16 2024-03-29 北京顺势兄弟科技有限公司 Method and apparatus for acquiring augmented data, data augmentation method, and electronic device

Also Published As

Publication number Publication date
CN110728255B (zh) 2022-12-16
CN110728255A (zh) 2020-01-24

Similar Documents

Publication Publication Date Title
WO2021078157A1 (zh) Image processing method and apparatus, electronic device, and storage medium
CN109359548B (zh) Multi-face recognition monitoring method and apparatus, electronic device, and storage medium
Wang et al. Unsupervised adversarial domain adaptation for cross-domain face presentation attack detection
González-Briones et al. A multi-agent system for the classification of gender and age from images
Yang et al. Preventing deepfake attacks on speaker authentication by dynamic lip movement analysis
CN108829900B (zh) Face image retrieval method, apparatus, and terminal based on deep learning
WO2020114118A1 (zh) Facial attribute recognition method and apparatus, storage medium, and processor
WO2021258989A1 (zh) Face anti-spoofing recognition method, apparatus, device, and storage medium
WO2016172872A1 (zh) Method, device, and computer program product for verifying a live face
Raj et al. Face recognition based smart attendance system
WO2021213067A1 (zh) Item display method, apparatus, device, and storage medium
WO2023098128A1 (zh) Liveness detection method and apparatus, and training method and apparatus for liveness detection system
WO2019075666A1 (zh) Image processing method and apparatus, terminal, and storage medium
CN112733802B (zh) Image occlusion detection method and apparatus, electronic device, and storage medium
CN112395979B (zh) Image-based health state recognition method, apparatus, device, and storage medium
WO2024109374A1 (zh) Training method, apparatus, device, storage medium, and program product for face-swapping model
WO2022188697A1 (zh) Biometric feature extraction method, apparatus, device, medium, and program product
US20220207913A1 (en) Method and device for training multi-task recognition model and computer-readable storage medium
WO2021128846A1 (zh) Electronic file control method and apparatus, computer device, and storage medium
CN112580572A (zh) Training and using method for multi-task recognition model, device, and storage medium
WO2022267653A1 (zh) Image processing method, electronic device, and computer-readable storage medium
Boncolmo et al. Gender Identification Using Keras Model Through Detection of Face
CN112580472A (zh) Fast and lightweight face recognition method, apparatus, machine-readable medium, and device
CN111191549A (zh) Two-stage face anti-spoofing detection method
RU2768797C1 (ru) Method and system for detecting synthetically altered face images in video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20878316

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20878316

Country of ref document: EP

Kind code of ref document: A1