CN105631406B - Image recognition processing method and device - Google Patents

Image recognition processing method and device

Info

Publication number
CN105631406B
CN105631406B
Authority
CN
China
Prior art keywords
face
image
recognized
sample image
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510958744.9A
Other languages
Chinese (zh)
Other versions
CN105631406A (en)
Inventor
张涛
汪平仄
张胜凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Xiaomi Inc
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaomi Inc, Beijing Xiaomi Mobile Software Co Ltd filed Critical Xiaomi Inc
Priority to CN201510958744.9A priority Critical patent/CN105631406B/en
Publication of CN105631406A publication Critical patent/CN105631406A/en
Application granted granted Critical
Publication of CN105631406B publication Critical patent/CN105631406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification

Abstract

The disclosure relates to an image recognition processing method and device. The method comprises the following steps: acquiring an image to be recognized; and performing face recognition on the image to be recognized while simultaneously acquiring a face classification result and the position coordinates of the facial organ points of the image to be recognized. When face recognition is performed on the image to be recognized, the face classification result, such as whether the image is a face image, can be determined, and the position coordinates of the facial organ points can be obtained at the same time, thereby improving recognition processing efficiency.

Description

Image recognition processing method and device
Technical Field
The present disclosure relates to the field of image processing, and in particular, to an image recognition processing method and apparatus.
Background
In the process of face recognition, face detection and face alignment are two main stages. In short, face detection detects the region of a photo where a face is located, and face alignment precisely locates the facial organ points within the detected face region.
In the related art, the AdaBoost algorithm is generally used to perform face detection on a picture to obtain the face region. Organ point positioning is then performed on the detected face region using algorithms such as AAM, ASM, and SDM. In practical applications, because AdaBoost performs face detection with a cascade of strong classifiers, each formed from a set of weak classifiers, many levels of classification detection, generally 12 layers, are needed to guarantee a certain detection accuracy.
Summary
In order to overcome the problems in the related art, the present disclosure provides an image recognition processing method and apparatus, so as to improve the processing speed and accuracy of face detection and organ point positioning.
According to a first aspect of the embodiments of the present disclosure, there is provided an image recognition processing method, including:
acquiring an image to be recognized;
and performing face recognition on the image to be recognized, and simultaneously acquiring a face classification result and the position coordinates of the facial organ points of the image to be recognized.
The scheme can have the following beneficial effects: when face recognition is performed on the image to be recognized, the face classification result, such as whether the image is a face image, can be determined, and the position coordinates of the facial organ points can be obtained at the same time, so that the recognition processing efficiency is improved.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the performing face recognition on the image to be recognized and simultaneously acquiring the face classification result and the facial organ point position coordinates of the image to be recognized includes:
adopting a face recognition model to perform face recognition on the image to be recognized, and simultaneously acquiring a face classification result and position coordinates of face organ points of the image to be recognized;
the output layer of the face recognition model comprises a classification output layer and a position regression output layer, the position regression output layer is used for locating the position coordinates of the facial organ points, and the face classification result indicates whether the image to be recognized is a face image or a non-face image.
The scheme can have the following beneficial effects: a face recognition model with two output layers is obtained through training, so that when the face recognition model is used to recognize the image to be recognized, the face classification result is determined and the position coordinates of the facial organ points are obtained at the same time. Based on this face recognition model, not only can accurate classification and coordinate position recognition results be obtained, but the recognition processing efficiency can also be improved.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the method further includes:
performing face contour detection processing of no more than a preset number of levels on the image to be recognized, and obtaining a face contour candidate region image of the image to be recognized;
the recognizing the image to be recognized comprises:
and recognizing the face contour candidate region image.
The scheme can have the following beneficial effects: before the image to be recognized is subjected to recognition processing, a small number of levels of face contour detection processing are performed on it, so that the rough region where the face is located can be obtained; the face recognition model then only needs to perform recognition processing on this rough region, which further improves the processing efficiency of face recognition.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the method further includes:
acquiring a training sample set, wherein the training sample set comprises a face sample image and a non-face sample image;
and training the characteristic coefficients between hidden layer nodes of each layer in the convolutional neural network according to the face sample image and the non-face sample image to obtain the face recognition model.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the training, according to the face sample image and the non-face sample image, feature coefficients between hidden layer nodes of each layer in a convolutional neural network to obtain the face recognition model includes:
marking a first classification label on the face sample image, and marking a second classification label on the non-face sample image;
determining a first face organ point position coordinate corresponding to the face sample image and a second face organ point position coordinate corresponding to the non-face sample image, wherein the second face organ point position coordinate is set as a preset coordinate value;
and respectively inputting the human face sample image, the first classification label, the first human face organ point position coordinate, the non-human face sample image, the second classification label and the second human face organ point position coordinate into the convolutional neural network, and training the characteristic coefficients among all layers of hidden layer nodes in the convolutional neural network to obtain the face recognition model.
The scheme can have the following beneficial effects: a training sample set containing a large number of face sample images and non-face sample images, together with the classification labels and facial organ point position coordinates corresponding to those sample images, is used as input to train the convolutional neural network. The resulting face recognition model can automatically and deeply learn the multi-level feature information contained in each sample image, which increases the likelihood of correctly classifying the image to be recognized and allows the facial organ point position coordinates to be recognized accurately, ensuring the accuracy and reliability of the recognition result.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the method further includes:
if the difference between the output classification label corresponding to the currently input sample image and the input classification label corresponding to that sample image is larger than a preset difference, or if the distance between the output facial organ point position coordinates corresponding to the currently input sample image and the input facial organ point position coordinates corresponding to that sample image is larger than a preset distance, adjusting the feature coefficients between the hidden layer nodes of each layer obtained after training on the currently input sample image;
wherein the input classification label is the first classification label or the second classification label, and the input facial organ point position coordinates are the first facial organ point position coordinates or the second facial organ point position coordinates.
The scheme can have the following beneficial effects: based on the training result of each sample image, the feature coefficients of the face recognition model are adjusted promptly and effectively, which ensures that the finally trained face recognition model has good stability and reliability and achieves rapid convergence of recognition.
According to a second aspect of the embodiments of the present disclosure, there is provided an image recognition processing apparatus including:
a first acquisition module configured to acquire an image to be recognized;
and a recognition processing module configured to perform face recognition on the image to be recognized and simultaneously acquire a face classification result and the position coordinates of the facial organ points of the image to be recognized.
The scheme can have the following beneficial effects: when face recognition is performed on the image to be recognized, the face classification result, such as whether the image is a face image, can be determined, and the position coordinates of the facial organ points can be obtained at the same time, so that the recognition processing efficiency is improved.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the recognition processing module includes:
a first recognition processing submodule configured to perform face recognition on the image to be recognized by using a face recognition model, and simultaneously acquire a face classification result and the position coordinates of the facial organ points of the image to be recognized;
wherein the output layer of the face recognition model comprises a classification output layer and a position regression output layer, the position regression output layer is used for locating the position coordinates of the facial organ points, and the face classification result indicates whether the image to be recognized is a face image or a non-face image.
The scheme can have the following beneficial effects: a face recognition model with two output layers is obtained through training, so that when the face recognition model is used to recognize the image to be recognized, the face classification result is determined and the position coordinates of the facial organ points are obtained at the same time. Based on this face recognition model, not only can accurate classification and coordinate position recognition results be obtained, but the recognition processing efficiency can also be improved.
With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the apparatus further includes:
a face contour detection module configured to perform face contour detection processing of no more than a preset number of levels on the image to be recognized, and acquire a face contour candidate region image of the image to be recognized;
the recognition processing module comprises:
a second recognition processing sub-module configured to recognize the face contour candidate region image.
The scheme can have the following beneficial effects: before the image to be recognized is subjected to recognition processing, a small number of levels of face contour detection processing are performed on it, so that the rough region where the face is located can be obtained; the face recognition model then only needs to perform recognition processing on this rough region, which further improves the processing efficiency of face recognition.
With reference to the first possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the apparatus further includes:
a second obtaining module configured to obtain a training sample set, wherein the training sample set comprises a face sample image and a non-face sample image;
and the training module is configured to train the feature coefficients between hidden layer nodes of each layer in the convolutional neural network according to the face sample image and the non-face sample image acquired by the second acquisition module to obtain the face recognition model.
With reference to the third possible implementation manner of the second aspect, in a fourth possible implementation manner of the second aspect, the training module includes:
the marking sub-module is configured to mark the face sample image acquired by the second acquisition module with a first classification label and mark the non-face sample image acquired by the second acquisition module with a second classification label;
a determining submodule configured to determine a first face organ point position coordinate corresponding to the face sample image acquired by the second acquisition module and a second face organ point position coordinate corresponding to the non-face sample image acquired by the second acquisition module, the second face organ point position coordinate being set to a preset coordinate value;
and the training submodule is configured to input the face sample image, the first classification label, the first face organ point position coordinate, the non-face sample image, the second classification label and the second face organ point position coordinate into the convolutional neural network respectively, and train the feature coefficients between all layers of hidden layer nodes in the convolutional neural network to obtain the face recognition model.
The scheme can have the following beneficial effects: a training sample set containing a large number of face sample images and non-face sample images, together with the classification labels and facial organ point position coordinates corresponding to those sample images, is used as input to train the convolutional neural network. The resulting face recognition model can automatically and deeply learn the multi-level feature information contained in each sample image, which increases the likelihood of correctly classifying the image to be recognized and allows the facial organ point position coordinates to be recognized accurately, ensuring the accuracy and reliability of the recognition result.
With reference to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner of the second aspect, the apparatus further includes:
an adjusting module configured to adjust the feature coefficients between the hidden layer nodes of each layer obtained after training on a currently input sample image when the difference between the output classification label corresponding to the currently input sample image and the input classification label corresponding to that sample image is larger than a preset difference, or when the distance between the output facial organ point position coordinates corresponding to the currently input sample image and the input facial organ point position coordinates corresponding to that sample image is larger than a preset distance;
wherein the input classification label is the first classification label or the second classification label, and the input facial organ point position coordinates are the first facial organ point position coordinates or the second facial organ point position coordinates.
The scheme can have the following beneficial effects: based on the training result of each sample image, the feature coefficients of the face recognition model are adjusted promptly and effectively, which ensures that the finally trained face recognition model has good stability and reliability and achieves rapid convergence of recognition.
According to a third aspect of the embodiments of the present disclosure, there is provided an image recognition processing apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring an image to be recognized;
and performing face recognition on the image to be recognized, and simultaneously acquiring a face classification result and the position coordinates of the facial organ points of the image to be recognized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a first embodiment of a method for image recognition processing in accordance with an illustrative embodiment;
FIG. 2 is a structural diagram of the Alex network, a deep convolutional neural network;
FIG. 3 is a flowchart illustrating a second embodiment of an image recognition processing method according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a third embodiment of a method for image recognition processing in accordance with an illustrative embodiment;
FIG. 5 is a flowchart illustrating a fourth embodiment of a method for image recognition processing in accordance with an illustrative embodiment;
FIG. 6 is a block diagram illustrating a first embodiment of an image recognition processing apparatus according to an exemplary embodiment;
FIG. 7 is a block diagram illustrating a second embodiment of an image recognition processing device in accordance with an illustrative embodiment;
FIG. 8 is a block diagram of a third embodiment of an image recognition processing apparatus according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating a fourth embodiment of an image recognition processing apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram illustrating an image recognition processing device according to an exemplary embodiment;
fig. 11 is a block diagram illustrating another image recognition processing apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a first embodiment of an image recognition processing method according to an exemplary embodiment. As shown in fig. 1, the image recognition processing method of this embodiment may be used in a terminal device or a server, where the terminal device may be, for example, a mobile phone, a tablet computer, or a PDA (Personal Digital Assistant). The image recognition processing method comprises the following steps.
In step 101, an image to be recognized is acquired.
In step 102, face recognition is performed on the image to be recognized, and simultaneously, a face classification result and position coordinates of the face organ points of the image to be recognized are obtained.
In this embodiment, the image to be recognized may include no human face, or may include one or more human faces. Face recognition is performed on the image to be recognized; in addition to determining whether the image is a face image, the position coordinates of the facial organ points are also obtained, so that when the image is recognized as a face image, different applications, such as face makeup processing, can be performed based on the obtained position coordinates of the facial organ points.
In this embodiment, the classification result and the coordinates of the organ point position of the image to be recognized are simultaneously recognized, and the recognition may be realized by using a face recognition model, which may be obtained based on training a convolutional neural network.
That is to say, in this embodiment, the face recognition model may be used to perform face recognition on the image to be recognized, and the face classification result and the coordinates of the position of the face organ point of the image to be recognized are obtained at the same time.
In order to obtain the above two recognition results, the output layers of the face recognition model include a classification output layer and a position regression output layer. The position regression output layer is used for locating the position coordinates of the facial organ points, the classification output layer is used for determining the face classification result, and the classification result can be one of two outcomes: the image to be recognized is a face image, or it is a non-face image.
In this embodiment, a Convolutional Neural Network (CNN) is used to construct a face recognition model, so as to perform recognition processing on an image to be recognized by using the face recognition model.
Convolutional neural networks are a type of artificial neural network and have become a research hotspot in the fields of speech analysis and image recognition. Their weight-sharing network structure more closely resembles a biological neural network, which reduces the complexity of the network model and the number of weights. This advantage is more pronounced when the network input is a multi-dimensional image: the image can be used directly as the network input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional neural network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes, and its network structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
The Alex network is one such convolutional neural network and is currently a common deep convolutional neural network for object recognition. Fig. 2 is a structural diagram of the Alex network. As shown in fig. 2, it is a multi-layer neural network comprising multiple convolutional layers, downsampling layers, fully connected layers, and an output layer, where each layer consists of multiple two-dimensional planes and each plane consists of multiple independent neurons. The mapping from one plane to the next can be viewed as a convolution operation. In this embodiment, it is assumed that the face recognition model obtained from the convolutional neural network has an N-layer structure, and the weight coefficients of the connections between the nodes of two adjacent hidden layers, i.e., the convolution kernels, which are the feature coefficients referred to above, are determined by training on the training sample set.
The conventional convolutional neural network shown in fig. 2 has only one output layer, i.e., a classification output layer with a classification function. In this embodiment, on the basis of the conventional convolutional neural network, the output layer is extended into two output layers with different functions, and the two output layers are based on the same feature extraction process, that is, the two output layers share the feature coefficients before the output layers.
The two output layers are a classification output layer and a position regression output layer, each of which is equivalent to a loss function or supervision function. The classification output layer implements the classification function, i.e., determining whether the image to be recognized is a face image; the position regression output layer locates the facial organ points, i.e., when the image to be recognized is a face image, the position coordinates of the facial organ points can be obtained.
Therefore, based on this face recognition model, after the image to be recognized is input into the model, two output results can be obtained at the output end: one is the judgment of whether the image is a face image, and the other is the position coordinates of the facial organ points. The two output results are output simultaneously.
Specifically, if the image to be recognized is determined not to be a face image, the output facial organ point position coordinates are 0 (or another preset coordinate value indicating that no organ point position coordinates exist). If the image to be recognized is determined to be a face image, the recognized position coordinates of the facial organ points are output.
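To make the two-output-layer structure concrete, the following is a minimal PyTorch sketch, not the patent's actual network: the layer sizes, the 64x64 input, and the number of organ points are illustrative assumptions, and only the idea of two heads sharing one feature extractor follows the description above.

```python
import torch
import torch.nn as nn

class FaceRecognitionModel(nn.Module):
    """Hypothetical two-headed CNN: classification plus organ point regression."""

    def __init__(self, num_organ_points: int = 5):
        super().__init__()
        # Shared feature extraction: the hidden layers whose feature
        # coefficients (convolution kernels) are learned during training.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
        )
        # Classification output layer: face image vs. non-face image.
        self.cls_head = nn.Linear(128, 2)
        # Position regression output layer: (x, y) for each organ point.
        self.reg_head = nn.Linear(128, num_organ_points * 2)

    def forward(self, x):
        feat = self.features(x)  # shared feature coefficients
        return self.cls_head(feat), self.reg_head(feat)

# One forward pass over a 64x64 image yields both results simultaneously:
logits, coords = FaceRecognitionModel()(torch.randn(1, 3, 64, 64))
```

Because both heads read the same shared features, the classification result and the organ point coordinates cost a single forward pass, which is the source of the efficiency gain described above.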
In this embodiment, a face recognition model that can both determine whether an image is a face image and obtain the position coordinates of the facial organ points is obtained by training a deep-learning-based convolutional neural network with two output layers. Therefore, when this face recognition model is used to recognize the image to be recognized, the position coordinates of the facial organ points can be obtained at the same time as the image is determined to be a face image. Based on this face recognition model, not only can an accurate recognition result be obtained, but the recognition processing efficiency can also be improved.
Fig. 3 is a flowchart illustrating a second embodiment of an image recognition processing method according to an exemplary embodiment. As shown in fig. 3, the method may further include the following steps:
in step 201, an image to be recognized is acquired.
In step 202, the image to be recognized is normalized.
In step 203, face contour detection processing of no more than a preset number of levels is performed on the normalized image to be recognized, and a face contour candidate region image is obtained.
In step 204, the face recognition model is used to recognize the face contour candidate area image, and the face classification result and the position coordinates of the face organ points of the image to be recognized are obtained at the same time.
In practical applications, to further ensure the speed of recognition processing and the accuracy of the recognition result, certain preprocessing can be performed on the image to be recognized, including normalization of the image, such as normalization for size, coordinate centering, x-shearing, scaling, and rotation.
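For illustration, a minimal sketch of this kind of normalization preprocessing using OpenCV; the 64x64 target size is an assumed value, and the identity affine matrix is a placeholder for a real centering/shearing/rotation transform:

```python
import cv2
import numpy as np

def normalize_image(img: np.ndarray, size: int = 64) -> np.ndarray:
    # Size normalization: rescale to a fixed input resolution.
    img = cv2.resize(img, (size, size))
    # Coordinate centering, shearing, scaling, and rotation can all be
    # expressed as one affine warp; identity stands in for the real transform.
    M = np.float32([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    img = cv2.warpAffine(img, M, (size, size))
    # Intensity scaling to [0, 1].
    return img.astype(np.float32) / 255.0
```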
In addition, the preprocessing may further include performing face contour detection processing on the image to be recognized (in particular, the normalized image to be recognized) to detect a rough face contour region, that is, the above-mentioned face contour candidate region image. Specifically, the AdaBoost algorithm may be used for face contour detection.
In the traditional face contour detection method, the AdaBoost algorithm is also used for face contour detection, but many levels of operations must generally be performed before an accurate result is output; otherwise the error rate cannot be controlled. For example, AdaBoost needs 12 levels of operations to converge, so the speed is obviously limited.
In this embodiment, although the AdaBoost algorithm is likewise used to detect the face contour, only a rough face contour candidate region needs to be output; an accurate result is not required, even if the error rate is high. Therefore, the AdaBoost algorithm in this embodiment only needs to perform operations of no more than a preset number of levels, for example 7 levels, which greatly improves the time performance. Unlike the conventional method, the error rate of the face contour detection in this embodiment may be high, but this does not matter: the face contour detection here mainly obtains candidate face regions rather than performing true face detection, and even with a high error rate, the face can still be accurately recognized when the candidates are subsequently input into the face recognition model obtained from the deep-learning-based convolutional neural network.
In this embodiment, before the image to be recognized is recognized, the approximate region where the face is located is obtained by performing a small number of levels of face contour detection processing on the image, so that the face recognition model only needs to perform recognition processing on this approximate region, further improving the processing efficiency of face recognition.
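To illustrate the level-limited detection, here is a small Python sketch of cascade evaluation truncated at a preset level. The stage classifiers are hypothetical stand-ins (this is not the OpenCV cascade API), and the 7-level cutoff follows the example above:

```python
from typing import Callable, List
import numpy as np

def coarse_face_candidates(
    windows: List[np.ndarray],
    stages: List[Callable[[np.ndarray], bool]],
    preset_level: int = 7,
) -> List[np.ndarray]:
    """Keep windows that pass only the first `preset_level` cascade stages.

    A full AdaBoost cascade might run ~12 stages for an accurate result;
    stopping early trades error rate for speed, which is acceptable here
    because the CNN-based model re-verifies every surviving candidate.
    """
    return [w for w in windows if all(s(w) for s in stages[:preset_level])]
```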
Fig. 4 is a flowchart illustrating a third embodiment of an image recognition processing method according to an exemplary embodiment. This embodiment, shown in fig. 4, describes the training process of the face recognition model. Specifically, the following steps may be included:
in step 301, a training sample set is obtained, where the training sample set includes a face sample image and a non-face sample image.
In step 302, the face sample image and the non-face sample image are normalized respectively.
In step 303, feature coefficients between hidden layer nodes of each layer in the convolutional neural network are trained according to the normalized human face sample image and the non-human face sample image to obtain a face recognition model.
In this embodiment, to ensure the accuracy and reliability of the face recognition model, a large number of face sample images and non-face sample images need to be collected for training, for example, 200,000 of each.
To ensure the accuracy and reliability of the training result, normalization processing, such as size normalization, coordinate centering, x-shearing, scaling, and rotation, may be performed on each sample image. Then, the feature coefficients between the hidden layer nodes of each layer in the convolutional neural network are trained according to the normalized face sample images and non-face sample images to obtain the face recognition model.
In this embodiment, the purpose of training the convolutional neural network is to obtain a face recognition model having two output layers. To ensure that the classification results of the classification output layer and the organ point positioning results of the position regression output layer are both accurate and reliable, the process of training the feature coefficients between the hidden layer nodes of each layer in the convolutional neural network according to the face sample images and non-face sample images is specifically as follows:
The face sample images are marked with the first classification label, and the non-face sample images are marked with the second classification label. The first facial organ point position coordinates corresponding to each face sample image and the second facial organ point position coordinates corresponding to each non-face sample image are determined, with the second facial organ point position coordinates set to a preset coordinate value, for example 0. Then, the face sample images with their first classification labels and first facial organ point position coordinates, and the non-face sample images with their second classification labels and second facial organ point position coordinates, are input into the convolutional neural network, and the feature coefficients between the hidden layer nodes of each layer in the convolutional neural network are trained to obtain the face recognition model.
That is to say, to distinguish face sample images from non-face sample images and to judge whether the trained face recognition model is accurate, the face sample images and non-face sample images are each marked with classification labels; suppose the face sample images are marked with the first classification label 1 and the non-face sample images are marked with the second classification label 2.
In addition, to train the position regression output layer function, the facial organ point position coordinates of each sample image are determined in advance. Non-face sample images have no organ point coordinates, so their position coordinates may be set to 0. For face sample images, an algorithm such as SDM can be used in advance to calibrate the organ points and obtain the organ point position coordinates of each face sample image.
Thus, in the process of training the convolutional neural network, the sample images are input one by one to obtain output results. Specifically, for a face sample image, the input is the image, its classification label 1, and its facial organ point position coordinates, and the output is an output classification label and output position coordinates. For a non-face sample image, the input is the image, its classification label 2, and its facial organ point position coordinates of 0, and the output is an output classification label and output position coordinates, which ideally are 0.
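The following training-step sketch shows how the two supervision signals can be combined into one update. It assumes the hypothetical FaceRecognitionModel sketched earlier and maps the labels 1/2 to class indices 0/1; it is an illustrative reading of the procedure, not the patent's reference implementation:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, labels, organ_coords):
    """One gradient-descent update on a batch.

    labels: class indices (0 = face, 1 = non-face); organ_coords holds the
    input organ point position coordinates, all 0 for non-face samples.
    """
    optimizer.zero_grad()
    logits, coords = model(images)
    # Classification output layer: supervised by the classification labels.
    cls_loss = F.cross_entropy(logits, labels)
    # Position regression output layer: supervised by the input coordinates.
    reg_loss = F.mse_loss(coords, organ_coords)
    # Each output layer acts as a loss (supervision) function; gradients
    # from both flow into the shared feature coefficients.
    (cls_loss + reg_loss).backward()
    optimizer.step()
    return cls_loss.item(), reg_loss.item()
```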
In this embodiment, a training sample set containing a large number of face sample images and non-face sample images is used, and the sample images, together with their corresponding classification labels and facial organ point position coordinates, are taken as input to train the convolutional neural network. The resulting face recognition model can automatically and deeply learn the multi-level feature information contained in each sample image, which increases the likelihood of correctly classifying the image to be recognized and allows the facial organ point position coordinates to be recognized accurately, ensuring the accuracy and reliability of the recognition result.
Fig. 5 is a flowchart illustrating a fourth embodiment of an image recognition processing method according to an exemplary embodiment. As shown in fig. 5, on the basis of the previous embodiment, the following step may be included after step 303:
In step 401, if the difference between the output classification label corresponding to the currently input sample image and the input classification label corresponding to that sample image is greater than the preset difference, or if the distance between the output facial organ point position coordinates corresponding to the currently input sample image and the input facial organ point position coordinates corresponding to that sample image is greater than the preset distance, the feature coefficients between the hidden layer nodes of each layer obtained after training on the currently input sample image are adjusted.
The input classification label is the first classification label or the second classification label, and the input facial organ point position coordinates are the first facial organ point position coordinates or the second facial organ point position coordinates.
It should be noted that, in the process of training the convolutional neural network, for any input sample image, the feature coefficients may be adjusted if the output classification label differs from the input classification label, or if the distance between the output position coordinates and the input position coordinates is greater than the preset distance. After the feature coefficients are adjusted, training proceeds with the subsequent input sample images, and this process is repeated until the convolutional neural network converges, at which point stable and reliable feature coefficients, i.e., convolution kernels, are obtained, and the face recognition model is finally obtained.
The distance between the output position coordinates and the input position coordinates can be calculated with distance metrics such as the Euclidean distance, Mahalanobis distance, Chebyshev distance, or cosine distance. The feature coefficients between the hidden layer nodes of each layer can be adjusted using a gradient descent method.
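As a small sketch of the adjustment criterion just described, using the Euclidean metric from the list above; the threshold values are illustrative assumptions:

```python
import numpy as np

def needs_adjustment(out_label, in_label, out_coords, in_coords,
                     preset_difference=0.5, preset_distance=1.0):
    # Classification check: output label vs. input classification label.
    label_diff = abs(out_label - in_label)
    # Position check: Euclidean distance between output and input organ
    # point coordinates; Mahalanobis, Chebyshev, or cosine distance could
    # be substituted, as noted above.
    coord_dist = np.linalg.norm(np.asarray(out_coords) - np.asarray(in_coords))
    return label_diff > preset_difference or coord_dist > preset_distance
```

When this returns True, the feature coefficients are updated, for example by a gradient descent step as in the train_step sketch above.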
In this embodiment, based on the training result of each sample image, the feature coefficients of the face recognition model are adjusted promptly and effectively, which ensures that the finally trained face recognition model has good stability and reliability and achieves rapid convergence of recognition.
The implementation procedure of the image recognition processing method, which can be implemented by the image recognition processing apparatus, is described above, and the internal function and structure of the image recognition processing apparatus will be explained below.
Fig. 6 is a block diagram illustrating a first embodiment of an image recognition processing apparatus according to an exemplary embodiment. As shown in fig. 6, the image recognition processing apparatus includes: a first acquisition module 11 and a recognition processing module 12.
A first acquisition module 11 configured to acquire an image to be recognized.
A recognition processing module 12 configured to perform face recognition on the image to be recognized acquired by the first acquisition module 11, and acquire a face classification result and facial organ point position coordinates of the image to be recognized at the same time.
Optionally, the recognition processing module 12 includes: a first recognition processing sub-module 121.
The first recognition processing sub-module 121 is configured to perform face recognition on the image to be recognized by using a face recognition model, and simultaneously acquire a face classification result and the position coordinates of the facial organ points of the image to be recognized.
The output layer of the face recognition model comprises a classification output layer and a position regression output layer, the position regression output layer is used for locating the position coordinates of the facial organ points, and the face classification result indicates whether the image to be recognized is a face image or a non-face image.
In this embodiment, the image to be recognized acquired by the first acquisition module 11 may include no human face, or may include one or more human faces. The recognition processing module 12 performs face recognition on the image to be recognized; in addition to determining whether the image is a face image, it also obtains the position coordinates of the facial organ points, so that when the image is recognized as a face image, different applications, such as face makeup processing, can be performed based on the obtained position coordinates of the facial organ points.
In this embodiment, in order to simultaneously recognize the classification result and the position coordinates of the organ point of the image to be recognized, the first recognition processing sub-module 121 may be implemented by using a face recognition model, which may be obtained based on training a convolutional neural network. That is to say, in this embodiment, the first recognition processing sub-module 121 may perform face recognition on the image to be recognized by using the face recognition model, and simultaneously acquire the face classification result and the coordinates of the position of the face organ point of the image to be recognized.
In order to obtain the above two recognition results, the output layers of the face recognition model include a classification output layer and a position regression output layer. The position regression output layer is used for locating the position coordinates of the facial organ points, the classification output layer is used for determining the face classification result, and the classification result can be one of two outcomes: the image to be recognized is a face image, or it is a non-face image.
In this embodiment, a Convolutional Neural Network (CNN) is used to construct a face recognition model, so as to perform recognition processing on an image to be recognized by using the face recognition model.
Convolutional neural networks are a type of artificial neural network and have become a research hotspot in the fields of speech analysis and image recognition. Their weight-sharing network structure more closely resembles a biological neural network, which reduces the complexity of the network model and the number of weights. This advantage is more pronounced when the network input is a multi-dimensional image: the image can be used directly as the network input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional neural network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes, and its network structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
The Alex network is one such convolutional neural network and is currently a common deep convolutional neural network for object recognition. Fig. 2 is a structural diagram of the Alex network. As shown in fig. 2, it is a multi-layer neural network comprising multiple convolutional layers, downsampling layers, fully connected layers, and an output layer, where each layer consists of multiple two-dimensional planes and each plane consists of multiple independent neurons. The mapping from one plane to the next can be viewed as a convolution operation. In this embodiment, it is assumed that the face recognition model obtained from the convolutional neural network has an N-layer structure, and the weight coefficients of the connections between the nodes of two adjacent hidden layers, i.e., the convolution kernels, which are the feature coefficients referred to above, are determined by training on the training sample set.
The conventional convolutional neural network shown in fig. 2 has only one output layer, i.e., a classification output layer with a classification function. In this embodiment, on the basis of the conventional convolutional neural network, the output layer is extended into two output layers with different functions, and the two output layers are based on the same feature extraction process, that is, the two output layers share the feature coefficients before the output layers.
The two output layers are a classification output layer and a position regression output layer, each of which is equivalent to a loss function or supervision function. The classification output layer implements the classification function, i.e., determining whether the image to be recognized is a face image; the position regression output layer locates the facial organ points, i.e., when the image to be recognized is a face image, the position coordinates of the facial organ points can be obtained.
Therefore, based on this face recognition model, after the image to be recognized is input into the model, two output results can be obtained at the output end: one is the judgment of whether the image is a face image, and the other is the position coordinates of the facial organ points. The two output results are output simultaneously.
Specifically, if the image to be recognized is determined not to be a face image, the output facial organ point position coordinates are 0 (or another preset coordinate value indicating that no organ point position coordinates exist). If the image to be recognized is determined to be a face image, the recognized position coordinates of the facial organ points are output.
In this embodiment, a face recognition model that can both determine whether an image is a face image and obtain the position coordinates of the facial organ points is obtained by training a deep-learning-based convolutional neural network with two output layers. Therefore, when this face recognition model is used to recognize the image to be recognized, the position coordinates of the facial organ points can be obtained at the same time as the image is determined to be a face image. Based on this face recognition model, not only can an accurate recognition result be obtained, but the recognition processing efficiency can also be improved.
Fig. 7 is a block diagram of a second embodiment of an image recognition processing apparatus according to an exemplary embodiment, as shown in fig. 7, based on the embodiment shown in fig. 6, the apparatus further includes: a face contour detection module 21.
A face contour detection module 21 configured to perform face contour detection processing of no more than a preset number of levels on the image to be recognized acquired by the first acquisition module 11, and acquire a face contour candidate region image.
The recognition processing module 12 includes: a second recognition processing sub-module 122.
A second recognition processing sub-module 122 configured to recognize the face contour candidate region image.
In practical applications, to further ensure the speed of recognition processing and the accuracy of the recognition result, certain preprocessing can be performed on the image to be recognized, including normalization of the image, such as normalization for size, coordinate centering, x-shearing, scaling, and rotation.
In addition, the preprocessing may further include the face contour detection module 21 performing face contour detection processing on the image to be recognized (in particular, the normalized image to be recognized) to detect a rough face contour region, that is, the above-mentioned face contour candidate region image. Specifically, the AdaBoost algorithm may be used for face contour detection.
In the traditional face contour detection method, the AdaBoost algorithm is also used for face contour detection, but many levels of operations must generally be performed before an accurate result is output; otherwise the error rate cannot be controlled. For example, AdaBoost needs 12 levels of operations to converge, so the speed is obviously limited.
In this embodiment, although the AdaBoost algorithm is likewise used to detect the face contour, only a rough face contour candidate region needs to be output; an accurate result is not required, even if the error rate is high. Therefore, the AdaBoost algorithm in this embodiment only needs to perform operations of no more than a preset number of levels, for example 7 levels, which greatly improves the time performance. Unlike the conventional method, the error rate of the face contour detection in this embodiment may be high, but this does not matter: the face contour detection here mainly obtains candidate face regions rather than performing true face detection, and even with a high error rate, the face can still be accurately recognized when the candidates are subsequently input into the face recognition model obtained from the deep-learning-based convolutional neural network.
In this embodiment, before the image to be recognized is recognized, the approximate region where the face is located is obtained by performing a small number of levels of face contour detection processing on the image, so that the face recognition model only needs to perform recognition processing on this approximate region, further improving the processing efficiency of face recognition.
Fig. 8 is a block diagram of a third embodiment of an image recognition processing apparatus according to an exemplary embodiment, as shown in fig. 8, on the basis of the above embodiment, the apparatus further includes: a second acquisition module 31, a training module 32.
A second obtaining module 31 configured to obtain a training sample set, where the training sample set includes a face sample image and a non-face sample image.
A training module 32 configured to train feature coefficients between hidden layer nodes in each layer in a convolutional neural network according to the face sample image and the non-face sample image acquired by the second acquiring module 31, so as to obtain the face recognition model.
In particular, the training module 32 comprises: a labeling sub-module 321, a determining sub-module 322, and a training sub-module 323.
A labeling sub-module 321 configured to label the face sample image acquired by the second acquisition module 31 with a first classification label, and label the non-face sample image acquired by the second acquisition module with a second classification label.
A determining submodule 322, configured to determine a first face organ point position coordinate corresponding to the face sample image acquired by the second acquiring module 31 and a second face organ point position coordinate corresponding to the non-face sample image acquired by the second acquiring module, where the second face organ point position coordinate is set to a preset coordinate value.
The training submodule 323 is configured to input the face sample image, the first classification label, the first face organ point position coordinate, the non-face sample image, the second classification label, and the second face organ point position coordinate into the convolutional neural network, and train feature coefficients between hidden layer nodes of each layer in the convolutional neural network to obtain the face recognition model.
In this embodiment, in order to ensure the accuracy and reliability of the face recognition model, a large number of face sample images and non-face sample images, for example, 20 ten thousand face sample images and non-face sample images, need to be obtained during training.
In order to ensure the accuracy and reliability of the training result, normalization processing may be performed on each sample image, for example, the second normalization module 33 performs normalization processing such as size, coordinate centering, x-sharpening, scaling, and rotation. Further, the training module 32 trains the feature coefficients between hidden layer nodes of each layer in the convolutional neural network according to the normalized human face sample image and non-human face sample image to obtain a face recognition model.
In this embodiment, the purpose of training the convolutional neural network is to obtain a face recognition model having two output layers, so that, in order to ensure that the classification output layer is accurate and reliable in classification results and the position regression output layer is accurate and reliable in positioning results of positions of human face organ points, specifically, a process of training feature coefficients between hidden layer nodes in each layer in the convolutional neural network according to a human face sample image and a non-human face sample image is as follows:
the labeling sub-module 321 performs labeling of a first classification label on the face sample image, and performs labeling of a second classification label on the non-face sample image. The determining sub-module 322 determines a first face organ point position coordinate corresponding to the face sample image and a second face organ point position coordinate corresponding to the non-face sample image, where the second face organ point position coordinate is set to a preset coordinate value, such as 0. Further, the training sub-module 323 inputs the face sample image, the corresponding first classification label, the corresponding first facial organ point position coordinate, the non-face sample image, the second classification label, and the corresponding second facial organ point position coordinate into the convolutional neural network, and trains the feature coefficients between hidden layer nodes of each layer in the convolutional neural network to obtain the face recognition model.
That is, to distinguish the face sample images from the non-face sample images and to judge whether the trained face recognition model is accurate, classification labels are attached to both kinds of sample images; suppose the face sample images are labeled with a first classification label of 1 and the non-face sample images with a second classification label of 2.
In addition, to train the position regression output layer, the face organ point position coordinates of each sample image are determined in advance. Non-face sample images have no organ points, so their position coordinates may be set to 0. For face sample images, an algorithm such as SDM (Supervised Descent Method) can be applied in advance to calibrate the organ points and obtain the organ point position coordinates of each face sample image.
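As an illustration of this bookkeeping only (the five-landmark count and the function name are assumptions, not limitations of the disclosure), the training tuples could be assembled as follows, with face landmarks produced in advance by an external calibrator such as SDM:

    import numpy as np

    NUM_LANDMARKS = 5                   # assumed: e.g. eyes, nose tip, mouth corners
    FACE_LABEL, NON_FACE_LABEL = 1, 2   # first and second classification labels

    def make_training_tuples(face_images, face_landmarks, non_face_images):
        # face_landmarks[i] is the (NUM_LANDMARKS, 2) coordinate array
        # calibrated in advance (e.g. by SDM) for face_images[i];
        # non-face samples get the preset all-zero coordinates.
        samples = []
        for img, pts in zip(face_images, face_landmarks):
            samples.append((img, FACE_LABEL, np.asarray(pts, dtype=np.float32)))
        zero_pts = np.zeros((NUM_LANDMARKS, 2), dtype=np.float32)
        for img in non_face_images:
            samples.append((img, NON_FACE_LABEL, zero_pts))
        return samples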
Therefore, during training of the convolutional neural network, the sample images are input one by one and an output result is obtained for each. Specifically, for a face sample image, the input is the image, its classification label 1, and its face organ point position coordinates, and the output is an output classification label and output position coordinates. For a non-face sample image, the input is the image, its classification label 2, and its face organ point position coordinates of 0, and the output is an output classification label and output position coordinates, which are ideally 0.
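To make the two-output-layer structure concrete, the following PyTorch sketch shows one plausible reading of such a network; the layer sizes, the 64x64 three-channel input, and the five-point regression head are illustrative assumptions rather than the patented architecture:

    import torch
    import torch.nn as nn

    class FaceRecognitionModel(nn.Module):
        # Shared convolutional trunk with two output layers: a classification
        # output layer (face / non-face) and a position regression output
        # layer for the face organ point coordinates.
        def __init__(self, num_landmarks=5):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(),
                nn.Linear(32 * 16 * 16, 128), nn.ReLU(),  # hidden layers
            )
            self.classifier = nn.Linear(128, 2)                 # classification output layer
            self.regressor = nn.Linear(128, num_landmarks * 2)  # position regression output layer

        def forward(self, x):  # x: (N, 3, 64, 64)
            features = self.trunk(x)
            return self.classifier(features), self.regressor(features)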
In this embodiment, a training sample set containing a large number of face sample images and non-face sample images is used, and each sample image together with its classification label and face organ point position coordinates is taken as input to train the convolutional neural network. The resulting face recognition model can thus automatically learn, in depth, the multi-level feature information contained in each sample image, which improves the likelihood of correctly classifying the image to be recognized and of accurately locating the face organ point position coordinates, ensuring the accuracy and reliability of the recognition result.
Fig. 9 is a block diagram illustrating a fourth embodiment of an image recognition processing apparatus according to an exemplary embodiment. As shown in fig. 9, on the basis of the embodiment shown in fig. 8, the image recognition processing apparatus further includes an adjusting module 41.
The adjusting module 41 is configured to adjust the feature coefficients between the hidden layer nodes of each layer obtained after training on the currently input sample image when the difference between the output classification label corresponding to the currently input sample image and its input classification label is greater than a preset difference, or when the distance between the output face organ point position coordinates corresponding to the currently input sample image and its input face organ point position coordinates is greater than a preset distance.
The input classification label is the first classification label or the second classification label, and the input face organ point position coordinates are the first face organ point position coordinates or the second face organ point position coordinates.
During training of the convolutional neural network, for any input sample image, the adjusting module 41 may adjust the feature coefficients if the output classification label differs from the input classification label, or if the distance between the output position coordinates and the input position coordinates is greater than a preset distance. After the feature coefficients are adjusted, training continues with the subsequent sample images, and the process repeats until the convolutional neural network converges. At that point, stable and reliable feature coefficients, i.e., convolution kernels, are obtained, yielding the final face recognition model.
The distance between the output position coordinates and the input position coordinates can be calculated with a distance metric such as the Euclidean distance, Mahalanobis distance, Chebyshev distance, or cosine distance. The adjusting module 41 may adjust the feature coefficients between the hidden layer nodes of each layer by gradient descent.
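A per-sample adjustment step of this kind might be sketched as follows, reusing the illustrative PyTorch model above; the Euclidean distance metric, the threshold value, and the joint loss weighting are assumptions made for the sketch, not prescriptions of the patent:

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, image, label, coords, preset_distance=2.0):
        # Adjust the feature coefficients only when the output classification
        # label differs from the input label, or when the distance between
        # output and input coordinates exceeds the preset distance.
        logits, pred_coords = model(image.unsqueeze(0))
        predicted_label = logits.argmax(dim=1).item() + 1  # class index 0/1 -> labels 1/2
        # Euclidean distance between output and input position coordinates.
        distance = torch.dist(pred_coords.squeeze(0), coords.flatten()).item()
        if predicted_label == label and distance <= preset_distance:
            return  # output already matches the input closely enough
        loss = F.cross_entropy(logits, torch.tensor([label - 1])) \
             + F.mse_loss(pred_coords.squeeze(0), coords.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # one gradient-descent adjustment of the coefficients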
In this embodiment, the feature coefficients of the face recognition model are adjusted promptly based on the training result of each sample image, which ensures that the finally trained face recognition model is stable and reliable and that training converges quickly.
The internal functions and structure of the image recognition processing apparatus have been described above. As shown in fig. 10, a block diagram of an image recognition processing apparatus according to an exemplary embodiment, the apparatus may be implemented as:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring an image to be recognized;
performing face recognition on the image to be recognized, and acquiring a face classification result and the face organ point position coordinates of the image to be recognized.
In this embodiment, when face recognition is performed on the image to be recognized, the face organ point position coordinates can be obtained at the same time as the face classification result, for example, whether the image is a face image, which improves recognition efficiency.
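For illustration only, a single-pass inference with the two-output model sketched earlier might look like this; the class-index mapping and the suppression of coordinates for non-face images follow the description above, while everything else is assumed:

    import torch

    def recognize(model, image):
        # One forward pass yields both the face classification result and
        # the face organ point position coordinates simultaneously.
        model.eval()
        with torch.no_grad():
            logits, coords = model(image.unsqueeze(0))
        is_face = logits.argmax(dim=1).item() == 0  # index 0 <-> face label (assumed mapping)
        # Coordinates are only meaningful when the image is a face image.
        return is_face, (coords.squeeze(0) if is_face else None)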
Fig. 11 is a block diagram illustrating another image recognition processing apparatus according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 11, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; it may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. It may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of a terminal device, enable the terminal device to perform an image recognition processing method, the method comprising:
acquiring an image to be recognized;
performing face recognition on the image to be recognized, and acquiring a face classification result and the face organ point position coordinates of the image to be recognized.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (7)

1. An image recognition processing method is characterized by comprising the following steps:
acquiring an image to be identified;
adopting a face recognition model to perform face recognition on the image to be recognized, and simultaneously acquiring a face classification result and position coordinates of face organ points of the image to be recognized;
the output layer of the face recognition model comprises a classification output layer and a position regression output layer, the position regression output layer is used for positioning the position coordinates of the face organ points, and the face classification result comprises that the image to be recognized is a face image or a non-face image;
acquiring a training sample set, wherein the training sample set comprises a face sample image and a non-face sample image;
marking a first classification label on the face sample image, and marking a second classification label on the non-face sample image;
determining a first face organ point position coordinate corresponding to the face sample image and a second face organ point position coordinate corresponding to the non-face sample image, wherein the second face organ point position coordinate is set as a preset coordinate value;
inputting the face sample image with the first classification label and the first face organ point position coordinates, and the non-face sample image with the second classification label and the second face organ point position coordinates, into a convolutional neural network, and training the feature coefficients between the hidden layer nodes of each layer in the convolutional neural network;
and simultaneously outputting two results, one being the face/non-face judgment and the other being the face organ point position coordinates, wherein when the image to be recognized is determined to be a face image, the recognized face organ point position coordinates are output, and when the image to be recognized is determined not to be a face image, no position coordinates are output, or other preset coordinate values representing the absence of organ point position coordinates are output.
2. The method of claim 1, further comprising:
performing face contour detection processing of no more than a preset level on the image to be recognized, and acquiring a face contour candidate region image of the image to be recognized;
wherein the recognizing of the image to be recognized comprises:
recognizing the face contour candidate region image.
3. The method of claim 1, further comprising:
if the difference between the output classification label corresponding to the currently input sample image and the input classification label corresponding to the currently input sample image is larger than a preset difference, or if the distance between the output human face organ point position coordinate corresponding to the currently input sample image and the input human face organ point position coordinate corresponding to the currently input sample image is larger than a preset distance, adjusting the characteristic coefficient between hidden layer nodes of each layer obtained after the currently input sample image is trained;
the input classification number is the first classification label or the second classification label, and the input face organ point position coordinate is the first face organ point position coordinate or the second face organ point position coordinate.
4. An image recognition processing apparatus, characterized by comprising:
a first acquisition module configured to acquire an image to be recognized;
the recognition processing module is configured to perform face recognition on the image to be recognized by adopting a face recognition model, and simultaneously acquire a face classification result and position coordinates of a face organ point of the image to be recognized;
the output layer of the face recognition model comprises a classification output layer and a position regression output layer, the position regression output layer is used for positioning the position coordinates of the face organ points, and the face classification result comprises that the image to be recognized is a face image or a non-face image;
a second obtaining module configured to obtain a training sample set, wherein the training sample set comprises a face sample image and a non-face sample image;
a training module comprising:
the marking sub-module is configured to mark the face sample image acquired by the second acquisition module with a first classification label and mark the non-face sample image acquired by the second acquisition module with a second classification label;
a determining submodule configured to determine a first face organ point position coordinate corresponding to the face sample image acquired by the second acquisition module and a second face organ point position coordinate corresponding to the non-face sample image acquired by the second acquisition module, the second face organ point position coordinate being set to a preset coordinate value;
the training sub-module is configured to input the face sample image with the first classification label and the first face organ point position coordinates, and the non-face sample image with the second classification label and the second face organ point position coordinates, into a convolutional neural network, and to train the feature coefficients between the hidden layer nodes of each layer in the convolutional neural network;
and to simultaneously output two results, one being the face/non-face judgment and the other being the face organ point position coordinates, wherein when the image to be recognized is determined to be a face image, the recognized face organ point position coordinates are output, and when the image to be recognized is determined not to be a face image, no position coordinates are output, or other preset coordinate values representing the absence of organ point position coordinates are output.
5. The apparatus of claim 4, further comprising:
the face contour detection module is configured to perform face contour detection processing of no more than a preset level on the image to be recognized, and to acquire a face contour candidate region image of the image to be recognized;
the recognition processing module comprises:
a second recognition processing sub-module configured to recognize the face contour candidate region image.
6. The apparatus of claim 4, further comprising:
the adjusting module is configured to adjust the feature coefficients between the hidden layer nodes of each layer obtained after training on a currently input sample image when the difference between the output classification label corresponding to the currently input sample image and the input classification label corresponding to the currently input sample image is greater than a preset difference, or when the distance between the output face organ point position coordinates corresponding to the currently input sample image and the input face organ point position coordinates corresponding to the currently input sample image is greater than a preset distance;
the input classification number is the first classification label or the second classification label, and the input face organ point position coordinate is the first face organ point position coordinate or the second face organ point position coordinate.
7. An image recognition processing apparatus, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring an image to be recognized;
adopting a face recognition model to perform face recognition on the image to be recognized, and simultaneously acquiring a face classification result and position coordinates of face organ points of the image to be recognized;
the output layer of the face recognition model comprises a classification output layer and a position regression output layer, the position regression output layer is used for positioning the position coordinates of the face organ points, and the face classification result comprises that the image to be recognized is a face image or a non-face image;
acquiring a training sample set, wherein the training sample set comprises a face sample image and a non-face sample image;
marking a first classification label on the face sample image, and marking a second classification label on the non-face sample image;
determining a first face organ point position coordinate corresponding to the face sample image and a second face organ point position coordinate corresponding to the non-face sample image, wherein the second face organ point position coordinate is set as a preset coordinate value;
inputting the face sample image with the first classification label and the first face organ point position coordinates, and the non-face sample image with the second classification label and the second face organ point position coordinates, into a convolutional neural network, and training the feature coefficients between the hidden layer nodes of each layer in the convolutional neural network;
and simultaneously outputting two results, one being the face/non-face judgment and the other being the face organ point position coordinates, wherein when the image to be recognized is determined to be a face image, the recognized face organ point position coordinates are output, and when the image to be recognized is determined not to be a face image, no position coordinates are output, or other preset coordinate values representing the absence of organ point position coordinates are output.
CN201510958744.9A 2015-12-18 2015-12-18 Image recognition processing method and device Active CN105631406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510958744.9A CN105631406B (en) 2015-12-18 2015-12-18 Image recognition processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510958744.9A CN105631406B (en) 2015-12-18 2015-12-18 Image recognition processing method and device

Publications (2)

Publication Number Publication Date
CN105631406A CN105631406A (en) 2016-06-01
CN105631406B true CN105631406B (en) 2020-07-10

Family

ID=56046319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510958744.9A Active CN105631406B (en) 2015-12-18 2015-12-18 Image recognition processing method and device

Country Status (1)

Country Link
CN (1) CN105631406B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341436B (en) * 2016-08-19 2019-02-22 北京市商汤科技开发有限公司 Gestures detection network training, gestures detection and control method, system and terminal
CN106446946B (en) * 2016-09-22 2020-07-21 北京小米移动软件有限公司 Image recognition method and device
CN106446862A (en) * 2016-10-11 2017-02-22 厦门美图之家科技有限公司 Face detection method and system
CN106778543A (en) * 2016-11-29 2017-05-31 北京小米移动软件有限公司 Single face detecting method, device and terminal
CN108171244A (en) * 2016-12-07 2018-06-15 北京深鉴科技有限公司 Object identifying method and system
CN107239727A (en) * 2016-12-07 2017-10-10 北京深鉴智能科技有限公司 Gesture identification method and system
CN107665341A (en) * 2017-09-30 2018-02-06 珠海市魅族科技有限公司 One kind identification control method, electronic equipment and computer product
CN110163032B (en) * 2018-02-13 2021-11-16 浙江宇视科技有限公司 Face detection method and device
CN109117741A (en) * 2018-07-20 2019-01-01 苏州中德宏泰电子科技股份有限公司 Offline object identifying method and device to be detected
CN109558791B (en) * 2018-10-11 2020-12-01 浙江大学宁波理工学院 Bamboo shoot searching device and method based on image recognition
CN111414895A (en) * 2020-04-10 2020-07-14 上海卓繁信息技术股份有限公司 Face recognition method and device and storage equipment
CN111523476B (en) * 2020-04-23 2023-08-22 北京百度网讯科技有限公司 Mask wearing recognition method, device, equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095911A (en) * 2015-07-31 2015-11-25 小米科技有限责任公司 Sensitive picture identification method and apparatus, and server

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7236615B2 (en) * 2004-04-21 2007-06-26 Nec Laboratories America, Inc. Synergistic face detection and pose estimation with energy-based models
CN103793693A (en) * 2014-02-08 2014-05-14 厦门美图网科技有限公司 Method for detecting face turning and facial form optimizing method with method for detecting face turning
CN103824055B (en) * 2014-02-17 2018-03-02 北京旷视科技有限公司 A kind of face identification method based on cascade neural network
CN104346607B (en) * 2014-11-06 2017-12-22 上海电机学院 Face identification method based on convolutional neural networks
CN105095859B (en) * 2015-06-29 2019-03-15 小米科技有限责任公司 Face identification method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095911A (en) * 2015-07-31 2015-11-25 小米科技有限责任公司 Sensitive picture identification method and apparatus, and server

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast R-CNN; Ross Girshick et al.; 2015 IEEE International Conference on Computer Vision (ICCV); 2015-12-13; pp. 1440-1447 *
Ross Girshick et al. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). 2015, pp. 1440-1447. *

Also Published As

Publication number Publication date
CN105631406A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
CN105631406B (en) Image recognition processing method and device
CN108121952B (en) Face key point positioning method, device, equipment and storage medium
CN105631408B (en) Face photo album processing method and device based on video
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN105654033B (en) Face image verification method and device
CN109871896B (en) Data classification method and device, electronic equipment and storage medium
WO2019101021A1 (en) Image recognition method, apparatus, and electronic device
US10616475B2 (en) Photo-taking prompting method and apparatus, an apparatus and non-volatile computer storage medium
CN108256555B (en) Image content identification method and device and terminal
CN110602527B (en) Video processing method, device and storage medium
CN105512685B (en) Object identification method and device
CN104408402B (en) Face identification method and device
CN106228556B (en) image quality analysis method and device
CN109859096A (en) Image Style Transfer method, apparatus, electronic equipment and storage medium
US20120321193A1 (en) Method, apparatus, and computer program product for image clustering
CN109446961B (en) Gesture detection method, device, equipment and storage medium
CN106845398B (en) Face key point positioning method and device
CN106557759B (en) Signpost information acquisition method and device
CN105426857A (en) Training method and device of face recognition model
CN107133354B (en) Method and device for acquiring image description information
CN106326853B (en) Face tracking method and device
CN107463903B (en) Face key point positioning method and device
US11868521B2 (en) Method and device for determining gaze position of user, storage medium, and electronic apparatus
CN106295499A (en) Age estimation method and device
CN107992841A (en) The method and device of identification objects in images, electronic equipment, readable storage medium storing program for executing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200603

Address after: Room 01, 9th Floor, Building 2, Huarun Colorful City Shopping Center, No. 68 Qinghe Street, Haidian District, Beijing 100085

Applicant after: BEIJING XIAOMI MOBILE SOFTWARE Co.,Ltd.

Applicant after: Xiaomi Technology Co.,Ltd.

Address before: 13th Floor, Building 2, Huarun Colorful City Shopping Center, No. 68 Qinghe Street, Haidian District, Beijing 100085

Applicant before: Xiaomi Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant