CN109214333A - Convolutional neural network structure, face attribute recognition method and device, and terminal device - Google Patents

Convolutional neural network structure, face attribute recognition method and device, and terminal device

Info

Publication number
CN109214333A
CN109214333A
Authority
CN
China
Prior art keywords
face, attributes, convolutional neural network, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811018499.3A
Other languages
Chinese (zh)
Inventor
陈书楷
杨奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Central Intelligent Information Technology Co Ltd
ZKTeco Co Ltd
Original Assignee
Xiamen Central Intelligent Information Technology Co Ltd
ZKTeco Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Central Intelligent Information Technology Co Ltd, ZKTeco Co Ltd filed Critical Xiamen Central Intelligent Information Technology Co Ltd
Priority to CN201811018499.3A priority Critical patent/CN109214333A/en
Publication of CN109214333A publication Critical patent/CN109214333A/en
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The application is applicable to the technical field of biometric recognition, and provides a convolutional neural network structure, a face attribute recognition method and device, and a terminal device. At least one deconvolution layer and at least one inception layer are arranged between the first convolutional layer and the last convolutional layer of the convolutional neural network. The application can solve the problems in the prior art that the same convolutional layer of a convolutional neural network uses convolution kernels of identical size, making it difficult to adapt to changes in input image size, and that the convolutional layers cannot learn richer features.

Description

Convolutional neural network structure, face attribute recognition method and device, and terminal device
Technical Field
The application belongs to the technical field of biometric recognition, and particularly relates to a convolutional neural network structure, a face attribute recognition method, a face attribute recognition device, and a terminal device.
Background
With the development of science and technology, personal information can be acquired from information sources such as images, audio and video through biometric recognition technology. Face attribute recognition is one of the biometric recognition technologies: from an image or a video, it can extract personal information such as a person's gender, age and race.
At present, face attribute recognition is mainly performed by convolutional neural networks. When a convolutional neural network structure is designed, convolution kernels of the same size are usually adopted within the same convolutional layer, which makes it difficult to adapt to changes in the size of the input image.
Meanwhile, current convolutional neural networks stack convolutional layers in sequence, so each convolutional layer can only learn from the feature maps produced by the previous convolutional layer, whose size keeps shrinking; richer features therefore cannot be learned.
In summary, conventional convolutional neural networks have the problems that the same convolutional layer adopts convolution kernels of the same size, making it difficult to adapt to changes in input image size, and that the convolutional layers cannot learn richer features.
Disclosure of Invention
In view of this, embodiments of the present application provide a convolutional neural network structure, a face attribute recognition method and device, and a terminal device, so as to solve the problems in the prior art that the same convolutional layer of a convolutional neural network uses convolution kernels of the same size, making it difficult to adapt to changes in the size of the input image, and that the convolutional layers cannot learn richer features.
A first aspect of the embodiments of the present application provides a convolutional neural network structure for face attribute recognition, where at least one deconvolution layer and at least one inception layer are disposed between the first convolutional layer and the last convolutional layer of the convolutional neural network, and a plurality of convolution kernels of different sizes are used for convolution in the inception layer.
A second aspect of the embodiments of the present application provides a face attribute identification method, including:
acquiring an image to be detected, and performing face detection and face alignment on the image to be detected to obtain a face image to be recognized;
and inputting the face image to be recognized into the trained convolutional neural network to obtain a face attribute recognition result of the face image to be recognized.
A third aspect of the embodiments of the present application provides a face attribute recognition apparatus, including:
the face detection module is used for acquiring an image to be detected, and performing face detection and face alignment on the image to be detected to obtain a face image to be recognized;
and the attribute recognition module is used for inputting the face image to be recognized into the trained convolutional neural network to obtain a face attribute recognition result of the face image to be recognized.
A fourth aspect of the embodiments of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method when executing the computer program.
A fifth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, implements the steps of the method as described above.
Compared with the prior art, the embodiment of the application has the advantages that:
the application provides a convolutional neural network structure for face attribute recognition, in which at least one deconvolution layer and at least one inception layer are arranged between the first convolutional layer and the last convolutional layer of the convolutional neural network. The deconvolution layer can supplement and expand the feature maps learned by the previous convolutional layer, so that richer features are learned. The inception layer adopts a plurality of convolution kernels of different sizes when convolving; the different kernel sizes allow the inception layer to learn features at different scales, so the convolutional neural network can better adapt to changes in the size of the input image, while the use of several kernel sizes also increases the diversity of the features. This solves the problems in the prior art that the same convolutional layer of a convolutional neural network adopts convolution kernels of the same size, making it difficult to adapt to changes in input image size, and that the convolutional layers cannot learn richer features.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic structural diagram of a convolutional neural network structure for face attribute recognition according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of a face attribute identification method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a face attribute recognition apparatus according to an embodiment of the present application;
fig. 4 is a schematic diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the mobile terminals described in embodiments of the present application include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers having touch sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the devices described above are not portable communication devices, but rather are desktop computers having touch-sensitive surfaces (e.g., touch screen displays and/or touch pads).
In the discussion that follows, a mobile terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the mobile terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The mobile terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the mobile terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
Example one:
the embodiment of the application provides a convolutional neural network structure for face attribute recognition, wherein at least one deconvolution layer and at least one inception layer are arranged between the first convolutional layer and the last convolutional layer of the convolutional neural network, and a plurality of convolution kernels of different sizes are adopted when the inception layer performs convolution.
The convolutional neural network comprises a plurality of convolutional layers, and different types of convolutional neural network structures can be obtained by combining convolutional layers, pooling layers, fully-connected layers and the like.
At least one deconvolution layer and at least one inception layer are arranged between the first convolutional layer and the last convolutional layer of the convolutional neural network.
Current convolutional neural networks often adopt a series of connected convolutional layers: the size of the feature map keeps shrinking during learning, and each convolutional layer learns on the basis of the feature maps produced by the previous convolutional layer, so richer features are difficult to learn.
The forward propagation of a deconvolution layer (also called a transposed convolution layer) can be regarded as the backward propagation of a convolution layer. Through the deconvolution layer, the size of the feature map can be restored and the feature maps supplemented and expanded, so that not only can higher-level features be learned, but the model parameters of the network can also be reduced.
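For illustration only, the following minimal PyTorch sketch shows this size restoration; all layer and tensor sizes here are assumptions made for the example, not values taken from the application.

```python
# Minimal sketch: a transposed convolution restores the spatial size that
# a strided convolution reduced. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

x = torch.randn(1, 64, 24, 28)  # batch, channels, height, width

down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)

h = down(x)   # halves the spatial size: (1, 128, 12, 14)
r = up(h)     # restores it: (1, 64, 24, 28)
print(h.shape, r.shape)
```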
Meanwhile, in current convolutional neural networks, a convolutional layer is provided with one or more convolution kernels, and when several kernels are provided they all have the same size, so the network is difficult to adapt to changes in the size of the input image.
In contrast, a plurality of convolution kernels of different sizes are applied in the inception layer, which increases the diversity of the feature maps, fuses the feature maps at multiple scales, reduces the amount of computation, and allows the convolutional neural network to adapt well to changes in the size of the input image.
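A minimal sketch of such an inception-style layer follows; each branch convolves the same input with a different kernel size and the results are concatenated channel-wise. The branch channel counts are illustrative assumptions.

```python
# Minimal sketch of an inception-style block: the same input is convolved
# with kernels of several sizes and the branch outputs are concatenated.
# All branch channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 32, kernel_size=1)      # 1x1 branch
        self.b3 = nn.Sequential(                            # 3x3 branch
            nn.Conv2d(in_ch, 48, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(48, 64, kernel_size=3, padding=1),
        )
        self.b5 = nn.Sequential(                            # 5x5 branch
            nn.Conv2d(in_ch, 16, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.bp = nn.Sequential(                            # pooling branch
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # every branch preserves the spatial size, so the outputs can be
        # concatenated along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

y = InceptionBlock(96)(torch.randn(1, 96, 28, 24))  # -> (1, 160, 28, 24)
```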
Further, the output layer of the convolutional neural network is provided with loss functions. The face attributes are divided in advance into semantic attributes and ordinal attributes, and the two kinds of attributes are set to correspond to different loss functions, which improves the recognition accuracy of the trained convolutional neural network. For example, the semantic attributes are set to correspond to a cross-entropy loss function, and the ordinal attributes to a SmoothL1 loss function.
The output layer of the convolutional neural network is provided with loss functions, and the face attributes comprise semantic attributes and ordinal attributes.
The semantic attributes include one or more of overall attributes (e.g., race and gender), local attributes (e.g., whether there is a beard), action attributes (e.g., expression and motion), and wearing attributes (e.g., whether glasses or a hat are worn); a semantic attribute is a face attribute that does not require judging magnitude or length.
An ordinal attribute is an attribute that requires judging magnitude or length, such as age and hair: age requires judging magnitude, and hair requires judging length.
In the embodiment of the application, different loss functions are set for different face attributes: the semantic attributes correspond to a cross-entropy loss function, and the ordinal attributes correspond to a SmoothL1 loss function.
In the training process of the convolutional neural network, the loss functions of all the face attributes are summed according to preset weight values to obtain a total loss function, and a minimization objective that takes the smallest total loss as its target is constructed. The predicted values and true values of the training samples are evaluated through the minimization objective, and the weight values and bias values of the convolutional neural network are updated by back-propagation, so that the total loss function becomes smaller and smaller until a preset requirement is met and training is completed. The minimization objective can be expressed as:

$$\min_{W_g,\,W_c}\ \sum_{g=1}^{G} \lambda_g \sum_{n=1}^{N} L_g\big(y_n^g,\, F(x_n; W_g, W_c)\big) + \gamma_1 \Phi(W_g) + \gamma_2 \Phi(W_c)$$

where $G$ is the number of kinds of face attributes (for example, the face attributes are classified into age, gender, race, and so on); $M_g$ is the number of categories of a face attribute (for example, gender is classified into male and female); $N$ is the number of face sample images; $\lambda_g$ is a preset weight value; $L_g$ is the loss function corresponding to each face attribute, obtained by mapping the nonlinear attribute prediction function $F$ and the true value $y_n^g$ of the face sample image $x_n$; $W_g$ are the weight values of the shared-feature sub-network (e.g., the convolutional, deconvolution and inception layers); $W_c$ are the weight values of the attribute-specific sub-network (e.g., the fully-connected layers); $\gamma_1$ and $\gamma_2$ are regularization constants greater than 0; and the $\Phi$ function is a regularization term that penalizes the weight values of the network.
A targeted loss function is used according to the characteristics of the different face attributes: the semantic attributes correspond to a cross-entropy loss function and the ordinal attributes to a SmoothL1 loss function, so the convolutional neural network can be trained more accurately and the recognition accuracy of the trained network is improved. The cross-entropy loss function can be expressed as:

$$L_{CE} = -\sum_{n=1}^{N} \sum_{k=1}^{M_j} \mathbf{1}\{y_n^j = k\}\, \log \hat{p}_{n,k}^{\,j}$$

where $\hat{p}_{n,k}^{\,j}$ is the Softmax prediction function, whose result lies within the interval $[0,1]$, and gives the predicted value of the $k$-th category of the $j$-th face attribute, computed through the attribute prediction function $F$; $y_n^j$ is the true value of the $j$-th face attribute; and $\mathbf{1}\{\cdot\}$ is the indicator function, which outputs 1 when the two terms inside the braces are the same and 0 otherwise.
The SmoothL1 loss function can be expressed as:

$$L_{S1} = \sum_{i=1}^{N} \mathrm{smooth}_{L1}(x_i - y_i), \qquad \mathrm{smooth}_{L1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$$

where $x_i$ is the predicted value of the ordinal attribute of the $i$-th face sample image, and $y_i$ is the true value of the ordinal attribute of the $i$-th face sample image.
The loss function of each semantic attribute and each ordinal attribute can be substituted directly into the minimization objective. Alternatively, the losses of the semantic attributes can first be summed according to preset semantic weight values to obtain a total semantic-attribute loss, the losses of the ordinal attributes summed according to preset ordinal weight values to obtain a total ordinal-attribute loss, and the two totals then substituted into the minimization objective. The summation takes the form:

$$L_c = \sum_{t=1}^{T} \alpha_t L_t$$

where $L_c$ denotes the total loss function of the semantic attributes or of the ordinal attributes, $T$ denotes the number of semantic-attribute classes or ordinal-attribute classes, and $\alpha_t$ denotes the preset semantic weight value of the $t$-th semantic attribute or the preset ordinal weight value of the $t$-th ordinal attribute.
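Putting these pieces together, the following minimal sketch shows one way the combined loss could be computed, with cross-entropy for the semantic attributes and SmoothL1 for the ordinal attributes; the attribute names and the weight values (the $\alpha_t$ above) are illustrative assumptions.

```python
# Minimal sketch of the combined multi-task loss: cross-entropy for the
# semantic attributes, SmoothL1 for the ordinal attributes, summed with
# preset per-attribute weights. Names and weights are assumptions.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()     # for semantic attributes
smooth_l1 = nn.SmoothL1Loss()  # for ordinal attributes

semantic_weights = {"gender": 1.0, "race": 1.0}  # assumed alpha_t values
ordinal_weights = {"age": 1.0}                   # assumed alpha_t values

def total_loss(logits, preds, targets):
    """logits: semantic-attribute logits, each of shape (N, M_g);
    preds: ordinal-attribute predictions, each of shape (N,);
    targets: ground-truth class indices / values, keyed by attribute."""
    loss = torch.zeros(())
    for name, w in semantic_weights.items():
        loss = loss + w * ce(logits[name], targets[name])
    for name, w in ordinal_weights.items():
        loss = loss + w * smooth_l1(preds[name], targets[name])
    return loss
```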
Taking as an example an improvement of an Alex model comprising 5 convolutional layers and 2 fully-connected layers, the improved model is named the Alinc model, and its structure is shown in fig. 1. Four convolutional layers are arranged in the Alinc model: one deconvolution layer 9 is arranged between the second convolutional layer 2 and the third convolutional layer 3, an inception layer 10 is arranged between the second convolutional layer 2 and the deconvolution layer 9, and an inception layer 11 is arranged between the deconvolution layer 9 and the third convolutional layer 3. The first convolutional layer 1 is connected with the input layer 7, the fourth convolutional layer 4 with the first fully-connected layer 5, the first fully-connected layer 5 with the second fully-connected layer 6, and the second fully-connected layer 6 with the output layer 8. Loss functions are arranged in the output layer: the semantic attributes correspond to a cross-entropy loss function and the ordinal attributes to a SmoothL1 loss function.
Aligned face images are used to train the Alinc model and to test its recognition accuracy. The size of the input face image can be set to 3 × 96 × 112, where 3 is the number of channels of a color image, 96 the width and 112 the height; the accuracy after testing is 90.25%. Comparing the Alinc model with other models of similar layer count and size, for example the model in the document "Leveraging Mid-Level Deep Representations for Predicting Face Attributes in the Wild", which has more layers than the Alinc model and is also larger yet reaches an accuracy of only 89.8%, the comparison shows that the technical scheme in this embodiment can effectively improve the structure of the convolutional neural network and the recognition accuracy of the network.
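For illustration, a minimal self-contained sketch of the Alinc-style wiring described above follows. Only the ordering of the layers is taken from the description and fig. 1; all channel counts, strides and the number of outputs are assumptions, and the inception blocks are written here in a compact form of the block sketched earlier.

```python
# Minimal sketch of the Alinc-style wiring: conv1 -> conv2 -> inception 10
# -> deconv 9 -> inception 11 -> conv3 -> conv4 -> fc5 -> fc6. All sizes
# are illustrative assumptions.
import torch
import torch.nn as nn

class MultiKernel(nn.Module):
    """Compact inception-style block: 1x1, 3x3 and 5x5 branches concatenated."""
    def __init__(self, in_ch, each=32):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, each, k, padding=k // 2) for k in (1, 3, 5)
        )

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

class Alinc(nn.Module):
    def __init__(self, num_outputs=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, stride=2, padding=1)     # conv layer 1
        self.conv2 = nn.Conv2d(64, 96, 3, stride=2, padding=1)    # conv layer 2
        self.inc10 = MultiKernel(96)                              # inception 10 -> 96 ch
        self.deconv9 = nn.ConvTranspose2d(96, 96, 2, stride=2)    # deconv layer 9
        self.inc11 = MultiKernel(96)                              # inception 11 -> 96 ch
        self.conv3 = nn.Conv2d(96, 128, 3, stride=2, padding=1)   # conv layer 3
        self.conv4 = nn.Conv2d(128, 128, 3, stride=2, padding=1)  # conv layer 4
        self.fc5 = nn.LazyLinear(256)                             # fully-connected 5
        self.fc6 = nn.Linear(256, num_outputs)                    # fully-connected 6

    def forward(self, x):  # x: (N, 3, 112, 96), i.e. the 3 x 96 x 112 input above
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.inc10(x)
        x = torch.relu(self.deconv9(x))
        x = self.inc11(x)
        x = torch.relu(self.conv3(x))
        x = torch.relu(self.conv4(x))
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc5(x))
        return self.fc6(x)  # the output layer applies the losses above
```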
In the first embodiment of the application, at least one deconvolution layer and at least one inception layer are arranged between the first convolutional layer and the last convolutional layer of the convolutional neural network. Through the inception layer, the convolutional neural network can better adapt to changes in the size of the input image, while the diversity of the feature maps is increased, the feature maps are fused at multiple scales, and the amount of computation is reduced. Through the deconvolution layer, the size of the feature maps can be restored and the feature maps supplemented and expanded, so that higher-level features can be learned and the model parameters of the network reduced. This solves the problems in the prior art that the same convolutional layer of a convolutional neural network adopts convolution kernels of the same size, making it difficult to adapt to changes in input image size, and that the convolutional layers cannot learn richer features.
Meanwhile, different loss functions are set for different face attributes in the output layer: the semantic attributes correspond to a cross-entropy loss function and the ordinal attributes to a SmoothL1 loss function, so that the convolutional neural network is trained more accurately and the recognition accuracy of the trained network is improved.
Example two:
referring to fig. 2, a method for identifying a face attribute provided in the second embodiment of the present application is described below, where the method for identifying a face attribute in the second embodiment of the present application includes:
step S201, obtaining an image to be detected, and performing face detection and face alignment on the image to be detected to obtain a face image to be recognized.
When the image to be detected contains a plurality of faces, performing face detection and face alignment on it yields a plurality of face images to be recognized, so that only one face appears in each face image, which facilitates face attribute recognition by the convolutional neural network.
Step S202, inputting the face image to be recognized into a trained convolutional neural network to obtain a face attribute recognition result of the face image to be recognized.
After the face image to be recognized is input into the trained convolutional neural network, the shared-feature sub-networks (such as the convolutional, deconvolution and inception layers) extract the shared features of the face image, the attribute-specific sub-networks (such as the fully-connected layers) extract the features of the designated attributes, and the output layer computes each face attribute from the shared features and the attribute-specific features to obtain the face attribute recognition result of each face image to be recognized in the image to be detected.
In addition, before face detection is performed, the image to be detected can be screened to judge whether it meets preset image requirements, for example whether the image sharpness reaches a preset sharpness threshold and whether the image brightness reaches a preset brightness threshold; images that do not meet the preset requirements are screened out.
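A minimal sketch of this recognition flow (steps S201 to S202) is given below; the detect_faces and align_face helpers are hypothetical placeholders for any face detector and face aligner, not components named by the application.

```python
# Minimal sketch of steps S201-S202. detect_faces() and align_face() are
# hypothetical placeholders for any detector / aligner; Alinc is the model
# sketched in the first embodiment.
import torch

def recognize_attributes(model, image, detect_faces, align_face):
    """Return one attribute prediction per face found in the image."""
    model.eval()
    results = []
    for box, landmarks in detect_faces(image):    # S201: detect each face
        face = align_face(image, box, landmarks)  # S201: aligned HxWx3 crop
        x = torch.as_tensor(face).float().permute(2, 0, 1).unsqueeze(0)
        with torch.no_grad():
            results.append(model(x))              # S202: attribute outputs
    return results
```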
Further, the convolutional neural network is trained by the following method:
acquiring a face sample image, labeling each face attribute to be learned in the face sample image, inputting the labeled face sample image into an initial convolutional neural network, and training the initial convolutional neural network to obtain the trained convolutional neural network.
Before the initial convolutional neural network is trained, face sample images need to be acquired and the face attributes the network should recognize need to be determined; each face attribute to be learned is then labeled in the face sample images. For example, if the face attributes to be recognized are race, age and gender, then the race, age and gender of the face in each sample image need to be labeled, for example as (Asian, 40 years old, male).
And inputting the labeled face image into an initial convolutional neural network, and training the initial convolutional neural network to obtain a trained convolutional neural network.
After the trained convolutional neural network is obtained, it can be tested and made to relearn. Face test images are acquired and input into the trained network to obtain their face attribute recognition results, and the error between the attribute result output by the network and the actual attribute of each test image is computed. Test images whose error exceeds a preset error range are used as new face sample images, and training of the convolutional neural network continues. The preset error range can be determined according to the actual situation; for example, when the attributes to be recognized are age, gender and race, it can be set to an age deviation within 5 years, or to any recognition error in gender or race, such as a male recognized as a female or a Caucasian recognized as an Asian.
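A minimal sketch of this test-and-relearn step follows; the error_of helper and the threshold are illustrative assumptions.

```python
# Minimal sketch of the test-and-relearn step: test images whose error
# exceeds a preset range become new training samples. error_of() and the
# threshold are illustrative assumptions.
def collect_hard_examples(model, test_set, error_of, max_error):
    """Return (image, label) pairs the trained model still gets wrong."""
    hard = []
    for image, label in test_set:
        prediction = model(image)
        if error_of(prediction, label) > max_error:  # e.g. age off by > 5
            hard.append((image, label))
    return hard  # appended to the training set for continued training
```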
Further, the face attributes include: semantic attributes and/or ordinal attributes;
the semantic attributes include one or more of overall attributes, local attributes, action attributes, and wearing attributes.
The semantic attributes include one or more of overall attributes (e.g., race and gender), local attributes (e.g., whether there is a beard), action attributes (e.g., expression and motion), and wearing attributes (e.g., whether glasses or a hat are worn); a semantic attribute is a face attribute that does not require judging magnitude or length.
An ordinal attribute is an attribute that requires judging magnitude or length, such as age and hair: age requires judging magnitude, and hair requires judging length.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example three:
in the third embodiment of the present application, a face attribute recognition apparatus is provided, and for convenience of description, only the relevant portions of the present application are shown, as shown in fig. 3, the face attribute recognition apparatus includes,
the face detection module 301 is configured to acquire an image to be detected, perform face detection and face alignment on the image to be detected, and obtain a face image to be recognized;
the attribute recognition module 302 is configured to input the facial image to be recognized into the trained convolutional neural network, so as to obtain a facial attribute recognition result of the facial image to be recognized.
Further, the face attribute recognition apparatus further includes:
the network training module is used for acquiring a face sample image, labeling each face attribute to be learned in the face sample image, inputting the labeled face sample image into an initial convolutional neural network, and training the initial convolutional neural network to obtain the trained convolutional neural network.
Further, the face attributes include: semantic attributes and/or ordinal attributes;
the semantic attributes include one or more of overall attributes, local attributes, action attributes, and wearing attributes.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Example four:
fig. 4 is a schematic diagram of a terminal device according to a fourth embodiment of the present application. As shown in fig. 4, the terminal device 40 of this embodiment includes: a processor 400, a memory 401 and a computer program 402 stored in said memory 401 and executable on said processor 400. The processor 400 executes the computer program 402 to implement the steps in the above-mentioned palm and its key point detection method embodiment, such as the steps S201 to S202 shown in fig. 2. Alternatively, the processor 400, when executing the computer program 402, implements the functions of each module/unit in the above-mentioned device embodiments, for example, the functions of the modules 301 to 302 shown in fig. 3.
Illustratively, the computer program 402 may be partitioned into one or more modules/units, which are stored in the memory 401 and executed by the processor 400 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 402 in the terminal device 40. For example, the computer program 402 may be divided into a face detection module and an attribute identification module, and each module has the following specific functions:
acquiring an image to be detected, and performing face detection and face alignment on the image to be detected to obtain a face image to be recognized;
and inputting the face image to be recognized into the trained convolutional neural network to obtain a face attribute recognition result of the face image to be recognized.
The terminal device 40 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The terminal device may include, but is not limited to, a processor 400, a memory 401. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 40 and does not constitute a limitation of terminal device 40 and may include more or fewer components than shown, or some components may be combined, or different components, for example, the terminal device may also include input output devices, network access devices, buses, etc.
The Processor 400 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 401 may be an internal storage unit of the terminal device 40, such as a hard disk or a memory of the terminal device 40. The memory 401 may also be an external storage device of the terminal device 40, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the terminal device 40. Further, the memory 401 may also include both an internal storage unit and an external storage device of the terminal device 40. The memory 401 is used to store the computer program and other programs and data required by the terminal device. The memory 401 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A convolutional neural network structure for face attribute recognition, characterized in that at least one deconvolution layer and at least one inception layer are arranged between the first convolutional layer and the last convolutional layer of the convolutional neural network, and a plurality of convolution kernels of different sizes are adopted during convolution in the inception layer.
2. The convolutional neural network structure for face attribute recognition as claimed in claim 1, wherein the output layer of the convolutional neural network is provided with loss functions, the face attributes are divided into semantic attributes and ordinal attributes in advance, the semantic attributes correspond to a cross-entropy loss function, and the ordinal attributes correspond to a SmoothL1 loss function.
3. A face attribute recognition method, based on the convolutional neural network structure of claim 1 or 2, comprising:
acquiring an image to be detected, and performing face detection and face alignment on the image to be detected to obtain a face image to be recognized;
and inputting the face image to be recognized into the trained convolutional neural network to obtain a face attribute recognition result of the face image to be recognized.
4. The face attribute recognition method of claim 3, wherein the convolutional neural network is trained by:
acquiring a face sample image, labeling each face attribute to be learned in the face sample image, inputting the labeled face sample image into an initial convolutional neural network, and training the initial convolutional neural network to obtain the trained convolutional neural network.
5. The face attribute recognition method of claim 3, wherein the face attributes include: semantic attributes and/or ordinal attributes;
the semantic attributes include one or more of overall attributes, local attributes, action attributes, and wearing attributes.
6. A face attribute recognition apparatus, characterized in that the apparatus is based on the convolutional neural network structure of claim 1 or 2, and comprises:
the face detection module is used for acquiring an image to be detected, and performing face detection and face alignment on the image to be detected to obtain a face image to be recognized;
and the attribute recognition module is used for inputting the face image to be recognized into the trained convolutional neural network to obtain a face attribute recognition result of the face image to be recognized.
7. The face attribute recognition device of claim 6, further comprising:
the network training module is used for acquiring a face sample image, labeling each face attribute to be learned in the face sample image, inputting the labeled face sample image into an initial convolutional neural network, and training the initial convolutional neural network to obtain the trained convolutional neural network.
8. The face attribute recognition apparatus of claim 6, wherein the face attributes comprise: semantic attributes and/or ordinal attributes;
the semantic attributes include one or more of overall attributes, local attributes, action attributes, and wearing attributes.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 3 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 3 to 5.
CN201811018499.3A 2018-08-31 2018-08-31 Convolutional neural network structure, face attribute recognition method and device, and terminal device Pending CN109214333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811018499.3A CN109214333A (en) 2018-08-31 2018-08-31 Convolutional neural network structure, face attribute recognition method and device, and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811018499.3A CN109214333A (en) 2018-08-31 2018-08-31 Convolutional neural network structure, face attribute recognition method and device, and terminal device

Publications (1)

Publication Number Publication Date
CN109214333A true CN109214333A (en) 2019-01-15

Family

ID=64987173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811018499.3A Pending CN109214333A (en) Convolutional neural network structure, face attribute recognition method and device, and terminal device

Country Status (1)

Country Link
CN (1) CN109214333A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145857A (en) * 2017-04-29 2017-09-08 深圳市深网视界科技有限公司 Face character recognition methods, device and method for establishing model
CN107688808A (en) * 2017-08-07 2018-02-13 电子科技大学 A kind of quickly natural scene Method for text detection
CN107844781A (en) * 2017-11-28 2018-03-27 腾讯科技(深圳)有限公司 Face character recognition methods and device, electronic equipment and storage medium
CN108009509A (en) * 2017-12-12 2018-05-08 河南工业大学 Vehicle target detection method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109951846A (en) * 2019-03-25 2019-06-28 腾讯科技(深圳)有限公司 Wireless network recognition methods, device, storage medium and computer equipment
CN110321778A (en) * 2019-04-26 2019-10-11 北京市商汤科技开发有限公司 A kind of face image processing process, device and storage medium
CN110321778B (en) * 2019-04-26 2022-04-05 北京市商汤科技开发有限公司 Face image processing method and device and storage medium
CN110223294A (en) * 2019-06-21 2019-09-10 北京万里红科技股份有限公司 A kind of human body left/right eye image decision method based on multilayer convolutional neural networks
CN112464689A (en) * 2019-09-06 2021-03-09 佳能株式会社 Method, device and system for generating neural network and storage medium for storing instructions
CN112070041A (en) * 2020-09-14 2020-12-11 北京印刷学院 Living body face detection method and device based on CNN deep learning model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 1301, No.132, Fengqi Road, phase III, software park, Xiamen City, Fujian Province

Applicant after: Xiamen Entropy Technology Co., Ltd

Applicant after: Entropy Technology Co., Ltd

Address before: 361000, Xiamen three software park, Fujian Province, 8 North Street, room 2001

Applicant before: XIAMEN ZKTECO BIOMETRIC IDENTIFICATION TECHNOLOGY Co.,Ltd.

Applicant before: Zhongkong Smart Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190115