CN112766176B - Training method of lightweight convolutional neural network and face attribute recognition method - Google Patents

Training method of lightweight convolutional neural network and face attribute recognition method

Info

Publication number
CN112766176B
CN112766176B
Authority
CN
China
Prior art keywords
face attribute
channel
feature
layer
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110085271.1A
Other languages
Chinese (zh)
Other versions
CN112766176A (en)
Inventor
闫潇宁
陈晓艳
郑双午
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Anruan Huishi Technology Co ltd
Shenzhen Anruan Technology Co Ltd
Original Assignee
Shenzhen Anruan Huishi Technology Co ltd
Shenzhen Anruan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Anruan Huishi Technology Co ltd, Shenzhen Anruan Technology Co Ltd filed Critical Shenzhen Anruan Huishi Technology Co ltd
Priority to CN202110085271.1A priority Critical patent/CN112766176B/en
Publication of CN112766176A publication Critical patent/CN112766176A/en
Application granted granted Critical
Publication of CN112766176B publication Critical patent/CN112766176B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of image recognition and provides a training method for a lightweight convolutional neural network and a face attribute recognition method. The training method comprises the following steps: collecting a face attribute data set; constructing a convolutional neural network comprising a feature dimension reduction structure, a channel mixing structure and a feature classification structure, wherein the feature classification structure comprises a channel separation algorithm layer; carrying out standard convolution on the face attribute data and inputting the processed face attribute feature map into the feature dimension reduction structure for feature dimension reduction training; inputting the face attribute dimension-reduction feature map output by training into the channel mixing structure to fuse dimensions among channels; and inputting the output face attribute channel fusion feature map into the channel separation algorithm layer to split the channel dimension, and setting fully connected layers according to the splitting result to obtain the target convolutional neural network model. The invention improves the accuracy of attribute recognition, and the network itself has few parameters and a high running speed.

Description

Training method of lightweight convolutional neural network and face attribute recognition method
Technical Field
The invention relates to the technical field of image recognition, in particular to a training method of a lightweight convolutional neural network and a face attribute recognition method.
Background
In recent years, face attribute recognition has been applied increasingly widely in many fields; accurate and efficient face attribute recognition results play an important auxiliary role in assisting law enforcement personnel in investigating cases, improving human-computer interaction experience, and similar applications.
In multi-category recognition based on convolutional neural networks, how to promote efficient sharing of useful information among categories while reducing interference from invalid information remains a technical problem to be solved. In standard convolution, the matrix operation between the convolution kernel and the image pixel values spans all channels simultaneously; after group convolution is adopted, however, each group's convolution operates only on the channels within that group, that is, information exchange between groups of channels is blocked, which helps preserve the characteristics of certain face attributes. However, continuously blocking inter-channel information adversely affects the overall features that are ultimately used to determine the various attributes and degrades the accuracy of the recognition result for each face attribute. The prior art therefore suffers from low recognition accuracy.
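The blocking effect described above can be illustrated with a toy, pure-Python sketch (not from the patent): with 1×1 kernels, each output channel of a grouped convolution is a weighted sum of only the input channels in its own group, so no information crosses group boundaries. The channel values and weights below are arbitrary illustrative numbers.

```python
def grouped_conv1x1(pixels, weights, groups):
    """Grouped 1x1 convolution at one spatial position.

    pixels:  per-channel values at that position (flat list).
    weights: one weight vector per output channel, each sized c_in // groups.
    Input channels are split into `groups` consecutive groups, and output
    channels are assumed to be divided evenly among the groups.
    """
    c_in = len(pixels)
    gsize = c_in // groups
    out = []
    for o, wc in enumerate(weights):
        g = o * groups // len(weights)            # group this output belongs to
        chunk = pixels[g * gsize:(g + 1) * gsize]  # only this group's channels
        out.append(sum(w * x for w, x in zip(wc, chunk)))
    return out

# Two groups of two channels; four output channels (two per group).
pixels = [1.0, 2.0, 10.0, 20.0]
weights = [[1, 1], [1, 0], [1, 1], [0, 1]]
print(grouped_conv1x1(pixels, weights, groups=2))
# The first two outputs depend only on channels 0-1, the last two only on 2-3.
```

However the weights are chosen, the first group's outputs can never reflect channels 2-3, which is exactly the blocked information exchange the patent's channel mixing structure is designed to overcome.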
Disclosure of Invention
The embodiment of the invention provides a training method of a lightweight convolutional neural network, which can improve the accuracy of attribute identification results, reduce parameters of a constructed model and improve the running speed of the model.
In a first aspect, an embodiment of the present invention provides a training method for a lightweight convolutional neural network, where the method includes the following steps:
collecting a face attribute data set, wherein the face attribute data set comprises face attribute data;
constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer;
carrying out standard convolution on the face attribute data in the face attribute data set, and inputting a face attribute feature map obtained after standard convolution processing into the feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map;
inputting the face attribute dimension reduction feature map to the constructed channel mixing structure to fuse dimensions among channels, and outputting a face attribute channel fusion feature map;
inputting the face attribute channel fusion feature map to the feature classification structure, splitting the channel dimension through the channel separation algorithm layer, and setting a full connection layer according to the splitting result to obtain a target convolutional neural network model.
In a second aspect, an embodiment of the present invention provides a face attribute identifying method, including the steps of:
acquiring data to be detected, wherein the data to be detected comprises face attribute data;
inputting the face attribute data in the data to be detected into the target convolutional neural network model in any embodiment to identify the face attribute;
and outputting the face attribute recognition result.
In a third aspect, an embodiment of the present invention provides a training apparatus for a lightweight convolutional neural network, including:
the acquisition module is used for acquiring a face attribute data set, wherein the face attribute data set comprises face attribute data;
the construction module is used for constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer;
the feature dimension reduction module is used for carrying out standard convolution on the face attribute data in the face attribute data set, inputting a face attribute feature map obtained after standard convolution processing into the feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map;
the channel fusion module is used for inputting the face attribute dimension reduction feature map into the constructed channel mixing structure to fuse dimensions among channels and outputting a face attribute channel fusion feature map;
The feature classification module is used for inputting the face attribute channel fusion feature graph into the feature classification structure, splitting the channel dimension through the channel separation algorithm layer, and setting a full connection layer according to the splitting result to obtain the target convolutional neural network model.
In a fourth aspect, an embodiment of the present invention provides a face attribute identifying apparatus, including:
the acquisition module is used for acquiring data to be detected, wherein the data to be detected comprises face attribute data;
the recognition module is used for inputting the face attribute data in the data to be detected into the target convolutional neural network model in any embodiment to recognize the face attribute;
and the output module is used for outputting the face attribute identification result.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the training method of the lightweight convolutional neural network provided by any embodiment.
In a sixth aspect, an embodiment of the present invention provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps in the training method of a lightweight convolutional neural network provided in any of the embodiments.
In the embodiment of the invention, the face attribute data set is acquired, and the face attribute data set comprises face attribute data; constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer; carrying out standard convolution on the face attribute data in the face attribute data set, and inputting a face attribute feature map obtained after standard convolution processing into the feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map; inputting the face attribute dimension reduction feature map to the constructed channel mixing structure to fuse dimensions among channels, and outputting a face attribute channel fusion feature map; inputting the face attribute channel fusion feature map to the feature classification structure, splitting the channel dimension through the channel separation algorithm layer, and setting a full connection layer according to the splitting result to obtain a target convolutional neural network model. 
Because a channel separation algorithm layer is constructed in the feature classification structure of the invention, the face attribute channel fusion feature map can be split along the channel dimension by the channel separation algorithm layer. This helps reassign features that fuse multiple attributes to the individual attribute tasks to support their final decision classification, so that more refined features of the multi-class attributes are extracted to support the discrimination of each attribute task, and the obtained target convolutional neural network model can improve the accuracy of face attribute recognition. Meanwhile, the constructed convolutional neural network has a small number of parameters, which increases its running speed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a training method for a lightweight convolutional neural network provided by an embodiment of the present invention;
FIG. 2 is a flow chart of another training method for a lightweight convolutional neural network provided by an embodiment of the present invention;
FIG. 2a is a flowchart of acquiring a face attribute dataset according to an embodiment of the present invention;
FIG. 2b is a flowchart of a feature dimension reduction structure process provided by an embodiment of the present invention;
FIG. 3 is a flow chart of another training method for a lightweight convolutional neural network provided by an embodiment of the present invention;
FIG. 3a is a flow chart of a channel mixing structure process provided by an embodiment of the present invention;
FIG. 3b is a flow chart of a feature classification structure process provided by an embodiment of the invention;
fig. 4 is a schematic structural diagram of a training device for a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another training apparatus for a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another training apparatus for a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another training apparatus for a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another training apparatus for a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another training apparatus for a lightweight convolutional neural network according to an embodiment of the present invention;
fig. 10 is a flowchart of a face attribute recognition method provided in the present embodiment;
fig. 11 is a schematic structural diagram of a face attribute identifying apparatus according to the present embodiment;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the description of the drawings are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or drawings are used for distinguishing between different objects and not for describing a particular sequential order. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As shown in fig. 1, fig. 1 is a flowchart of a training method of a lightweight convolutional neural network according to an embodiment of the present application, where the training method of the lightweight convolutional neural network includes the following steps:
101. and collecting a face attribute data set, wherein the face attribute data set comprises face attribute data.
In this embodiment, the provided training method for the lightweight convolutional neural network targets multi-class face attribute recognition and can be applied to scenarios such as case investigation, personnel detection, and human-computer interaction. The training method can be used whenever a multi-task classification model is to be built with a convolutional neural network. The electronic device on which the training method runs can be networked through a wired or wireless connection to collect and transmit face attribute data. The wireless connection may include, but is not limited to, 3G/4G, WiFi (Wireless Fidelity), Bluetooth, WiMAX (Worldwide Interoperability for Microwave Access), Zigbee (a low-power wireless LAN protocol), UWB (ultra wideband), and other wireless connections now known or developed in the future.
The face attribute data set includes various types of face attribute data, for example: beard, eyebrow, eye, and pupil data. The data may be acquired by recording video with a camera and extracting video frames from the video data, or by collecting face attribute data offline or in real time. The collected face attribute data set may comprise multiple sub-classification data sets, i.e., different types of data can be stored separately to facilitate distinction and recognition.
102. And constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer.
Among them, convolutional neural network (Convolutional Neural Network, CNN) is a feedforward neural network whose artificial neurons can respond to surrounding cells in a part of coverage, and has excellent performance for large-scale image processing. Because CNN avoids complex pre-processing of images and can directly input original images, the CNN has wide application in fields such as pattern classification and recognition.
Constructing the convolutional neural network involves building three parts: the feature dimension reduction structure, the channel mixing structure, and the feature classification structure, with a channel separation algorithm layer built inside the feature classification structure. The feature dimension reduction structure can be used to raise the number of feature channels; the channel mixing structure fuses feature information among channels; the feature classification structure splits the fused result along the channel dimension. Compared with, for example, human body attribute recognition, the attribute tasks of a face have clearer correlation or independence among attributes, for example: beard and gender. Splitting along the channel dimension with the channel separation algorithm facilitates reassigning the fused attribute features to each specific attribute task to support its final decision classification.
103. And carrying out standard convolution on the face attribute data in the face attribute data set, and inputting the face attribute feature map obtained after standard convolution processing into a feature dimension reduction structure for feature dimension reduction training so as to output the face attribute dimension reduction feature map.
The standard convolution may be performed by inputting the data into a first standard convolution layer. The convolution kernel is 3×3, and performing a standard convolution before the feature dimension reduction processing can increase the number of channels of the input image (the face attribute data). After the input image passes through the 3×3 standard convolution, the face attribute feature map can be output to the feature dimension reduction structure for feature dimension reduction processing. The feature dimension reduction structure includes a plurality of layers and is executed repeatedly during the dimension reduction process, for example 3 times, with the output of one feature dimension reduction structure serving as the input of the next. After multiple feature dimension reduction operations, the face attribute dimension-reduction feature map is finally output with an increased number of channels; for example, a 1×1 standard convolution raises the channel count of the face attribute dimension-reduction feature map from 48 to 96 channels.
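The channel-raising role of a 1×1 standard convolution can be sketched in pure Python (a minimal illustration, not the patent's implementation): each output channel is a weighted sum of all input channels at every spatial position, so the channel count is set entirely by the number of weight vectors. The toy feature map and weights below are hypothetical.

```python
def conv1x1(fmap, weights):
    """Apply a 1x1 convolution to a feature map stored as nested lists
    indexed [channel][row][col]. Each output channel is a weighted sum
    of all input channels at every spatial position, so len(weights)
    output channels are produced from len(fmap) input channels."""
    c_in = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    out = []
    for wc in weights:  # one weight vector (length c_in) per output channel
        out.append([[sum(wc[c] * fmap[c][i][j] for c in range(c_in))
                     for j in range(w)] for i in range(h)])
    return out

# Toy example: raise 2 channels to 4 channels on a 2x2 feature map.
fmap = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]
weights = [[1, 0], [0, 1], [1, 1], [0.5, 0.5]]
out = conv1x1(fmap, weights)
print(len(out))  # prints 4: the channel count doubled
```

In the same way, 96 weight vectors over a 48-channel input would realize the 48-to-96 channel expansion mentioned above.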
104. And inputting the face attribute dimension reduction feature map into the constructed channel mixing structure to fuse dimensions among channels, and outputting the face attribute channel fusion feature map.
After the channel expansion is completed, the face attribute dimension-reduction feature map can be input into the channel mixing structure, which performs feature fusion among the channels of the expanded feature map. The channel mixing structure may likewise include multiple layers and be applied multiple times; in this embodiment it is applied 3 times, i.e., the channel mixing structure fuses the dimensions among channels of the face attribute dimension-reduction feature map 3 times. Each channel of the output face attribute channel fusion feature map then contains features fused from the other channels.
Specifically, the channel mixing algorithm adopted in the channel mixing structure works as follows: assuming the features are divided into m groups with n channels per group, the grouped features are first reshaped into an m×n feature map F1; F1 is then transposed to obtain a feature map F2; F2 is then split again into m groups by rows. The feature vectors in these m groups contain feature information from each of the m originally input groups, thereby realizing feature fusion among all channels.
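The reshape-transpose-split principle above can be sketched in a few lines of pure Python (channel labels here are illustrative stand-ins for whole channels):

```python
def channel_shuffle(channels, m, n):
    """Mix m groups of n channels each: reshape the flat channel list
    into an m x n matrix, transpose it, and flatten it back, so every
    run of the result interleaves channels from all original groups."""
    assert len(channels) == m * n
    f1 = [channels[i * n:(i + 1) * n] for i in range(m)]    # reshape to m x n
    f2 = [[f1[i][j] for i in range(m)] for j in range(n)]   # transpose to n x m
    return [c for row in f2 for c in row]                   # flatten back out

# Two groups of three channels: the a-group and the b-group.
mixed = channel_shuffle(['a0', 'a1', 'a2', 'b0', 'b1', 'b2'], m=2, n=3)
print(mixed)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2'] - groups interleaved
```

After this shuffle, regrouping the flat list into m consecutive groups gives every new group channels from each original group, which is the inter-channel fusion the structure is built for.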
105. Inputting the face attribute channel fusion feature map into a feature classification structure, splitting the channel dimension through a channel separation algorithm layer, and setting a full connection layer according to a splitting result to obtain the target convolutional neural network model.
The feature classification structure can split the features and distribute the split results to the attribute tasks, where an attribute task is a task created to classify and recognize the received result. The feature classification structure may include a plurality of layers, each performing a different function; in particular, a channel separation algorithm layer is built into it. The channel separation algorithm layer splits the face attribute channel fusion feature map along the channel dimension and outputs the split results to a plurality of fully connected layers that act as classifiers; the model finally output is the target convolutional neural network model used to predict face attribute categories.
Each node of a fully connected layer is connected to all nodes of the preceding layer and integrates the feature attributes extracted by the earlier layers. In a CNN structure, one or more fully connected layers follow the convolution and pooling layers, and every neuron in a fully connected layer is connected to all neurons in its previous layer. The fully connected layer can integrate the class-discriminative feature attributes from the convolution or pooling layers. To improve network performance, the activation function of each neuron in the fully connected layer may be the ReLU function. The output of the last fully connected layer is the predicted face attribute category value, which can be classified by softmax logistic regression (softmax regression), although other classification methods are also possible. Multiple fully connected layers may be distinguished by sequence numbers, for example: fully connected layer 1, fully connected layer 2, …, fully connected layer n.
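A hedged pure-Python sketch of this classification stage follows (toy sizes; the weights, split sizes, and attribute names are illustrative assumptions, not values from the patent): the fused feature vector is split along the channel dimension and each split feeds its own fully connected layer whose softmax output scores one attribute task.

```python
import math

def softmax(xs):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fully_connected(x, weights):
    """Bias-free fully connected layer: one weight row per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def classify(fused, splits, fc_weights):
    """Split `fused` into chunks of the sizes in `splits`, then apply one
    fully connected layer + softmax per attribute task."""
    outputs, start = [], 0
    for size, weights in zip(splits, fc_weights):
        chunk = fused[start:start + size]
        outputs.append(softmax(fully_connected(chunk, weights)))
        start += size
    return outputs

# Toy: a 4-channel fused feature split into two 2-channel attribute tasks.
fused = [1.0, 2.0, 3.0, 4.0]
fc_weights = [[[1, 0], [0, 1]],   # task 1, e.g. "beard" yes/no (illustrative)
              [[1, 1], [0, 0]]]   # task 2, e.g. "gender" (illustrative)
preds = classify(fused, splits=[2, 2], fc_weights=fc_weights)
print([len(p) for p in preds])  # two tasks, two class probabilities each
```

Each attribute task thus receives only its own slice of the fused features, mirroring the channel separation layer feeding one fully connected classifier per attribute.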
In the embodiment of the invention, the face attribute data set is acquired, and the face attribute data set comprises face attribute data; constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer; carrying out standard convolution on face attribute data in the face attribute data set, and inputting a face attribute feature map obtained after standard convolution processing into a feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map; inputting the face attribute dimension reduction feature map into the constructed channel mixing structure to fuse dimensions among channels, and outputting the face attribute channel fusion feature map; inputting the face attribute channel fusion feature map into a feature classification structure, splitting the channel dimension through a channel separation algorithm layer, and setting a full connection layer according to a splitting result to obtain the target convolutional neural network model. 
Because a channel separation algorithm layer is constructed in the feature classification structure of the invention, the face attribute channel fusion feature map can be split along the channel dimension by the channel separation algorithm layer. This helps reassign features that fuse multiple attributes to the individual attribute tasks to support their final decision classification, so that more refined features of the multi-class attributes are extracted to support the discrimination of each attribute task, and the obtained target convolutional neural network model can improve the accuracy of face attribute recognition. Meanwhile, the constructed convolutional neural network has a small number of parameters, which increases its running speed.
As shown in fig. 2, fig. 2 is a flowchart of another training method of a lightweight convolutional neural network according to an embodiment of the present invention, which specifically includes the following steps:
201. and collecting a face attribute data set, wherein the face attribute data set comprises face attribute data.
An image acquisition device collects pedestrian data, the face attribute data of each pedestrian is extracted, and the face attribute data collected from multiple people is classified and assembled to generate the face attribute data set.
Optionally, referring to fig. 2a, step 201 may specifically include:
and acquiring video data through the image acquisition equipment, and extracting video frames of the video data.
The image acquisition device may be a camera, or an electronic device with a camera or the like capable of acquiring images. The video data may include image data of pedestrians, roads, and the like. The video is composed of images of one frame and one frame, so that the image data to be grabbed can be grabbed in a video frame extraction mode.
And identifying pedestrians in the video frames, and marking pedestrian attribute images of the pedestrians, wherein the pedestrian attribute images comprise face attribute data.
After a video frame is extracted, the video frame can be identified and the pedestrians in it detected. The pedestrian may be a pedestrian locked by an upper-layer system and requiring marking. Meanwhile, the pedestrian attributes of the pedestrians in the video frame are marked; specifically, the face attributes among the pedestrian attributes (the pedestrian attributes required for marking) may be marked, for example, facial attributes such as the nose, eyes, mouth, eyebrows, and cheeks. The marking may be performed automatically by a machine or manually by an annotator.
And dividing the marked face attribute data to obtain a face attribute data set.
After marking, the face attribute data can be classified by type, with the face attribute data of each type forming one class; collecting all the classes yields the face attribute data set. The face attribute data set can further be divided into a training set, a verification set and a test set. In the CNN model of this embodiment, the face attribute data are respectively divided into the training set, the verification set and the test set, so that the model can be trained on the training set and evaluated on the verification set, which guides the setting of the number of layers of the CNN model and the number of neurons in each layer.
202. And constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer.
203. And carrying out standardization processing on the face attribute data in the face attribute data set.
The normalization may include normalizing the size of the face attribute data. In this embodiment, the face attribute data input into the feature dimension reduction structure need to be normalized to a resolution of 112×112.
204. And carrying out standard convolution calculation on the face attribute data subjected to the standardization processing through a preset first standard convolution layer to obtain a face attribute feature map.
After the 112×112 face attribute data are obtained, a standard convolution layer with a 3×3 kernel is applied to increase the number of channels. The number of channels output by the first standard convolution layer may be an integer multiple of the number of groups of the first grouped convolution layer in the feature dimension reduction structure. The feature map output after the channel expansion by the first standard convolution layer is the face attribute feature map. In this embodiment, taking 12 as an example, the face attribute feature map output after the standard convolution of the first standard convolution layer has 12 channels with a resolution of 112×112.
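As an illustrative sketch (not part of the claimed method), the shape transformation performed by this first standard convolution can be reproduced with a naive NumPy implementation; the 3-channel RGB input is an assumption, while the 12 output channels, 3×3 kernel and padding 1 follow the figures above:

```python
import numpy as np

def standard_conv(x, w, pad=1):
    # x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel stack
    c_out, c_in, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # all k x k windows: shape (C_in, H, W, k, k)
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    # sum over input channels and kernel positions for each output channel
    return np.einsum('chwij,ocij->ohw', win, w)

x = np.random.rand(3, 112, 112)   # assumed 3-channel 112x112 input
w = np.random.rand(12, 3, 3, 3)   # 12 output channels, 3x3 kernels
y = standard_conv(x, w)
print(y.shape)  # (12, 112, 112)
```

Padding of 1 with a 3×3 kernel preserves the 112×112 resolution while the channel count rises from 3 to 12, matching the description above.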
205. And inputting the face attribute feature map to a feature dimension reduction structure for carrying out feature dimension reduction training for a plurality of times so as to output the face attribute dimension reduction feature map.
In this embodiment, the feature dimension reduction structure may include 3 layers, so the feature dimension reduction training is performed 3 times. The output of each layer is used as the input of the next layer, and after the 3 layers of feature dimension reduction are completed, the face attribute dimension reduction feature map is output to the channel mixing structure as its input.
Optionally, referring to fig. 2b, the feature dimension reduction structure includes a first packet convolution layer, a second standard convolution layer, and a first average pooling layer, where step 205 specifically includes:
inputting the face attribute feature map into a first grouping convolution layer, a second standard convolution layer and a first average pooling layer in sequence to perform feature dimension reduction training; and repeatedly executing the steps of inputting the face attribute feature map to the first grouping convolution layer, the second standard convolution layer and the first average pooling layer in sequence to perform feature dimension reduction training until the face attribute dimension reduction feature map is output.
The feature dimension reduction structure comprises a first grouped convolution layer, a 1×1 second standard convolution layer and a first average pooling layer. The step of sequentially inputting the face attribute feature map to the first grouped convolution layer, the second standard convolution layer and the first average pooling layer for feature dimension reduction training is repeated 3 times, after which the face attribute dimension reduction feature map is output. The face attribute feature map serves as the input image of the 1×1 second standard convolution layer, whose convolution operation raises the feature dimension of the input; the number of feature dimensions output at each stage is set to 24, 48 and 96. Since the number of channels output by the second standard convolution layer is an integer multiple of the number of groups of the first grouped convolution layer in the feature dimension reduction structure, the number of groups at each stage is 12, 24 and 48 respectively. The feature dimension of the output is thus raised by the 1×1 standard convolutions from 24 channels through 48 channels to 96 channels, with a resolution of 14×14 on each channel; that is, the face attribute dimension reduction feature map is output as 96×14×14.
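The channel and resolution bookkeeping described above can be sketched in a few lines of plain Python; the 2× downsampling per average pooling stage is an assumption inferred from the stated 112 → 14 reduction over three stages:

```python
# Assumed shape trace through the 3 dimension-reduction stages: at each
# stage a 1x1 convolution raises the channel count and the average
# pooling layer halves the spatial resolution.
channels, resolution = 12, 112        # output of the first standard convolution
stages = []
for out_channels in (24, 48, 96):     # channel counts set by the 1x1 convolutions
    groups = channels                 # stated group counts: 12, 24, 48
    channels = out_channels
    resolution //= 2                  # assumed 2x average-pooling downsample
    stages.append((groups, channels, resolution))
print(stages[-1])  # (48, 96, 14)
```

Note that the stated group count at each stage (12, 24, 48) equals the incoming channel count of that stage, which is consistent with the divisibility requirement above.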
206. And inputting the face attribute dimension reduction feature map into the constructed channel mixing structure to fuse dimensions among channels, and outputting the face attribute channel fusion feature map.
207. Inputting the face attribute channel fusion feature map into a feature classification structure, splitting the channel dimension through a channel separation algorithm layer, and setting a full connection layer according to a splitting result to obtain the target convolutional neural network model.
In this embodiment, the size of the input face attribute data is normalized, and the number of channels is then increased by a first standard convolution layer with a 3×3 kernel; the channels are further expanded by multiple passes through the feature dimension reduction structure. In addition, a first grouped convolution layer, a 1×1 second standard convolution layer and a first average pooling layer are constructed in the feature dimension reduction structure: the independence of the features of the attribute tasks can be controlled through the first grouped convolution layer, and the sharing of information can be controlled through the 1×1 convolution kernel.
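To illustrate why grouped convolution keeps the parameter quantity small (a generic property of grouped convolution, not a figure taken from the patent), the weight count of a grouped layer can be compared with that of a standard one; the 96-channel, 12-group numbers below are borrowed from this embodiment:

```python
def conv_weight_count(c_in, c_out, kernel, groups=1):
    # in a grouped convolution each output channel only sees
    # c_in / groups input channels, shrinking the weight tensor
    return c_out * (c_in // groups) * kernel * kernel

standard = conv_weight_count(96, 96, 3)            # 82944 weights
grouped = conv_weight_count(96, 96, 3, groups=12)  # 6912 weights
print(standard // grouped)  # 12x fewer weights
```

The reduction factor equals the number of groups, which is why stacking grouped layers yields the small parameter quantity claimed for the network.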
As shown in fig. 3, fig. 3 is a flowchart of another training method of a lightweight convolutional neural network according to an embodiment of the present invention, which specifically includes the following steps:
301. and collecting a face attribute data set, wherein the face attribute data set comprises face attribute data.
302. And constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer.
303. And carrying out standard convolution on the face attribute data in the face attribute data set, and inputting the face attribute feature map obtained after standard convolution processing into a feature dimension reduction structure for feature dimension reduction training so as to output the face attribute dimension reduction feature map.
304. The channel mixing structure comprises a plurality of second grouping convolution layers and a channel mixing layer, wherein the face attribute dimension-reducing feature images are sequentially input into the plurality of second grouping convolution layers so as to sequentially extract face attribute features of the face attribute dimension-reducing feature images, and the channel mixing layer is used for fusing dimensions among the channels of the face attribute features.
In this embodiment, referring to fig. 3a, the channel mixing structure is provided with 3 groups in total, and each group is built with 2 second grouped convolution layers and 1 channel mixing layer; the numbers of groups of the 2 second grouped convolution layers are consistent and set to 12. The face attribute dimension reduction feature map, output as a 96×14×14 image, serves as the input of the channel mixing structure and is first input to the second grouped convolution layer of the first layer; the output of that layer is then used as the input of the second grouped convolution layer of the second layer. These 2 layers of grouped convolution extract the feature information within each group, which is information without inter-group sharing. The output of the second grouped convolution layer of the second layer is used as the input of the channel mixing layer, which mixes the relatively independent sets of group information in the channel dimension so as to realize information sharing among channels. Furthermore, in order to avoid the problem of vanishing gradients during training, residual branches may be provided in the channel mixing structure.
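The channel mixing layer described above behaves like the well-known channel shuffle operation; the following NumPy sketch is an interpretation of that behavior on the 96×14×14 map with 12 groups, not the patent's literal implementation:

```python
import numpy as np

def channel_mix(x, groups):
    """Interleave channels across groups so that each group of the next
    grouped convolution receives channels originating from every group."""
    c, h, w = x.shape
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))

x = np.arange(96 * 14 * 14, dtype=np.float32).reshape(96, 14, 14)
mixed = channel_mix(x, groups=12)
# with 12 groups of 8 channels, output channel 1 is input channel 8
# (the first channel of the second group), realizing inter-group sharing
```

The operation is a pure permutation of channels, so it adds no parameters, consistent with the lightweight design goal.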
305. And repeatedly executing the steps of sequentially inputting the face attribute dimension-reducing feature images into the multi-layer second packet convolution layer to sequentially extract the face attribute features of the face attribute dimension-reducing feature images, and fusing the dimensions of the face attribute features among channels through the channel mixing layer until the face attribute channel fusion feature images are output.
The output of the first channel mixing layer is used as the input of the second grouped convolution layer of the first layer in the second group of channel mixing structures; this is executed in sequence until the training of all 3 groups of channel mixing structures is completed, and finally the face attribute channel fusion feature map is output.
306. The feature classification structure further comprises a third grouping convolution layer and a second averaging and pooling layer, the face attribute channel fusion feature map is input into the second averaging and pooling layer, and the resolution of the face attribute channel fusion feature map is reduced.
As shown in fig. 3b, the feature classification structure includes, in addition to the channel separation algorithm layer, a third grouped convolution layer and a second average pooling layer. The face attribute channel fusion feature map serves as the input image of the second average pooling layer, whose processing reduces the resolution of the face attribute channel fusion feature map.
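A non-overlapping average pooling of the kind used in the second average pooling layer can be sketched in NumPy as follows; the 2×2 window and the resulting 96×7×7 output are assumptions, since the patent does not state the window size here:

```python
import numpy as np

def avg_pool2d(x, k=2):
    # x: (C, H, W); average over non-overlapping k x k windows
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

fused = np.random.rand(96, 14, 14)  # face attribute channel fusion feature map
pooled = avg_pool2d(fused)
print(pooled.shape)  # (96, 7, 7)
```

Averaging rather than max-pooling preserves the contribution of every spatial position, which suits summarizing a feature map before classification.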
307. And inputting the face attribute channel fusion feature map with reduced resolution into a third grouping convolution layer for grouping convolution processing, and splitting the face attribute channel fusion feature map with inter-channel dimension fusion to a plurality of attribute recognition tasks again.
308. Splitting the face attribute channel fusion feature map subjected to the grouping convolution processing in the channel dimension through a channel separation algorithm layer, and setting a full connection layer according to a splitting result to obtain a target convolution neural network model.
A channel separation algorithm layer is provided after the third grouped convolution layer, and the grouped convolution result can be split along the channel dimension. Specifically, the face attribute channel fusion feature map after the grouped convolution processing is a stack of arrays comprising a plurality of dimensions, where each dimension may represent a different parameter; for example, the first dimension is the number of pictures, the second dimension is the number of channels output by the grouped convolution, the third dimension is the length of the pictures, and the fourth dimension is the width of the pictures, with each layer after the grouped convolution having its own array. Splitting the face attribute channel fusion feature map along the channel dimension through the channel separation algorithm refers to splitting a complete stack of arrays, which may be viewed as a cube of data, along the channel dimension. After splitting, a plurality of splitting results are obtained, each containing part of the data of the complete array. For example, when the complete array has 6 layers: splitting it layer by layer yields 6 arrays; splitting it two layers at a time yields 3 arrays; and splitting it three layers at a time yields 2 arrays. The splitting direction and the number of dimensions per split are preset during model training, and may be transverse splitting, longitudinal splitting and the like. Each splitting result corresponds to a fully connected layer, and a classifier of the fully connected layer can be set up according to the corresponding splitting result.
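The layer-by-layer splitting example above maps directly onto `numpy.split` along the channel axis; the sketch below is illustrative only, and the 6-layer, 4×4 array sizes are hypothetical:

```python
import numpy as np

# a hypothetical fused feature map with 6 channel "layers"
full = np.arange(6 * 4 * 4).reshape(6, 4, 4)

by_one = np.split(full, 6, axis=0)    # layer by layer: 6 arrays of (1, 4, 4)
by_two = np.split(full, 3, axis=0)    # two layers at a time: 3 arrays of (2, 4, 4)
by_three = np.split(full, 2, axis=0)  # three layers at a time: 2 arrays of (3, 4, 4)
```

Each resulting array can then be routed to its own fully connected classifier, one per attribute task, as described above.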
Specifically, the channel separation algorithm layer splits the face attribute channel fusion feature map after the grouped convolution processing along the channel dimension, so that the face attribute features into which multiple attributes are fused can be distributed to each specific attribute task. Splitting along the channel dimension is performed because not all data in every dimension of the complete array contributes effectively to every specific attribute task. The splitting result comprises a plurality of arrays, each of which can correspond to one specific attribute task without interference from irrelevant data elsewhere in the complete array. Finally, a fully connected layer classifier can be set for each channel separation result and attribute task, yielding the target convolutional neural network model with higher recognition accuracy. The target convolutional neural network model is used for predicting face attribute types; in data testing, an average recognition accuracy of 92% is achieved on a self-constructed 7-class face attribute test set.
In the embodiment of the invention, the two second grouped convolution layers and the channel mixing layer arranged in the channel mixing structure fuse the features in the face attribute dimension reduction feature map, realizing the sharing of face attribute information. In addition, a third grouped convolution layer and a second average pooling layer are constructed in the feature classification structure: the resolution of the face attribute channel fusion feature map is reduced by the second average pooling layer before the map is input to the third grouped convolution layer, and the channel separation algorithm layer arranged after the third grouped convolution layer splits the grouped convolution result along the channel dimension and distributes it to the plurality of attribute recognition tasks. This distributes the face attribute features into which each attribute is fused to each specific attribute task to support the final decision classification, extracts more refined features to support the discrimination of each attribute task, and enables the obtained target convolutional neural network model to improve the accuracy of attribute recognition. Meanwhile, the constructed convolutional neural network has a small parameter quantity and a high running speed.
As shown in fig. 4, fig. 4 is a schematic structural diagram of a training device for a lightweight convolutional neural network according to an embodiment of the present invention, where a training device 400 for a lightweight convolutional neural network includes:
the acquisition module 401 is configured to acquire a face attribute data set, where the face attribute data set includes face attribute data;
the construction module 402 is configured to construct a convolutional neural network, where the convolutional neural network includes a feature dimension reduction structure, a channel mixing structure, and a feature classification structure, and the feature classification structure includes a channel separation algorithm layer;
the feature dimension reduction module 403 is configured to perform standard convolution on face attribute data in the face attribute data set, and input a face attribute feature map obtained after standard convolution processing to a feature dimension reduction structure to perform feature dimension reduction training, so as to output a face attribute dimension reduction feature map;
the channel fusion module 404 is configured to input the face attribute dimension reduction feature map to the constructed channel mixing structure to perform fusion of dimensions among channels, and output the face attribute channel fusion feature map;
the feature classification module 405 is configured to input the face attribute channel fusion feature map to a feature classification structure, split the channel dimension through a channel separation algorithm layer, and set a full connection layer according to the splitting result, so as to obtain a target convolutional neural network model.
Optionally, as shown in fig. 5, fig. 5 is a schematic structural diagram of another training device for a lightweight convolutional neural network according to an embodiment of the present invention, and the acquisition module 401 includes:
an acquisition unit 4011 for acquiring video data through an image acquisition device, and extracting video frames of the video data;
a marking unit 4012 for identifying pedestrians in the video frame and marking pedestrian attribute images of the pedestrians, the pedestrian attribute images including face attribute data;
the dividing unit 4013 is configured to divide the face attribute data after marking, and obtain a face attribute data set.
Optionally, as shown in fig. 6, fig. 6 is a schematic structural diagram of another training device for a lightweight convolutional neural network according to an embodiment of the present invention, where the feature dimension reduction module 403 includes:
a normalization processing unit 4031, configured to perform normalization processing on face attribute data in the face attribute data set;
the channel convolution unit 4032 is configured to perform standard convolution calculation on the face attribute data after the normalization processing through a preset first standard convolution layer to obtain a face attribute feature map;
the dimension-reduction training unit 4033 is configured to input the face attribute feature map to the feature dimension-reduction structure for performing multiple feature dimension-reduction training to output the face attribute dimension-reduction feature map.
Optionally, the feature dimension reduction structure includes a first packet convolutional layer, a second standard convolutional layer and a first average pooling layer, as shown in fig. 7, fig. 7 is a schematic structural diagram of another training device for a lightweight convolutional neural network according to an embodiment of the present invention, and the dimension reduction training unit 4033 includes:
the dimension reduction training subunit 40331 is configured to sequentially input the face attribute feature map to the first packet convolution layer, the second standard convolution layer, and the first average pooling layer for feature dimension reduction training;
and the repeated dimension reduction subunit 40332 is configured to repeatedly perform the step of sequentially inputting the face attribute feature map to the first packet convolution layer, the second standard convolution layer, and the first average pooling layer for feature dimension reduction training until the face attribute dimension reduction feature map is output.
Optionally, the channel mixing structure includes a plurality of second packet convolutional layers and a channel mixing layer, as shown in fig. 8, fig. 8 is a schematic structural diagram of another training device for a lightweight convolutional neural network according to an embodiment of the present invention, and the channel fusion module 404 includes:
the extracting unit 4041 is configured to sequentially input the face attribute dimension-reduction feature map to a multi-layer second packet convolution layer, so as to sequentially extract face attribute features of the face attribute dimension-reduction feature map, and fuse dimensions of the face attribute features between channels through the channel mixing layer, where the number of packets corresponding to the multi-layer second packet convolution layer is consistent;
And a repeated fusion unit 4042, configured to repeatedly perform the step of sequentially inputting the feature map of the face attribute dimension reduction to the multi-layer second packet convolution layer, so as to sequentially extract the face attribute features of the feature map of the face attribute dimension reduction, and perform the step of fusing dimensions between channels on the face attribute features through the channel mixing layer until the feature map of the face attribute channel fusion is output.
Optionally, the feature classification structure further includes a third packet convolution layer and a second average pooling layer, as shown in fig. 9, fig. 9 is a schematic structural diagram of another training device for a lightweight convolutional neural network according to an embodiment of the present invention, where the feature classification module 405 includes:
the resolution processing unit 4051 is configured to input the face attribute channel fusion feature map to the second average pooling layer, and reduce the resolution of the face attribute channel fusion feature map;
the distribution unit 4052 is configured to input the face attribute channel fusion feature map with reduced resolution into a third group convolution layer for group convolution processing, and split the face attribute channel fusion feature map with inter-channel dimensions fused to multiple attribute recognition tasks again;
the dimension splitting unit 4053 is configured to split the face attribute channel fusion feature map after the packet convolution processing in the channel dimension through the channel separation algorithm layer.
The training device for the lightweight convolutional neural network provided by the embodiment of the invention can realize various implementation modes in the training method embodiment of the lightweight convolutional neural network and has the corresponding beneficial effects, and in order to avoid repetition, the description is omitted here.
As shown in fig. 10, fig. 10 is a flowchart of a face attribute identifying method according to the present embodiment, and the face attribute identifying method includes the steps of:
1001. and obtaining data to be detected, wherein the data to be detected comprises face attribute data.
The data to be detected may be pedestrian data collected by a camera, or may be pedestrian data input by an upper layer. The data to be detected may be a face image, which may include a plurality of face attributes.
1002. And inputting the face attribute data in the data to be detected into the target convolutional neural network model in any embodiment to perform face attribute recognition.
The target convolutional neural network model is a trained model with high recognition rate, the data to be detected is input into the target convolutional neural network model in any embodiment to recognize the face attribute, and the face attribute of the data to be detected can be carefully analyzed to judge whether the pedestrian corresponding to the data to be detected is the pedestrian to be searched.
1003. And outputting the face attribute recognition result.
After the face attribute recognition result is output, whether the pedestrian corresponding to the data to be detected is the pedestrian to be searched or not can be judged.
The face attribute recognition method provided by the embodiment of the invention can be applied to the training method of the lightweight convolutional neural network, and can realize each implementation mode and corresponding beneficial effects in the training method embodiment of the lightweight convolutional neural network, and in order to avoid repetition, the description is omitted.
As shown in fig. 11, fig. 11 is a flowchart of a face attribute identifying apparatus provided in the present embodiment, and a face attribute identifying apparatus 1100 includes:
an obtaining module 1101, configured to obtain data to be detected, where the data to be detected includes face attribute data;
the recognition module 1102 is configured to input face attribute data in the data to be detected into the target convolutional neural network model in any embodiment to perform face attribute recognition;
the output module 1103 is configured to output the face attribute recognition result.
As shown in fig. 12, fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 1200 includes: a processor 1201, a memory 1202, a network interface 1203, and a computer program stored in the memory 1202 and executable on the processor 1201, wherein the steps in the training method of the lightweight convolutional neural network provided by the above embodiments are realized when the processor 1201 executes the computer program.
Specifically, the processor 1201 is configured to perform the following steps:
collecting a face attribute data set, wherein the face attribute data set comprises face attribute data;
constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer;
carrying out standard convolution on face attribute data in the face attribute data set, and inputting a face attribute feature map obtained after standard convolution processing into a feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map;
inputting the face attribute dimension reduction feature map into the constructed channel mixing structure to fuse dimensions among channels, and outputting the face attribute channel fusion feature map;
inputting the face attribute channel fusion feature map into a feature classification structure, splitting the channel dimension through a channel separation algorithm layer, and setting a full connection layer according to a splitting result to obtain the target convolutional neural network model.
Optionally, the step of collecting the face attribute data set performed by the processor 1201 includes:
acquiring video data through image acquisition equipment, and extracting video frames of the video data;
Identifying pedestrians in the video frames, and marking pedestrian attribute images of the pedestrians, wherein the pedestrian attribute images comprise face attribute data;
and dividing the marked face attribute data to obtain a face attribute data set.
Optionally, the step of performing standard convolution on face attribute data in the face attribute data set by the processor 1201, and inputting a face attribute feature map obtained after standard convolution processing to a feature dimension reduction structure to perform feature dimension reduction training includes:
carrying out standardization processing on face attribute data in the face attribute data set;
carrying out standard convolution calculation on the face attribute data subjected to standardization processing through a preset first standard convolution layer to obtain a face attribute feature map;
and inputting the face attribute feature map to a feature dimension reduction structure for carrying out feature dimension reduction training for a plurality of times so as to output the face attribute dimension reduction feature map.
Optionally, the feature dimension reduction structure includes a first packet convolution layer, a second standard convolution layer, and a first average pooling layer, and the step performed by the processor 1201 of inputting the face attribute feature map to the feature dimension reduction structure for performing multiple feature dimension reduction training includes:
inputting the face attribute feature map into a first grouping convolution layer, a second standard convolution layer and a first average pooling layer in sequence to perform feature dimension reduction training;
And repeatedly executing the steps of inputting the face attribute feature map to the first grouping convolution layer, the second standard convolution layer and the first average pooling layer in sequence to perform feature dimension reduction training until the face attribute dimension reduction feature map is output.
Optionally, the channel mixing structure includes a plurality of second packet convolution layers and a channel mixing layer, and the step performed by the processor 1201 of inputting the face attribute dimension reduction feature map to the constructed channel mixing structure to perform the fusion of dimensions between channels includes:
sequentially inputting the face attribute dimension-reducing feature images into a plurality of layers of second grouping convolution layers to sequentially extract face attribute features of the face attribute dimension-reducing feature images, and fusing dimensions among channels of the face attribute features through a channel mixing layer, wherein the grouping numbers corresponding to the plurality of layers of second grouping convolution layers are consistent;
and repeatedly executing the steps of sequentially inputting the face attribute dimension-reducing feature images into the multi-layer second packet convolution layer to sequentially extract the face attribute features of the face attribute dimension-reducing feature images, and fusing the dimensions of the face attribute features through the channel mixing layer until the face attribute channel fusion feature images are output.
Optionally, the feature classification structure further includes a third grouping convolution layer and a second average pooling layer, and the step, performed by the processor 1201, of inputting the face attribute channel fusion feature map to the feature classification structure and splitting the channel dimension through the channel separation algorithm layer includes:
inputting the face attribute channel fusion feature map to the second average pooling layer to reduce the resolution of the face attribute channel fusion feature map;
inputting the face attribute channel fusion feature map with reduced resolution to the third grouping convolution layer for grouping convolution processing, so as to reassign the face attribute channel fusion feature map, in which dimensions among channels have been fused, to a plurality of attribute recognition tasks;
and splitting the face attribute channel fusion feature map subjected to the grouping convolution processing in the channel dimension through the channel separation algorithm layer.
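The channel separation algorithm layer can be read as the inverse bookkeeping step: it slices the fused channel dimension into one contiguous block per attribute recognition task, so each task's fully connected head receives only its own channels. A minimal sketch assuming equal-sized slices (the channel and task counts below are hypothetical):

```python
def channel_split(feature_channels, num_tasks):
    """Split the fused channel dimension into equal per-task slices,
    one slice per attribute recognition task."""
    n = len(feature_channels)
    assert n % num_tasks == 0
    step = n // num_tasks
    return [feature_channels[t * step:(t + 1) * step]
            for t in range(num_tasks)]

# 8 fused channels split across 4 hypothetical attribute tasks:
print(channel_split(list(range(8)), num_tasks=4))
# → [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Each slice then feeds its own fully connected layer, which is what "setting a fully connected layer according to the splitting result" amounts to in the method above.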
The electronic device 1200 provided by the embodiment of the present invention can implement each implementation manner of the embodiment of the training method of the lightweight convolutional neural network and achieve the corresponding beneficial effects; to avoid repetition, details are not described here again.
It should be noted that only components 1201-1203 are shown in the figures, but it should be understood that not all of the illustrated components need to be implemented, and more or fewer components may be implemented instead. Those skilled in the art will appreciate that the electronic device 1200 herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
The memory 1202 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 1202 may be an internal storage unit of the electronic device 1200, such as a hard disk or memory of the electronic device 1200. In other embodiments, the memory 1202 may also be an external storage device of the electronic device 1200, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device 1200. Of course, the memory 1202 may also include both an internal storage unit and an external storage device of the electronic device 1200. In this embodiment, the memory 1202 is generally used to store the operating system installed in the electronic device 1200 and various application software, such as the program code of the training method of the lightweight convolutional neural network. Furthermore, the memory 1202 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 1201 may, in some embodiments, be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 1201 is generally used to control the overall operation of the electronic device 1200. In this embodiment, the processor 1201 is configured to execute the program code stored in the memory 1202 or to process data, for example, to execute the program code of the training method of the lightweight convolutional neural network.
The network interface 1203 may include a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the electronic device 1200 and other electronic devices.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by the processor 1201, the computer program implements each process of the training method of the lightweight convolutional neural network provided by the embodiments and can achieve the same technical effect; to avoid repetition, details are not described here again.
Those skilled in the art will appreciate that all or part of the processes in the training method of the lightweight convolutional neural network of the embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (8)

1. A training method of a lightweight convolutional neural network, comprising the following steps:
collecting a face attribute data set, wherein the face attribute data set comprises face attribute data;
constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer;
carrying out standard convolution on the face attribute data in the face attribute data set, and inputting a face attribute feature map obtained after standard convolution processing into the feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map;
inputting the face attribute dimension reduction feature map to the constructed channel mixing structure to fuse dimensions among channels, and outputting a face attribute channel fusion feature map;
inputting the face attribute channel fusion feature map to the feature classification structure, splitting the channel dimension through the channel separation algorithm layer, and setting a fully connected layer according to the splitting result to obtain a target convolutional neural network model;
wherein the channel mixing structure includes a plurality of second grouping convolution layers and a channel mixing layer, and the step of inputting the face attribute dimension reduction feature map to the constructed channel mixing structure to fuse dimensions among channels includes: sequentially inputting the face attribute dimension reduction feature map to the plurality of second grouping convolution layers to extract face attribute features of the face attribute dimension reduction feature map in turn, and fusing dimensions among channels of the face attribute features through the channel mixing layer, wherein the numbers of groups of the plurality of second grouping convolution layers are the same; and repeating the step of sequentially inputting the face attribute dimension reduction feature map to the plurality of second grouping convolution layers to extract the face attribute features and fusing dimensions among channels of the face attribute features through the channel mixing layer, until the face attribute channel fusion feature map is output;
wherein the feature classification structure further comprises a third grouping convolution layer and a second average pooling layer, and the step of inputting the face attribute channel fusion feature map to the feature classification structure and splitting the channel dimension through the channel separation algorithm layer includes: inputting the face attribute channel fusion feature map to the second average pooling layer to reduce the resolution of the face attribute channel fusion feature map; inputting the face attribute channel fusion feature map with reduced resolution to the third grouping convolution layer for grouping convolution processing, so as to reassign the face attribute channel fusion feature map, in which dimensions among channels have been fused, to a plurality of attribute recognition tasks; and splitting the face attribute channel fusion feature map subjected to the grouping convolution processing in the channel dimension through the channel separation algorithm layer.
2. The training method of a lightweight convolutional neural network according to claim 1, wherein the step of collecting a face attribute data set comprises:
acquiring video data through image acquisition equipment, and extracting video frames from the video data;
identifying pedestrians in the video frames, and marking pedestrian attribute images of the pedestrians, wherein the pedestrian attribute images comprise the face attribute data;
dividing the marked face attribute data to obtain the face attribute data set.
3. The training method of a lightweight convolutional neural network according to claim 1, wherein the step of performing standard convolution on the face attribute data in the face attribute data set, and inputting a face attribute feature map obtained after standard convolution processing to the feature dimension reduction structure to perform feature dimension reduction training comprises:
carrying out standardization processing on the face attribute data in the face attribute data set;
carrying out standard convolution calculation on the face attribute data subjected to standardization processing through a preset first standard convolution layer to obtain the face attribute feature map;
and inputting the face attribute feature map to the feature dimension reduction structure for carrying out multiple feature dimension reduction training so as to output the face attribute dimension reduction feature map.
4. The training method of a lightweight convolutional neural network according to claim 3, wherein the feature dimension reduction structure comprises a first grouping convolution layer, a second standard convolution layer, and a first average pooling layer, and the step of inputting the face attribute feature map to the feature dimension reduction structure for multiple rounds of feature dimension reduction training comprises:
inputting the face attribute feature map to the first grouping convolution layer, the second standard convolution layer, and the first average pooling layer in sequence for feature dimension reduction training;
and repeating the step of sequentially inputting the face attribute feature map to the first grouping convolution layer, the second standard convolution layer, and the first average pooling layer for feature dimension reduction training, until the face attribute dimension reduction feature map is output.
5. A face attribute recognition method, comprising the following steps:
acquiring data to be detected, wherein the data to be detected comprises face attribute data;
inputting the face attribute data in the data to be detected into the target convolutional neural network model obtained by the training method according to any one of claims 1 to 4 for face attribute recognition;
and outputting the face attribute recognition result.
6. A training device for a lightweight convolutional neural network, comprising:
the acquisition module is used for acquiring a face attribute data set, wherein the face attribute data set comprises face attribute data;
the construction module is used for constructing a convolutional neural network, wherein the convolutional neural network comprises a characteristic dimension reduction structure, a channel mixing structure and a characteristic classification structure, and the characteristic classification structure comprises a channel separation algorithm layer;
the feature dimension reduction module is used for carrying out standard convolution on the face attribute data in the face attribute data set, inputting a face attribute feature map obtained after standard convolution processing into the feature dimension reduction structure for feature dimension reduction training so as to output a face attribute dimension reduction feature map;
the channel fusion module is used for inputting the face attribute dimension reduction feature map into the constructed channel mixing structure to fuse dimensions among channels and outputting a face attribute channel fusion feature map;
the feature classification module is used for inputting the face attribute channel fusion feature map into the feature classification structure, splitting the channel dimension through the channel separation algorithm layer, and setting a fully connected layer according to the splitting result to obtain a target convolutional neural network model;
wherein the channel mixing structure includes a plurality of second grouping convolution layers and a channel mixing layer, and the channel fusion module includes: an extraction unit, configured to sequentially input the face attribute dimension reduction feature map to the plurality of second grouping convolution layers to extract face attribute features of the face attribute dimension reduction feature map in turn, and to fuse dimensions among channels of the face attribute features through the channel mixing layer, wherein the numbers of groups of the plurality of second grouping convolution layers are the same; and a repeated fusion unit, configured to repeat the step of sequentially inputting the face attribute dimension reduction feature map to the plurality of second grouping convolution layers to extract the face attribute features and fusing dimensions among channels of the face attribute features through the channel mixing layer, until the face attribute channel fusion feature map is output;
wherein the feature classification structure further comprises a third grouping convolution layer and a second average pooling layer, and the feature classification module includes: a resolution processing unit, configured to input the face attribute channel fusion feature map to the second average pooling layer to reduce the resolution of the face attribute channel fusion feature map; a distribution unit, configured to input the face attribute channel fusion feature map with reduced resolution to the third grouping convolution layer for grouping convolution processing, so as to reassign the face attribute channel fusion feature map, in which dimensions among channels have been fused, to a plurality of attribute recognition tasks; and a dimension splitting unit, configured to split the face attribute channel fusion feature map subjected to the grouping convolution processing in the channel dimension through the channel separation algorithm layer.
7. A face attribute recognition apparatus, comprising:
the acquisition module is used for acquiring data to be detected, wherein the data to be detected comprises face attribute data;
the recognition module is used for inputting the face attribute data in the data to be detected into the target convolutional neural network model obtained by the training method according to any one of claims 1 to 4 for face attribute recognition;
and the output module is used for outputting the face attribute identification result.
8. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the training method of a lightweight convolutional neural network as claimed in any one of claims 1 to 4 when the computer program is executed.
CN202110085271.1A 2021-01-21 2021-01-21 Training method of lightweight convolutional neural network and face attribute recognition method Active CN112766176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110085271.1A CN112766176B (en) 2021-01-21 2021-01-21 Training method of lightweight convolutional neural network and face attribute recognition method

Publications (2)

Publication Number Publication Date
CN112766176A CN112766176A (en) 2021-05-07
CN112766176B true CN112766176B (en) 2023-12-01

Family

ID=75702594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110085271.1A Active CN112766176B (en) 2021-01-21 2021-01-21 Training method of lightweight convolutional neural network and face attribute recognition method

Country Status (1)

Country Link
CN (1) CN112766176B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565051B (en) * 2022-11-15 2023-04-18 浙江芯昇电子技术有限公司 Lightweight face attribute recognition model training method, recognition method and device

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015177268A1 (en) * 2014-05-23 2015-11-26 Ventana Medical Systems, Inc. Systems and methods for detection of biological structures and/or patterns in images
CN107784654A (en) * 2016-08-26 2018-03-09 杭州海康威视数字技术股份有限公司 Image partition method, device and full convolutional network system
CN108647668A (en) * 2018-05-21 2018-10-12 北京亮亮视野科技有限公司 The construction method of multiple dimensioned lightweight Face datection model and the method for detecting human face based on the model
CN108830252A (en) * 2018-06-26 2018-11-16 哈尔滨工业大学 A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic
CN109063666A (en) * 2018-08-14 2018-12-21 电子科技大学 The lightweight face identification method and system of convolution are separated based on depth
CN109446953A (en) * 2018-10-17 2019-03-08 福州大学 A kind of recognition methods again of the pedestrian based on lightweight convolutional neural networks
CN109961034A (en) * 2019-03-18 2019-07-02 西安电子科技大学 Video object detection method based on convolution gating cycle neural unit
CN109977812A (en) * 2019-03-12 2019-07-05 南京邮电大学 A kind of Vehicular video object detection method based on deep learning
CN110827260A (en) * 2019-11-04 2020-02-21 燕山大学 Cloth defect classification method based on LBP (local binary pattern) features and convolutional neural network
CN110929602A (en) * 2019-11-09 2020-03-27 北京工业大学 Foundation cloud picture cloud shape identification method based on convolutional neural network
CN111582044A (en) * 2020-04-15 2020-08-25 华南理工大学 Face recognition method based on convolutional neural network and attention model
CN112183295A (en) * 2020-09-23 2021-01-05 上海眼控科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
WO2021003125A1 (en) * 2019-07-01 2021-01-07 Optimum Semiconductor Technologies Inc. Feedbackward decoder for parameter efficient semantic image segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A new regularized image reconstruction algorithm and parameter optimization; Chen Xiaoyan; Journal of Tianjin University of Science and Technology; Vol. 29, No. 6; pp. 74-77 *

Also Published As

Publication number Publication date
CN112766176A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN112418360B (en) Convolutional neural network training method, pedestrian attribute identification method and related equipment
Trnovszky et al. Animal recognition system based on convolutional neural network
CN108875624B (en) Face detection method based on multi-scale cascade dense connection neural network
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
CN109711281A (en) A kind of pedestrian based on deep learning identifies again identifies fusion method with feature
EP3261017A1 (en) Image processing system to detect objects of interest
CN112990211B (en) Training method, image processing method and device for neural network
CN106951825A (en) A kind of quality of human face image assessment system and implementation method
CN109410184B (en) Live broadcast pornographic image detection method based on dense confrontation network semi-supervised learning
CN110222718B (en) Image processing method and device
EP3627379A1 (en) Methods for generating a deep neural net and for localising an object in an input image, deep neural net, computer program product, and computer-readable storage medium
US20100111375A1 (en) Method for Determining Atributes of Faces in Images
CN108960167A (en) Hair style recognition methods, device, computer readable storage medium and computer equipment
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN109255339A (en) Classification method based on adaptive depth forest body gait energy diagram
CN112766176B (en) Training method of lightweight convolutional neural network and face attribute recognition method
CN113095199B (en) High-speed pedestrian identification method and device
CN104598898A (en) Aerially photographed image quick recognizing system and aerially photographed image quick recognizing method based on multi-task topology learning
CN113128308B (en) Pedestrian detection method, device, equipment and medium in port scene
CN113450297A (en) Fusion model construction method and system for infrared image and visible light image
CN112052816A (en) Human behavior prediction method and system based on adaptive graph convolution countermeasure network
Li et al. Incremental learning of infrared vehicle detection method based on SSD
CN115565146A (en) Perception model training method and system for acquiring aerial view characteristics based on self-encoder
Jewel et al. Bengali ethnicity recognition and gender classification using CNN & transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PP01 Preservation of patent right
Effective date of registration: 20240109
Granted publication date: 20231201