TW202115622A - Face attribute recognition method, electronic device and computer-readable storage medium


Publication number
TW202115622A
TW202115622A TW109133422A
Authority
TW
Taiwan
Prior art keywords
feature
attribute
image
face
images
Prior art date
Application number
TW109133422A
Other languages
Chinese (zh)
Other versions
TWI753588B (en)
Inventor
陳鄭豪
徐靜
朱烽
趙瑞
Original Assignee
大陸商深圳市商湯科技有限公司
Priority date
Filing date
Publication date
Application filed by 大陸商深圳市商湯科技有限公司
Publication of TW202115622A
Application granted
Publication of TWI753588B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The present application provides a face attribute recognition method, an electronic device, and a computer-readable storage medium. The method comprises: acquiring n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer; acquiring the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize a target face attribute; and determining, according to the attribute features, the attribute recognition result of the target face attribute in the face image.

Description

Face attribute recognition method, electronic device and computer-readable storage medium

The present invention relates to the field of computer vision, and in particular to a face attribute recognition method, an electronic device, and a computer-readable storage medium.

In recent years, the prevention, control, and handling of emergencies have become increasingly difficult. Surveillance cameras are installed at important checkpoints such as airports, customs, and border crossings, as well as in public places such as streets and shopping malls. Analyzing the attributes of the face images captured by these cameras with face recognition technology can effectively assist security staff in responding to emergencies. For example, facial expression attributes can be recognized to judge whether a person is aggressive, or a captured face image of a target object can be compared against a face image database to obtain the target object's identity information, so that security staff can use these analysis results to deploy security measures and handle crises in a timely manner.

The embodiments of the present invention provide a face attribute recognition method, an electronic device, and a computer-readable storage medium, which can efficiently and accurately recognize the various attributes of a face image.

An embodiment of the present invention provides a face attribute recognition method, including: acquiring n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer; acquiring the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize a target face attribute; and determining, according to the attribute features, the attribute recognition result of the target face attribute of the face image.

The above face attribute recognition method determines the attribute recognition result of the target face attribute in a face image by acquiring feature images and the attribute features corresponding to them. Since the attribute features obtained from different feature images each reflect face attributes to a different degree, using the attribute features of different feature images to predict face attributes improves both the accuracy and the efficiency of face attribute recognition.
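The three steps of the method above (acquire n feature images, extract an attribute feature from each, combine them into one recognition result) can be sketched as follows. This is a minimal toy sketch with numpy, not the patented network: the downsampling stand-in for feature extraction and the mean-based summary and combination are assumptions for illustration only.

```python
import numpy as np

def extract_feature_images(face_image, n=3):
    """Toy stand-in for n rounds of feature extraction (the patent uses
    convolutional blocks; plain downsampling is used here for brevity)."""
    features, current = [], face_image
    for _ in range(n):
        current = current[::2, ::2]  # each round yields a coarser feature image
        features.append(current)
    return features

def attribute_feature(feature_image):
    """Toy per-feature-image attribute feature: a single summary value."""
    return float(feature_image.mean())

def recognize_attribute(face_image, n=3):
    feature_images = extract_feature_images(face_image, n)
    attr_feats = [attribute_feature(f) for f in feature_images]
    # Combine the n per-level attribute features into one recognition result
    return float(np.mean(attr_feats))
```

The key structural point is that every level's feature image contributes to the final result, rather than only the deepest one.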

In some embodiments of the present invention, acquiring the attribute feature corresponding to each of the n feature images includes: inputting each of the n feature images into its corresponding attention model to obtain the attention matrix corresponding to that feature image; and extracting, according to the attention matrices, the attribute feature corresponding to each of the n feature images.

In the above method, by inputting a feature image into multiple attention models, multiple attention matrices corresponding to that feature image can be obtained. Each attention matrix can be used to extract one face attribute feature from the feature image, and each face attribute feature can be used to characterize one face attribute. The attention models and the resulting attention matrices therefore improve the recognition of individual face attributes, and in turn the recognition accuracy of all face attributes in the face image.

In some embodiments of the present invention, extracting the attribute feature corresponding to each of the n feature images according to the attention matrices includes: multiplying each of the n feature images by its corresponding attention matrix to obtain the attribute feature corresponding to each of the n feature images.

In the above method, the attention mechanism multiplies a feature image by its corresponding attention matrix to obtain the attribute features of that feature image. These attribute features narrow the recognition region from the entire image down to the region of interest, filtering out noise and other factors that would affect the recognition result of that attribute, thereby improving both the accuracy and the efficiency of attribute recognition.
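As a concrete illustration of this element-wise masking (all values below are made up for the example):

```python
import numpy as np

# Hypothetical 2x2 feature image and a same-shaped attention matrix; the
# attention matrix puts all of its weight on one region of interest.
feature_image = np.array([[0.2, 0.8],
                          [0.5, 0.1]])
attention_matrix = np.array([[0.0, 1.0],
                             [0.0, 0.0]])

# Element-wise product: only the attended region survives; the rest is
# filtered out as noise for this attribute.
attribute_feature = feature_image * attention_matrix
```

In practice the attention matrix would be real-valued rather than a hard 0/1 mask, but the filtering effect is the same.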

In some embodiments of the present invention, determining the attribute recognition result of the target face attribute of the face image according to the attribute features includes: predicting a corresponding recognition result from each of the attribute features; and performing a weighted average over these recognition results to determine the attribute recognition result of the target face attribute in the face image.

In the above method, a single attribute is recognized by computing a weighted average over the recognition results obtained from that attribute's features in different feature images, which improves the precision of the attribute recognition result.
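The weighted-average step can be sketched as follows; the scores and weights are hypothetical values for illustration (the patent does not fix how the weights are chosen).

```python
import numpy as np

# Hypothetical per-feature-image scores for one target attribute, one
# score predicted from each of the n = 3 feature images I1..I3.
scores = np.array([0.9, 0.7, 0.8])
# Hypothetical weights for the three levels (they sum to 1 here).
weights = np.array([0.2, 0.3, 0.5])

# Weighted average of the n recognition results gives the final result.
final_result = float(np.average(scores, weights=weights))
```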

In some embodiments of the present invention, predicting a corresponding recognition result from each of the attribute features includes: performing attribute prediction processing on each of the attribute features to predict the recognition result corresponding to that attribute feature.

In the above method, attribute prediction is performed on each attribute feature separately, yielding the recognition result of the corresponding attribute for each attribute feature. From these results, the influence of different attribute features on the recognition result of the same attribute can be further derived; in other words, it becomes possible to distinguish the attributes of a face image that require deep-level attribute features for recognition from those that can be recognized with shallow-level features.

In some embodiments of the present invention, the preset attribute prediction processing includes global average pooling.

In the above method, applying global average pooling to each extracted face attribute feature directly yields the corresponding recognition result. This mitigates over-fitting during recognition, and at the same time improves recognition efficiency by saving the model computation time and computational cost.
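Global average pooling simply averages each channel of a feature map over all spatial positions, so no fully connected parameters are needed. A minimal sketch (the (H, W, C) layout and sizes are assumptions for illustration):

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse an (H, W, C) attribute feature map to one value per
    channel by averaging over all spatial positions."""
    return feature_map.mean(axis=(0, 1))

# Toy (H=2, W=3, C=4) attribute feature map
fmap = np.arange(24, dtype=float).reshape(2, 3, 4)
pooled = global_average_pool(fmap)  # shape (4,), one value per channel
```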

In some embodiments of the present invention, before acquiring the n feature images, the method further includes: acquiring the face image; and performing feature extraction on the face image through a feature image extraction model to obtain the n feature images.

In the above method, extracting feature images from the acquired face image preliminarily filters out noise in the face image, which facilitates the subsequent acquisition of attribute features and improves the accuracy of attribute recognition for the face image.

In some embodiments of the present invention, the feature image extraction model is trained on face image samples and feature image samples, and the method further includes: training a neural network with the feature image samples and corresponding attention matrix samples to obtain the attention models.

In the above method, training the attention models on a large number of known feature image samples and known attention matrix samples improves the accuracy of the attention matrices. Likewise, training the feature image extraction model on a large number of collected face image samples and feature image samples improves the multi-scale nature of feature extraction, so that the recognition accuracy of face attributes is greatly improved.

In some embodiments of the present invention, the n feature images further include the face image itself.

In the above method, since the attention model is in essence a form of feature image extraction, the face image can be fed directly into the attention models for computation. This skips the feature extraction that the feature image extraction model would otherwise perform on the face image and improves the efficiency of face attribute recognition.

An embodiment of the present invention provides a face attribute recognition apparatus, including: a feature image extraction unit configured to acquire n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer; an attribute feature extraction unit configured to acquire the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize a target face attribute; and a face attribute recognition unit configured to determine, according to the attribute features, the attribute recognition result of the target face attribute of the face image.

An embodiment of the present invention provides a storage medium storing a computer program, where the computer program is executed by hardware to implement the above method.

An embodiment of the present invention provides a computer program product; when the computer program product is read and executed by a computer, the above method is performed.

An embodiment of the present invention provides a computer cluster including at least one computing node, where each computing node includes a processor and a memory, and the processor executes the code in the memory to perform the above method.

An embodiment of the present invention provides a non-transitory computer storage medium including instructions that, when run on at least one computing node of a computing node cluster, cause the computing node cluster to execute the above method.

An embodiment of the present invention provides an electronic device including a processor and a memory, where the processor executes the code in the memory to implement the above method.

The terms used in the embodiments of the present invention are only for explaining specific embodiments of the present invention and are not intended to limit the present invention.

The application scenarios involved in the embodiments of the present invention include, but are not limited to, security surveillance and mobile phone entertainment, and further extend to scenarios such as facial expression recognition and age recognition. To facilitate understanding of the embodiments of the present invention, the main application scenario involved, security surveillance, is described first.

Security surveillance usually refers to an independent, complete system that transmits video signals within a closed loop over optical fiber, coaxial cable, or microwave, covering everything from capture to image display and recording. It can reflect the monitored object in real time, vividly, and faithfully; it not only extends the observation distance but can also replace manual long-term monitoring in harsh environments, letting people observe subtle features invisible to the naked eye. Combined with face recognition equipment, it can also identify, alert on, and record the relevant personnel in the monitored area, making it one of the important means of maintaining public safety.

In a security surveillance scenario, face recognition technology typically performs multiple attribute recognitions on a target face image captured by a camera, and the results of the multiple attribute recognitions are combined to determine whether the target is aggressive, the identity information corresponding to the target face, and so on. For example, the target face image may first be checked for multiple attributes such as whether the person wears a hat, carries a backpack, or wears a mask, and these recognition results, combined with a certain analysis strategy, determine whether the target is aggressive. Alternatively, multiple attributes of the target face image, such as the eyes, nose, mouth, moles, and scars, may be recognized first, and based on these results, combined with a certain analysis strategy, the identity information corresponding to the target face is determined. It should be understood that the above examples are for illustration only, and the present invention is not specifically limited thereto.

In related technologies, the Multi-Task Convolution Neural Net (MTCNN) is often used to solve the problem of face attribute recognition. This approach shares features across multiple branch networks and can recognize each independent face attribute simply and effectively. In practice, however, not all attributes require the same features for recognition: attributes such as wearing earrings or glasses need relatively shallow features, while attributes such as expression or gender need deep features. When MTCNN is used for face attribute recognition, the recognition results may therefore be imprecise because the relevant features are not distinctive.

Figure 1 is a schematic diagram of a process of recognizing face attributes based on MTCNN in the related art. As shown in Figure 1, the MTCNN includes an input layer 110, a shared feature learning layer 120, and a fully connected layer 130. The input layer 110 is configured to preprocess the input face image; the shared feature learning layer 120 is configured to extract the attribute features of the face image; and the fully connected layer 130 is configured to classify the face attribute features output by the shared feature learning layer 120. It should be noted that the shared feature learning layer 120 includes one or more convolutional layers and a fully connected layer, configured to filter and optimize the face attribute features.

In a specific implementation, given an input image (such as a face image), the MTCNN first performs simple image processing on it through the input layer 110, for example image binarization. It then extracts features through the shared feature learning layer 120, which includes different convolutional layers and one or more fully connected layers: after several convolutional layers, a fully connected layer is added to filter out irrelevant attribute features, and the filtered attribute features then pass through further convolutional layers to yield the shared feature. For example, the filtered attribute features may go through the convolution operations of another 10 convolutional layers, with the features extracted by the 10th convolutional layer taken as the shared feature. Finally, the shared feature is input to the fully connected layer 130, which uses this one shared feature for the different attribute recognition tasks.
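The shared-feature design described above can be caricatured in a few lines; the "layers" here are toy arithmetic and the thresholds are made up, but the structural point stands: every attribute head reads the same final feature.

```python
import numpy as np

def shared_trunk(image, depth=10):
    """Toy stand-in for the stacked convolutional layers that produce
    the single shared feature used by every attribute task."""
    feat = image
    for _ in range(depth):
        feat = feat * 0.9 + 0.01  # hypothetical 'layer'
    return feat

def attribute_head(shared_feature, threshold):
    """Every attribute head reads the SAME shared feature; only its
    decision rule differs (a simple threshold here, for illustration)."""
    return shared_feature.mean() > threshold

image = np.full((4, 4), 0.5)
shared = shared_trunk(image)
wears_glasses = attribute_head(shared, threshold=0.2)
is_smiling = attribute_head(shared, threshold=0.9)
```

Because all heads consume one feature, no head can ask for a shallower or deeper representation, which is exactly the limitation the next paragraph discusses.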

It can be understood that in the above face attribute recognition process, MTCNN must solve the recognition of different attributes from the shared feature extracted by the last convolutional layer. In a neural network trained this way, every attribute is recognized from the same shared feature, so the features of every region of the image are effectively treated as equivalent and no particular region receives extra attention. As a result, some subtle features cannot be accurately recognized by the network; in other words, the features needed to recognize a given face attribute are not always the last-layer features obtained after multi-layer convolution. For example, performing three convolution operations on the face image of a target object yields three levels of features. To judge whether the target wears glasses, the convolutional neural network may need only shallow features (such as level-2 features) to obtain that attribute's feature and recognize whether glasses are worn; to judge the target's ethnicity, the network may need deep features (such as level-4 features). In this case, using the level-3 features to judge whether the target wears glasses may lose the feature information characterizing that attribute through over-extraction, reducing both the precision and the efficiency of attribute recognition, while using the level-3 features to judge the target's ethnicity may reduce the precision of that recognition result because three levels of features cannot fully characterize the ethnicity attribute.

Therefore, in view of the problems in the above security surveillance scenario, where attribute recognition based on a shared feature easily causes feature loss or indistinct features, an embodiment of the present invention provides a face attribute recognition method that inputs the features extracted by different convolutional layers into different face attribute recognition models, thereby improving both the accuracy and the efficiency of face attribute recognition.

Figure 2 is a schematic structural diagram of a face attribute recognition neural network provided by an embodiment of the present invention. As shown in Figure 2, the network includes an input layer 210, convolutional layers 220, an attention learning layer 230, a global average pooling (GAP) layer 240, and a recognition layer 250. Each layer of the face attribute recognition neural network provided by the embodiment of the present invention is described in detail below.

Input layer 210:

The face image of the target object is fed to the input layer, whose output equals its input; that is, no processing is applied to the input face image. For simplicity of presentation, the image output by the input layer is assumed here to equal the input image; in practical applications, however, a normalization module or other image preprocessing module may be placed in the input layer, which is not specifically limited here.

Convolutional layers 220:

The image output by the input layer serves as the input to the convolutional layers, and the convolution computations of one or more convolutional layers generate n feature images. These n feature images are feature images at different levels and can characterize the attribute features of the face at different levels. For ease of presentation, the embodiments of the present invention introduce the concept of a "block": the convolutional layers of the convolutional neural network are divided into multiple blocks, each block may include one or more convolution kernels, and each block outputs one feature image. In other words, blocks B1, B2, …, Bn output feature images I1, I2, …, In respectively, where n is a positive integer.

The feature images I1, I2, …, In can be obtained as follows. Taking the feature image Ii output by block Bi as an example: the feature image Ii-1 output by block Bi-1 serves as the input to block Bi, and the convolution operations of u convolution kernels generate u feature images bl (l = 1, 2, …, u), where i, l, and u are all positive integers. The feature image bu is taken as the feature image Ii output by block Bi; bu is the last feature image produced by the convolutions, which is to say that the feature image Ii is the feature image extracted from Ii-1 through u convolution operations.

It should be noted that the feature image I1 is obtained by inputting the original face image into block B1. In addition, among the parameters of the above convolutional layers, the convolution kernels (including their elements, size, stride, and so on) and the bias values can be set manually according to the features to be extracted, the size of the input image, and so on. Taking the convolution kernel as an example: when the features to be extracted are face attribute features, the elements of the convolution kernel may adopt the elements of a Local Binary Patterns (LBP) operator. Similarly, when the input feature image Ii is relatively large, the convolution kernel may also be relatively large, and conversely, when the input feature image Ii is relatively small, the convolution kernel may also be relatively small; when the input feature image Ii is relatively large, the stride of the convolution kernel may also be relatively large, and conversely, when the input feature image Ii is relatively small, the stride may also be relatively small.

Attention learning layer 230:

Each of the n feature images I1, I2, …, In output by blocks B1, B2, …, Bn serves as an input to the attention learning layer. After the learning of the attention learning layer, n × m face attribute features are obtained, where each feature image corresponds to m face attribute features, m is a positive integer, and m denotes the number of face attributes to be extracted.

Take the m face attribute features extracted from the feature image Ii as an example: the feature image Ii is input into the attention learning layer, and m attention matrices Vij (j = 1, 2, …, m) are generated through the attention learning of m attention models, where one attention matrix corresponds to one face attribute, i.e., one attention matrix is used to extract the attribute feature of the corresponding face attribute, and j is a positive integer. A dot product (element-wise) operation is performed on the feature image Ii and the attention matrix Vij to extract the face attribute feature Aij of the j-th attribute of the feature image Ii. Each attention feature is obtained according to formula (1-1):

Vij = f(Ii), Aij = Ii ⊙ Vij (1-1);

where Attention (written f here) is an attention model, f being the mapping relationship between the feature image Ii and the attention matrix Vij; f can be obtained by training on a large number of known feature images I and attention matrices of known attributes. For brevity, only the extraction of the face attribute feature Aij of the j-th attribute of the feature image Ii is described above. In practical applications, the face attribute features Ai1, Ai2, …, Aim of the feature image Ii are extracted in a way similar to the extraction of Aij, and the face attribute features A1j, A2j, …, Anj of the j-th attribute of the feature images I1, I2, …, In are also extracted in a way similar to the extraction of Aij from the feature image Ii, so the details are not repeated here.
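The extraction step of formula (1-1) reduces to an element-wise product between a feature image and an attention matrix of the same shape. The sketch below uses made-up 2×2 values rather than a trained attention model:

```python
def apply_attention(feature_image, attention_matrix):
    """A_ij = I_i (element-wise) V_ij: weight every position of the
    feature image by the corresponding attention value."""
    return [[f * v for f, v in zip(f_row, v_row)]
            for f_row, v_row in zip(feature_image, attention_matrix)]

feature_image = [[0.5, 0.9],
                 [0.1, 0.8]]
# For a "glasses" attribute, high weights would sit on the eye region;
# the values here are purely illustrative.
attention_matrix = [[1.0, 1.0],
                    [0.5, 0.5]]
attribute_feature = apply_attention(feature_image, attention_matrix)
print(attribute_feature)  # [[0.5, 0.9], [0.05, 0.4]]
```

Positions with high attention weights survive almost unchanged while low-weight positions are suppressed, which is how one attention matrix isolates the features of one attribute.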

Global average pooling layer 240:

The n ×m face attribute features A11, A12, …, A1m, A21, A22, …, A2m, …, An1, An2, …, Anm output by the attention learning layer are used as the input of the global average pooling layer. After pooling through the pooling window, n ×m recognition results Pij are obtained, where each recognition result Pij is obtained according to formula (1-2):

Pij = GAP(Aij) (1-2);

Here, Pij represents the prediction result of the j-th attribute of the feature image Ii, and GAP represents global average pooling.
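Formula (1-2) collapses each attribute feature map to a single score. A minimal sketch, assuming the attribute feature is a plain 2D list of activations:

```python
def global_average_pool(attribute_feature):
    """P_ij = GAP(A_ij): collapse a 2D attribute feature map to one scalar."""
    values = [v for row in attribute_feature for v in row]
    return sum(values) / len(values)

a_ij = [[1.0, 2.0],
        [3.0, 2.0]]  # illustrative attribute feature map
p_ij = global_average_pool(a_ij)
print(p_ij)  # 2.0
```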

Recognition layer 250.

The n ×m recognition results Pij output by the global average pooling layer are grouped by attribute into the recognition results P11, P21, …, Pn1 of the first attribute, the recognition results P12, P22, …, Pn2 of the second attribute, …, and the recognition results P1m, P2m, …, Pnm of the m-th attribute. A weighted average calculation is then performed on the recognition results of each attribute to obtain the recognition result of that attribute in the face image.

Take the recognition result of the j-th face attribute as an example: a weighted average calculation is performed on the recognition results P1j, P2j, …, Pnj of the j-th attribute of the feature images I1, I2, …, In to obtain the recognition result Pj of the j-th face attribute of the face image, where Pj is obtained according to formula (1-3):

Pj = w1j·P1j + w2j·P2j + … + wnj·Pnj (1-3);

Here, wij is a weight value, and wij can be obtained by training on a large number of prediction results of known attributes and the corresponding attribute recognition results. For brevity, only the process of obtaining the recognition result Pj of the j-th attribute of the face image is described above. In practical applications, the recognition results of the other attributes of the face image are obtained in a way similar to the recognition result Pj of the j-th attribute, so the details are not repeated here.
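Formula (1-3) fuses the n per-feature-image predictions of one attribute with learned weights. The sketch below uses illustrative predictions and assumed (not trained) weights that sum to 1:

```python
def weighted_average(predictions, weights):
    """P_j = w_1j*P_1j + ... + w_nj*P_nj (weights assumed to sum to 1)."""
    return sum(w * p for w, p in zip(weights, predictions))

# Predictions of one attribute from n = 3 feature images (illustrative).
predictions = [0.4, 0.25, 0.2]
weights = [0.2, 0.4, 0.4]  # assumed fusion weights
p_j = weighted_average(predictions, weights)
print(round(p_j, 2))  # 0.26
```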

For example, the face image of the target object is input into the above face attribute recognition neural network, and after the computations of the input layer 210, the convolutional layer 220, the attention learning layer 230, the global average pooling layer 240, and the recognition layer 250, the gender attribute result of the target object can be obtained. For example, if the probability that the gender of the target object is male is 0.8 and the probability that it is female is 0.2, the recognition layer 250 outputs the result that the target object is male.

It can be understood that the above face attribute recognition neural network is merely an example. In practical applications, other variants of the above neural network are also possible; for example, it may include more convolutional layers or more pooling layers, a fully connected layer may be used instead of the global average pooling layer, and more preprocessing, such as binarization and normalization, may also be performed on the face image at the input layer, which is not specifically limited in the embodiments of the present invention.

It should be noted that the above face attribute recognition network is a convolutional neural network, where the convolutional neural network may include a residual neural network (Residual Network, ResNet), a VGG network (VGG Network, VGGNet), and so on, which is not specifically limited here. The recognition of the various attributes of the face image may be performed in the same convolutional neural network or in different convolutional neural networks, which is not specifically limited here. In addition, the feature images of the face image may be extracted with the same convolution kernel or with different convolution kernels, which is not specifically limited here.

Based on the face attribute recognition neural network shown in FIG. 2, an embodiment of the present invention provides a face attribute recognition method, as shown in FIG. 3, which is a schematic flowchart of a face attribute recognition method provided by an embodiment of the present invention. The face attribute recognition method of the embodiment of the present invention includes the following steps.

S101. Acquire n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer.

In the embodiment of the present invention, before the n feature images are acquired, a face image is first acquired, and feature images are then extracted from the face image through a feature image extraction model to obtain the above n feature images. It should be understood that before feature image extraction is performed on the collected face image, image preprocessing may also be performed on the face image, for example, image binarization or image normalization, and feature extraction is then performed on the preprocessed face image, so that the obtained feature images can contain more attribute features that can be used to recognize face attributes.

In the embodiment of the present invention, the feature image extraction model is obtained by training on face image samples and feature image samples. In implementation, the feature image extraction model may be obtained by training a convolutional neural network, or by training with the Scale-Invariant Feature Transform (SIFT) algorithm, which is not specifically limited here. In the case where the feature image extraction model is obtained by training a convolutional neural network, the feature image extraction model may include one convolutional layer or multiple convolutional layers, which is not specifically limited here.

In some embodiments of the present invention, a feature image may be the face image itself, or may be an image obtained by inputting the face image into one or more convolutional layers, which is not specifically limited here. In addition, the above n feature images may be extracted with the same convolution kernel or with different convolution kernels, which is not specifically limited here.

S102. Input each of the n feature images into the corresponding attention model to obtain the attention matrix corresponding to each feature image.

In the embodiment of the present invention, the neural network is trained using the above feature image samples and the corresponding attention matrix samples to obtain the attention models. For example, the feature image obtained by passing the target face image through 3 convolutional layers is used as a feature image sample, and the weight matrix for extracting the eye size of the target object in the face image is used as an attention matrix sample; with these feature image samples and attention matrix samples, an attention model for extracting the eye size of the target object in a face image can be trained. In other words, the attention model reflects the mapping relationship between the attention matrix and the feature image, and can be expressed as formula (1-4):

y = f(x) (1-4);

where x is the feature image, y is the attention matrix, and f is the mapping relationship between the attention matrix and the input feature image. f can be obtained by training on a large number of sample sets, where each sample set includes feature image samples and the corresponding attention matrix samples, and one sample set is used to train one attention model.

In the embodiment of the present invention, the attention model may be obtained by training a convolutional neural network. The convolutional neural network may be a network with a simple structure, such as a small network with only two convolutional layers, so that attribute features such as the eye size of the target object in the face image can be extracted efficiently and accurately; it may also be a complex network with 10 convolutional layers, configured to extract attribute features of subtle attributes such as the age of the target object in the face image, which is not specifically limited here, and the specific process of training the attention model is not repeated here. In addition, the convolutional neural network may be a residual neural network (Residual Network, ResNet), a VGG network (VGG Network, VGGNet), and so on, which is not specifically limited here.

In the embodiment of the present invention, in the case of recognizing m attributes in a face image, m attention models need to be trained, where one attention model is used to extract the attribute feature of one attribute. Each of the n feature images may be input into the m attention models to obtain the m attention matrices corresponding to each feature image, i.e., the n ×m attention matrices corresponding to the n feature images, where one attention matrix is used to extract one face attribute feature of the corresponding feature image, and m is a positive integer.

In the embodiment of the present invention, the above attributes may include many kinds. Taking a human face as an example, the attributes of a face include local attributes and global attributes, where the local attributes include hairstyle, eyebrow density, eye size, eyeball color, height of the nose bridge, whether glasses are worn, whether accessories are worn, and so on; the global attributes include gender, age, ethnicity, expression, and so on.

S103. According to the attention matrices, extract the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize an attribute of the target face.

In the embodiment of the present invention, each of the n feature images is multiplied by the corresponding attention matrix to obtain the attribute feature corresponding to each of the n feature images. An attribute feature is a feature in a feature image used to characterize an attribute of the target face; for example, when the face attribute to be recognized is whether glasses are worn, the attribute feature is a feature related to the eye region.

It can be understood that when there are m attributes to be recognized in the face image, multiplying each feature image by its corresponding attention matrices according to the n ×m attention matrices of the n feature images yields n ×m attribute features of the n feature images, including: the m attribute features A11, A12, …, A1m of the 1st feature image, the m attribute features A21, A22, …, A2m of the 2nd feature image, …, and the m attribute features An1, An2, …, Anm of the n-th feature image, where Anm is the feature in the n-th feature image used to characterize the m-th face attribute.

S104. Determine the attribute recognition result of the target face attribute in the face image according to the attribute features.

In some embodiments of the present invention, determining the attribute recognition result of the target face attribute in the face image according to the attribute features includes: predicting a corresponding recognition result according to each attribute feature; and performing a weighted average calculation on the recognition results to determine the attribute recognition result of the target face attribute in the face image.

In some embodiments of the present invention, predicting a corresponding recognition result according to each attribute feature includes: performing attribute prediction processing on each of the attribute features to predict the recognition result corresponding to that attribute feature.

In some embodiments of the present invention, the attribute prediction processing includes global average pooling. Global average pooling may be performed on the attribute feature A11 to obtain the attribute recognition result P11 of the 1st attribute of the 1st feature image, on the attribute feature A21 to obtain the attribute recognition result P21 of the 1st attribute of the 2nd feature image, …, and on the attribute feature Anm to obtain the attribute recognition result Pnm of the m-th attribute of the n-th feature image. A weighted average calculation is then performed on the attribute recognition results P11, P21, …, Pn1 to obtain the attribute recognition result of the 1st attribute of the face image, on P12, P22, …, Pn2 to obtain the attribute recognition result of the 2nd attribute of the face image, …, and on P1m, P2m, …, Pnm to obtain the attribute recognition result of the m-th attribute of the face image.
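The two-stage computation just described (global average pooling per attribute feature, then a per-attribute weighted average) can be sketched end to end for n feature images and m attributes; all feature maps and weights below are illustrative, not trained values:

```python
def recognize_attributes(attribute_features, weights):
    """attribute_features[i][j] is the 2D attribute feature map A_ij of
    feature image i and attribute j; weights[i][j] is the fusion weight
    w_ij. Returns the recognition results [P_1, ..., P_m]."""
    n = len(attribute_features)
    m = len(attribute_features[0])

    def gap(feature_map):  # P_ij = GAP(A_ij)
        values = [v for row in feature_map for v in row]
        return sum(values) / len(values)

    p = [[gap(attribute_features[i][j]) for j in range(m)] for i in range(n)]
    # P_j = sum over i of w_ij * P_ij
    return [sum(weights[i][j] * p[i][j] for i in range(n)) for j in range(m)]

feats = [
    [[[1.0, 3.0]], [[0.0, 2.0]]],  # feature image 1: A_11, A_12 (1x2 maps)
    [[[2.0, 2.0]], [[4.0, 0.0]]],  # feature image 2: A_21, A_22
]
w = [[0.5, 0.5],
     [0.5, 0.5]]
print(recognize_attributes(feats, w))  # [2.0, 1.5]
```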

Based on the acquired face image, the above face attribute recognition method obtains multiple feature images of different levels through the feature image extraction model, inputs these feature images of different levels into the attention models to obtain the corresponding attention matrices, and multiplies the feature images by the attention matrices to obtain multiple attribute features, so that the various attributes in the face image can be recognized more efficiently and accurately.

The face attribute recognition method shown in FIG. 3 is illustrated below with the recognition attributes of whether glasses are worn, gender, and expression, as shown in FIG. 4, which is a schematic flowchart of recognizing whether a target object wears glasses, its gender, and its expression according to an embodiment of the present invention. The steps of the embodiment of the present invention may be as follows.

1. Acquire the face image of the target object.

2. Perform one convolution calculation to obtain the first feature image.

3. Extract, from the first feature image, the feature of whether the target object wears glasses, the gender feature of the target object, and the expression feature of the target object. First, the first feature image is input into the first attention model, the second attention model, and the third attention model respectively to obtain the first attention matrix, the second attention matrix, and the third attention matrix, where the first attention matrix is used to extract the feature of whether the target object wears glasses, the second attention model is used to extract the gender feature of the target object, and the third attention model is used to extract the expression feature of the target object. Then, the first feature image is multiplied by the first attention matrix, the second attention matrix, and the third attention matrix respectively, so as to extract, from the first feature image, the feature of whether the target object wears glasses, the gender feature of the target object, and the expression feature of the target object.

4. Perform three more convolution calculations on the first feature image to obtain the second feature image.

5. Extract, from the second feature image, the feature of whether the target object wears glasses, the gender feature of the target object, and the expression feature of the target object; see step 3 for the implementation.

6. Perform five convolution calculations on the second feature image to obtain the third feature image.

7. Extract, from the third feature image, the feature of whether the target object wears glasses, the gender feature of the target object, and the expression feature of the target object; see step 3 for the implementation.

8. Perform the global average pooling operation on the 9 features extracted in steps 3, 5, and 7 to obtain the corresponding recognition results.

9. Perform a weighted average calculation on the recognition results of whether the target object wears glasses in the first, second, and third feature images to determine whether the target object is wearing glasses or not.

For example, suppose that in the first feature image the probability that the target object wears glasses is 0.4 and the probability that it does not wear glasses is 0.6; in the second feature image, the probability of wearing glasses is 0.25 and the probability of not wearing glasses is 0.75; and in the third feature image, the probability of wearing glasses is 0.2 and the probability of not wearing glasses is 0.8. Suppose further that the weight of the first feature image in recognizing whether the target object wears glasses is 0.2, the weight of the second feature image is 0.4, and the weight of the third feature image is 0.4. After calculation, the probability that the target object wears glasses is 0.4×0.2 + 0.25×0.4 + 0.2×0.4 = 0.26, and the probability that the target object does not wear glasses is 0.6×0.2 + 0.75×0.4 + 0.8×0.4 = 0.74. Therefore, the face attribute recognition device outputs that the target object does not wear glasses.
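The arithmetic of this example can be checked directly; the probabilities and weights below are the ones assumed in the example above:

```python
wear = [0.4, 0.25, 0.2]     # P(wearing glasses) per feature image
no_wear = [0.6, 0.75, 0.8]  # P(not wearing glasses) per feature image
weights = [0.2, 0.4, 0.4]   # per-feature-image fusion weights

p_wear = sum(w * p for w, p in zip(weights, wear))
p_no_wear = sum(w * p for w, p in zip(weights, no_wear))
print(round(p_wear, 2), round(p_no_wear, 2))  # 0.26 0.74
# Since p_no_wear > p_wear, the output is "not wearing glasses".
```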

10. Perform a weighted average calculation on the gender recognition results of the target object in the first, second, and third feature images to determine whether the target object is male or female; see step 9 for the implementation.

11. Perform a weighted average calculation on the expression recognition results of the target object in the first, second, and third feature images to determine whether the target object is happy, sad, angry, and so on; see step 9 for the implementation.

With reference to the related embodiments shown in FIG. 1 to FIG. 4 above, the related devices involved in the embodiments of the present invention are described below.

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a face attribute recognition device provided by an embodiment of the present invention. The face attribute recognition device of the embodiment of the present invention includes a feature image extraction unit 510, an attribute feature extraction unit 520, and a face attribute recognition unit 530.

The feature image extraction unit 510 is configured to acquire n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer.

The attribute feature extraction unit 520 is configured to acquire the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize an attribute of the target face.

The face attribute recognition unit 530 is configured to determine the attribute recognition result of the target face attribute in the face image according to the attribute features.

In the embodiment of the present invention, the attribute feature extraction unit 520 is further configured to: input each of the n feature images into the corresponding attention model to obtain the attention matrix corresponding to each feature image; and extract the attribute feature corresponding to each of the n feature images according to the attention matrices.

In the embodiment of the present invention, the attribute feature extraction unit 520 is further configured to: multiply each of the n feature images by the corresponding attention matrix to obtain the attribute feature corresponding to each of the n feature images.

In the embodiment of the present invention, the face attribute recognition unit 530 is further configured to: predict a corresponding recognition result according to each attribute feature; and perform a weighted average calculation on the recognition results to determine the attribute recognition result of the target face attribute in the face image.

In the embodiment of the present invention, the face attribute recognition unit 530 is further configured to: perform attribute prediction processing on each of the attribute features to predict the recognition result corresponding to that attribute feature.

In some embodiments of the present invention, the attribute prediction processing includes global average pooling.

In the embodiment of the present invention, before the n feature images are acquired, the face attribute recognition device 500 is further configured to: acquire a face image; and perform feature extraction on the face image through the feature image extraction model to obtain the n feature images.

In some embodiments of the present invention, a feature image may be the face image itself, or may be obtained by inputting the face image into one or more convolutional layers, which is not specifically limited here. In addition, the n feature images may be extracted with the same convolution kernel or with different convolution kernels, which is not specifically limited here.

In some embodiments of the present invention, the face attribute recognition device 500 further includes: a feature image extraction model training unit 540 and an attention model training unit 550.

The feature image extraction model training unit 540 is configured to train the feature image extraction model, which is obtained by training on face image samples and feature image samples. In implementation, the feature image extraction model may be obtained by training a convolutional neural network, or by training with the Scale-Invariant Feature Transform (SIFT) algorithm, which is not specifically limited here. In the case where the feature image extraction model is obtained by training a convolutional neural network, the feature image extraction model may include one convolutional layer or multiple convolutional layers, which is not specifically limited here.

The attention model training unit 550 is configured to train a neural network using the above feature image samples and the corresponding attention matrix samples to obtain an attention model. For example, the feature image obtained by passing the target face image through 3 convolutional layers may be taken as a feature image sample, and the weight matrix for extracting the eye size of the target object in the face image may be taken as an attention matrix sample; an attention model for extracting the eye size of the target object in a face image can then be trained from these feature image samples and attention matrix samples. In other words, the attention model reflects the mapping relationship between the attention matrix and the feature image, and can be expressed as the following formula (1-5):

y = f(x) (1-5);

where x is the feature image, y is the attention matrix, and f is the mapping relationship between the attention matrix and the input feature image. f can be obtained by training on a large number of sample sets, where each sample set includes feature image samples and the corresponding attention matrix samples, and one sample set is used to train one attention model.

In the embodiment of the present invention, the attention model may be obtained by training a convolutional neural network. The convolutional neural network may be a network with a simple structure, such as a small network with only two convolutional layers, so that attribute features such as the eye size of the target object can be extracted from the face image efficiently and accurately; it may also be a complex network with, say, 10 convolutional layers, configured to extract attribute features of subtle attributes such as the age of the target object in the face image. This is not specifically limited here, and the specific process of training the attention model is not repeated here. In addition, the convolutional neural network may be a residual neural network (Residual Network, ResNet), a VGG network (VGGNet), etc., which is not specifically limited here.

It can be understood that the recognition of attributes such as the gender or expression of the target object belongs to global attribute recognition. Therefore, when judging a global attribute of the target object (e.g., gender or expression), multiple local attributes (e.g., hairstyle, whether earrings are worn, or eye size) may be judged first. In other words, the above face attribute recognition apparatus may first recognize the local attribute features of the target object, and then comprehensively evaluate the global attributes of the face according to the recognition results of the local attributes.
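The local-to-global evaluation described above can be sketched as a weighted combination of local attribute scores. The specific local attributes, weights, and 0.5 decision threshold below are assumptions for illustration, not values from the embodiment.

```python
def global_attribute_score(local_scores, weights):
    """Weighted average of local attribute scores (each in [0, 1])."""
    total = sum(weights[k] for k in local_scores)
    return sum(local_scores[k] * weights[k] for k in local_scores) / total

# Hypothetical local attribute scores for one face image.
local_scores = {"long_hair": 0.9, "earrings": 0.8, "eye_size": 0.6}
weights = {"long_hair": 0.5, "earrings": 0.3, "eye_size": 0.2}

score = global_attribute_score(local_scores, weights)
is_positive = score > 0.5  # decide the binary global attribute
```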

It should be noted that the above face recognition apparatus can also be used in the security field to help security staff quickly and accurately screen for aggressive persons. For example, at an airport boarding gate, the above face recognition apparatus can identify the face attribute features of passengers to judge whether a passenger is aggressive. Where a passenger is judged to be aggressive, the passenger's face image is compared against an existing face image database to determine the passenger's identity information.

The above face attribute recognition apparatus performs feature extraction on the acquired face image through the feature image extraction unit to obtain multiple feature images at different levels. In the attention feature learning unit, these feature images at different levels are input into the attention model to obtain the corresponding attention matrices, and multiple attribute features are obtained by multiplying the feature images by the attention matrices. Finally, in the face attribute recognition unit, a weighted average is computed over the multiple attribute features, so that each attribute in the face image is recognized more efficiently and accurately.

The face recognition apparatus of the embodiment of the present invention may be implemented in a single computing node, or on a cloud computing infrastructure, which is not specifically limited here. The following separately introduces how to implement the face recognition apparatus on a single computing node and on a cloud computing infrastructure.

As shown in FIG. 6, the computing node 600 may include a processor 610, a memory 620, and a communication network 630. The processor is configured to run a feature image extraction unit 611, an attribute feature extraction unit 612, a face attribute recognition unit 613, a feature image extraction model training unit 614, an attention model training unit 615, and so on. The memory 620 is configured to store face images, feature images, attribute features, attention matrices, and so on. The computing node 600 also provides two external interfaces: a management interface 640 for maintenance personnel of the apparatus and a user interface 650 for users. The form of the interface may vary, for example a web interface, a command-line tool, a REST interface, and so on.

In some embodiments of the present invention, the management interface 640 is configured for maintenance personnel to input a large number of face image samples and corresponding feature image samples for training the feature image extraction model, and to input a large number of feature image samples and corresponding attention matrix samples for training the attention model. The user interface 650 is configured for a user to input the face image of the target object whose attributes are to be recognized, and to output the face attribute recognition result to the user through the user interface 650.

It should be understood that the computing node 600 is only an example provided by the embodiment of the present invention; the computing node 600 may have more or fewer components than shown, may combine two or more components, or may be implemented with a different configuration of components.

As shown in FIG. 7, the cloud computing infrastructure may be a cloud service cluster 700. The cloud service cluster 700 is composed of nodes and the communication network between the nodes. A node may be a computing node or a virtual machine running on a computing node. By function, the nodes fall into two categories: computing nodes 710 and storage nodes 720, connected through a communication interface 730. A computing node 710 is configured to run a feature image extraction unit 711, an attribute feature extraction unit 712, a face attribute recognition unit 713, a feature image extraction model training unit 714, and an attention model training unit 715. A storage node 720 is configured to store face images, feature images, attribute features, and so on. The cloud service cluster 700 also provides two external interfaces: a management interface 740 for maintenance personnel of the apparatus and a user interface 750 for users. The form of the interface may vary, for example a web interface, a command-line tool, a Representational State Transfer (REST) interface, and so on.

In some embodiments of the present invention, the management interface 740 is configured for maintenance personnel to input a large number of face image samples and corresponding feature image samples for training the feature image extraction model, and to input a large number of feature image samples and corresponding attention matrix samples for training the attention model. The user interface 750 is configured for a user to input the face image of the target object whose attributes are to be recognized, and to output the face attribute recognition result to the user through the user interface 750.

It should be understood that the cloud service cluster 700 is only an example provided by the embodiment of the present invention; the cloud service cluster 700 may have more or fewer components than shown, may combine two or more components, or may be implemented with a different configuration of components.

Referring to FIG. 8, an embodiment of the present invention provides a schematic structural diagram of a face attribute recognition apparatus 800 of another implementation. The face attribute recognition apparatus of this implementation may be implemented in a computing node as shown in FIG. 6, and includes at least a processor 810, a communication interface 820, and a memory 830, where the processor 810, the communication interface 820, and the memory 830 are coupled through a bus 840.

The processor 810 includes one or more general-purpose processors, where a general-purpose processor may be any type of device capable of processing electronic instructions, including a central processing unit (CPU), a microprocessor, a microcontroller, a main processor, a controller, an application-specific integrated circuit (ASIC), and so on. The processor 810 reads the program code stored in the memory 830 and cooperates with the communication interface 820 to execute part or all of the steps of the method executed by the face attribute recognition apparatus 500 in the foregoing embodiments of the present invention.

The communication interface 820 may be a wired interface (for example, an Ethernet interface), configured to communicate with other computing nodes or apparatuses.

The memory 830 may include volatile memory, for example random access memory (RAM); the memory may also include non-volatile memory, for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory may also include a combination of the above types of memory. The memory 830 may store program code and program data. The program code includes the code of the feature image extraction unit, the code of the attribute feature extraction unit, the code of the face attribute recognition unit, the code of the feature image extraction model training unit, and the code of the attention model training unit. The program data includes a large number of face images, feature images, and corresponding attention matrices, used for training the feature image extraction model and the attention model; the program data also includes a large number of weight values used to compute attribute recognition results, and so on.

The processor 810 executes the following steps by invoking the program code in the memory 830: the processor 810 is configured to obtain n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer; the processor 810 is configured to obtain the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize a target face attribute; and the processor 810 is configured to determine, according to the attribute features, the attribute recognition result of the target face attribute of the face image.

The processor 810 is further configured to train a feature image extraction model using face image samples and feature image samples, and to train the neural network using the above feature image samples and corresponding attention matrix samples to obtain the attention model.

Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a face attribute recognition apparatus according to yet another implementation of an embodiment of the present invention. The face attribute recognition apparatus of this implementation may be implemented in a cloud service cluster as shown in FIG. 7, and includes at least one computing node 910 and at least one storage node 920.

The computing node 910 includes one or more processors 911, a communication interface 912, and a memory 913; the processor 911, the communication interface 912, and the memory 913 may be connected through a bus 914.

The processor 911 includes one or more general-purpose processors, where a general-purpose processor may be any type of device capable of processing electronic instructions, including a CPU, a microprocessor, a microcontroller, a main processor, a controller, an ASIC, and so on. It can be a dedicated processor used only by the computing node 910, or shared with other computing nodes 910. The processor 911 reads the program code stored in the memory 913 and cooperates with the communication interface 912 to execute part or all of the steps of the method executed by the face attribute recognition apparatus 500 in the foregoing embodiments of the present invention.

The communication interface 912 may be a wired interface (for example, an Ethernet interface), configured to communicate with other computing nodes or users. When the communication interface 912 is a wired interface, it may adopt a protocol family over the Transmission Control Protocol/Internet Protocol (TCP/IP), for example, the Robot as a Service (RAAS) protocol, the Remote Function Call (RFC) protocol, the Simple Object Access Protocol (SOAP), the Simple Network Management Protocol (SNMP), the Common Object Request Broker Architecture (CORBA) protocol, distributed protocols, and so on.

The memory 913 may include volatile memory, for example RAM; the memory may also include non-volatile memory, for example ROM, flash memory, an HDD, or an SSD; the memory may also include a combination of the above types of memory.

The storage node 920 includes one or more processors 911, a communication interface 912, and a memory 913, and these components may be connected through a bus 924.

The processor 911 includes one or more general-purpose processors, where a general-purpose processor may be any type of device capable of processing electronic instructions, including a CPU, a microprocessor, a microcontroller, a main processor, a controller, an ASIC, and so on. It can be a dedicated processor used only by the storage node 920, or shared with other storage nodes 920. The processor 911 executes various types of digital storage instructions, for example software or firmware programs stored in the memory 913, which enable the storage node 920 to provide a wide variety of services. For example, the processor 911 can execute programs or process data to perform at least part of the methods discussed herein.

The communication interface 912 may be a wired interface (for example, an Ethernet interface), configured to communicate with other computing devices or users.

The storage node 920 includes one or more storage controllers 921 and a storage array 922, where the storage controller 921 and the storage array 922 may be connected through a bus 923.

The storage controller 921 includes one or more general-purpose processors, where a general-purpose processor may be any type of device capable of processing electronic instructions, including a CPU, a microprocessor, a microcontroller, a main processor, a controller, an ASIC, and so on. It can be a dedicated processor used only by a single storage node 920, or shared with the computing node 910 or other storage nodes 920. It can be understood that in this embodiment, each storage node includes one storage controller; in other embodiments, multiple storage nodes may also share one storage controller, which is not specifically limited here.

The storage array 922 may include multiple memories. A memory may be non-volatile memory, for example ROM, flash memory, an HDD, or an SSD, and may also be a combination of the above types of memory. For example, the storage array may be composed of multiple HDDs or multiple SSDs, or of both HDDs and SSDs. With the assistance of the storage controller 921, the multiple memories are combined in different ways to form memory groups, thereby providing higher storage performance than a single memory and providing data backup technology. Optionally, the storage array 922 may include one or more data centers. The multiple data centers may be set up at the same location or at different locations, which is not specifically limited here. The storage array 922 may store program code and program data. The program code includes the code of the feature image extraction unit, the code of the attention feature learning unit, the code of the face attribute recognition unit, and the code of the attention model training unit. The program data includes a large number of face images, feature images, and corresponding attention matrices, used for training the feature image extraction model and the attention model; the program data also includes a large number of weight values used to compute attribute recognition results, and so on.

The computing node 910 is configured to execute the following steps by invoking the program code in the storage node 920: the computing node 910 is configured to obtain n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer; the computing node 910 is configured to obtain the attribute feature corresponding to each of the n feature images, where an attribute feature is a feature in a feature image used to characterize a target face attribute; and the computing node 910 is configured to determine, according to the attribute features, the attribute recognition result of the target face attribute of the face image.

The computing node 910 is further configured to train a feature image extraction model using face image samples and feature image samples, and to train the neural network using the above feature image samples and corresponding attention matrix samples to obtain the attention model.

Multi-task face attribute learning is currently a popular yet difficult problem in the field of video surveillance and security, where it plays a vital role. For example, expression attribute recognition can be used to judge whether a person is aggressive, and recognition of worn dangerous objects can be used to judge whether pedestrians are safe. In the related art, a multi-branch neural network can share features and classify individual attributes (such as wearing earrings, eyes, and expressions). Such a network is very simple but has obvious shortcomings. Take a multi-task convolutional neural network as an example: in this network, the tasks typically share all the features of the preceding network, global average pooling (GAP) is performed on the last layer of the convolutional network to extract a shared semantic feature F, and this F is used for the different classification tasks. The problem is that not all attributes are best served by last-layer features: fine-grained attributes such as wearing earrings or glasses may require relatively shallow two-dimensional features with an attention mechanism, while attributes such as gender and expression require deeper semantic features.
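The GAP step mentioned above can be sketched as follows: each channel of the final feature map is collapsed to its mean, giving the shared semantic feature F that the multi-task branches classify from. The toy 2x2x2 feature map is an illustrative assumption.

```python
def global_average_pooling(feature_map):
    """feature_map: list of channels, each an HxW grid -> feature vector F."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

fmap = [
    [[1.0, 3.0], [5.0, 7.0]],   # channel 0 -> mean 4.0
    [[2.0, 2.0], [2.0, 2.0]],   # channel 1 -> mean 2.0
]
F = global_average_pooling(fmap)  # shared semantic feature, one value per channel
```

Because F discards all spatial layout, it illustrates why fine-grained, spatially localized attributes are poorly served when only this last-layer feature is shared across tasks.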

The embodiment of the present invention improves on this network and provides a method that can recognize face attributes more efficiently and accurately. The method uses a deep convolutional neural network (CNN) to extract face image features, obtains attention features at different layers, uses these attention features for classification, and finally computes a weighted average over all classification results to obtain the final classification result. In implementation, a standard neural network can be divided into multiple blocks of different scales; in fact, many neural networks are already divided into blocks, such as VGG networks and residual networks. N attention models are learned on each block B_i, where N is the number of tasks; the N attention mechanisms are used for attribute classification, and for the same attribute, the prediction results P_ij from the individual blocks are weighted and averaged to obtain the final result P_j, where i is the index of the block and j is the index of the attribute.
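The final fusion step above, combining per-block predictions P_ij into P_j by a weighted average, can be sketched as follows. The uniform block weights and the example scores are assumptions; in the embodiment the weights would be learned or tuned.

```python
def fuse_predictions(block_preds, block_weights):
    """block_preds[i][j] = P_ij (block i, attribute j); returns P_j per attribute."""
    n_attrs = len(block_preds[0])
    total_w = sum(block_weights)
    return [
        sum(block_preds[i][j] * block_weights[i]
            for i in range(len(block_preds))) / total_w
        for j in range(n_attrs)
    ]

# Three blocks, two attributes (e.g. earrings, gender) -- illustrative scores.
P = [[0.9, 0.4],   # shallow block: strong on the fine-grained attribute
     [0.7, 0.6],
     [0.5, 0.9]]   # deep block: strong on the semantic attribute
P_final = fuse_predictions(P, block_weights=[1.0, 1.0, 1.0])
```

This is what lets each attribute draw on the network depth that suits it: the fused P_j mixes shallow and deep evidence instead of relying on the last layer alone.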

FIG. 10 is a schematic diagram of the implementation flow of a face attribute recognition method provided by an embodiment of the present invention. As shown in FIG. 10, the neural network is divided into multiple blocks 101i of different scales, where block 101i is a tensor of type (N, W_i, H_i, C_i), i = 1, 2, ..., n, n is the number of blocks, W_i is the image width corresponding to block 101i, H_i is the image height corresponding to block 101i, and C_i is the number of image channels corresponding to block 101i. For the face image to be recognized, feature extraction through each block 101i generates a corresponding attention map 102i, which is a tensor of type (N, W_i, H_i, 1). Each block 101i is multiplied by the corresponding attention map 102i to obtain an attention feature, and attribute classification is performed on the attention feature through the GAP layer 1030 to obtain a prediction result 104i, which is a tensor of type (N, 2). All prediction results 104i are weighted and averaged to obtain the total prediction result 1040. Finally, the total prediction result is used to classify the face attributes.
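The data flow of FIG. 10 can be checked at the shape level, per block i: multiplying the block tensor (N, W_i, H_i, C_i) by its attention map (N, W_i, H_i, 1) broadcasts over channels, GAP collapses W_i and H_i, and the classifier maps the C_i channels to a two-way prediction (N, 2). Shapes only, no weights; the example sizes are assumptions, not values from the patent.

```python
def attention_feature_shape(block, attn):
    """Elementwise multiply: attention map must match the block spatially."""
    N, W, H, C = block
    assert attn == (N, W, H, 1)   # broadcast over the channel dimension
    return (N, W, H, C)           # attention feature keeps the block shape

def gap_shape(t):
    """GAP averages away the spatial dimensions."""
    N, W, H, C = t
    return (N, C)

def prediction_shape(t):
    """Classifier maps C channels to a binary attribute prediction."""
    N, C = t
    return (N, 2)

block = (8, 28, 28, 64)           # example block tensor shape
attn = (8, 28, 28, 1)             # matching attention map shape
pred = prediction_shape(gap_shape(attention_feature_shape(block, attn)))
```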

The face attribute recognition method provided by the embodiment of the present invention uses a multi-scale attention mechanism to mine features at different layers, so that each attribute can be classified using features from different depths of the network. This solves the problem in the related art that the features available at classification time are not strong enough; compared with related-art schemes that use only the last layer of semantic information for classification, face attributes can be recognized more efficiently and accurately.

In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example coaxial cable, optical fiber, or digital subscriber line) or wireless (for example infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example a floppy disk, storage disk, or magnetic tape), an optical medium (for example a Digital Video Disc (DVD)), or a semiconductor medium (for example an SSD), and so on. The descriptions of the embodiments each have their own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus may also be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or elements may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual indirect coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, apparatuses, or units, and may be in electrical or other forms.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present invention.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention essentially, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements should all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Industrial applicability
The present invention relates to a face attribute recognition method, an electronic device, and a computer-readable storage medium. The method includes: obtaining n feature images, where the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer; obtaining the attribute feature corresponding to each of the n feature images, where the attribute feature is a feature in the feature image used to characterize a target face attribute; and determining an attribute recognition result of the target face attribute of the face image according to the attribute features. According to the face attribute recognition method of the embodiments of the present invention, the attribute features corresponding to different feature images can be used to predict face attributes, which can improve the accuracy and the efficiency of face attribute recognition.
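The pipeline summarized above (per-feature-image attribute features, per-branch prediction, then fusion into one result) can be sketched roughly as follows. This is a minimal illustration only: the element-wise attention weighting, the sigmoid scoring head, and the fixed fusion weights are assumptions for the sketch, not the patented implementation.

```python
import numpy as np

def attribute_recognition(feature_images, attention_matrices, weights):
    """Fuse per-feature-image attribute predictions into one result.

    feature_images:     list of n arrays of shape (H, W, C)
    attention_matrices: list of n arrays of the same shape
    weights:            length-n fusion weights summing to 1
    """
    scores = []
    for feat, attn in zip(feature_images, attention_matrices):
        # Element-wise multiplication extracts the attribute feature:
        # the attention matrix emphasizes attribute-relevant regions.
        attribute_feature = feat * attn
        # Global average pooling collapses the spatial dimensions
        # into a single descriptor per channel.
        pooled = attribute_feature.mean(axis=(0, 1))
        # Hypothetical prediction head: mean score through a sigmoid.
        score = 1.0 / (1.0 + np.exp(-pooled.mean()))
        scores.append(score)
    # Weighted average of the per-branch recognition results.
    return float(np.dot(weights, scores))

rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 8, 16)) for _ in range(3)]
attns = [rng.random((8, 8, 16)) for _ in range(3)]
result = attribute_recognition(feats, attns, [0.5, 0.3, 0.2])
print(0.0 < result < 1.0)
```

Because each branch score passes through a sigmoid and the fusion weights are non-negative and sum to one, the fused result stays in (0, 1) and can be read as a probability-like score for the target attribute.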

Reference numerals:
110: input layer; 120: shared feature learning layer; 130: fully connected layer; 210: input layer; 220: convolutional layer; 230: attention learning layer; 240: global average pooling layer; 250: recognition layer; 500: face attribute recognition device; 510: feature image extraction unit; 520: attribute feature extraction unit; 530: face attribute recognition unit; 540: feature image extraction model training unit; 550: attention model training unit; 600: computing node; 610: processor; 611: feature image extraction unit; 612: attribute feature extraction unit; 613: face attribute recognition unit; 614: feature image extraction model training unit; 615: attention model training unit; 620: memory; 630: communication network; 640: management interface; 650: user interface; 700: cloud service cluster; 710: computing node; 711: feature image extraction unit; 712: attribute feature extraction unit; 713: face attribute recognition unit; 714: feature image extraction model training unit; 715: attention model training unit; 720: storage node; 730: communication interface; 740: management interface; 750: user interface; 800: face attribute recognition device; 810: processor; 820: communication interface; 830: memory; 840: bus; 900: computing node; 910: computing node; 911: processor; 912: communication interface; 913: memory; 914: bus; 920: storage node; 921: storage controller; 922: storage array; 923: bus; 924: bus; S101–S104: steps

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of a process of recognizing face attributes based on MTCNN in the related art.
Fig. 2 is a schematic structural diagram of a face attribute recognition neural network provided by an embodiment of the present invention.
Fig. 3 is a schematic flowchart of a face attribute recognition method provided by an embodiment of the present invention.
Fig. 4 is a schematic flowchart of recognizing whether a target object wears glasses, as well as the gender and expression of the target object, provided by an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a face attribute recognition device provided by an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a computing node provided by an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a cloud service cluster provided by an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of another face attribute recognition device provided by an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of another face attribute recognition device provided by an embodiment of the present invention.
Fig. 10 is a schematic diagram of the implementation process of a face attribute recognition method provided by an embodiment of the present invention.

S101–S104: steps

Claims (11)

1. A face attribute recognition method, comprising:
obtaining n feature images, wherein the n feature images are obtained by performing feature extraction on a face image one or more times, and n is a positive integer;
obtaining an attribute feature corresponding to each of the n feature images, wherein the attribute feature is a feature in the feature image used to characterize a target face attribute; and
determining an attribute recognition result of the target face attribute in the face image according to the attribute features.

2. The method according to claim 1, wherein obtaining the attribute feature corresponding to each of the n feature images comprises:
inputting each of the n feature images into a corresponding attention model to obtain an attention matrix corresponding to each feature image; and
extracting the attribute feature corresponding to each of the n feature images according to the attention matrix.

3. The method according to claim 2, wherein extracting the attribute feature corresponding to each of the n feature images according to the attention matrix comprises:
multiplying each of the n feature images by the corresponding attention matrix to obtain the attribute feature corresponding to each of the n feature images.
4. The method according to any one of claims 1 to 3, wherein determining the attribute recognition result of the target face attribute in the face image according to the attribute features comprises:
predicting a corresponding recognition result according to each of the attribute features; and
performing a weighted average calculation on the recognition results to determine the attribute recognition result of the target face attribute in the face image.

5. The method according to claim 4, wherein predicting the corresponding recognition result according to each of the attribute features comprises:
performing attribute prediction processing on each of the attribute features to predict the recognition result corresponding to the attribute feature.

6. The method according to claim 5, wherein the attribute prediction processing comprises global average pooling processing.

7. The method according to any one of claims 1 to 3, wherein, before obtaining the n feature images, the method further comprises:
obtaining the face image; and
performing feature extraction on the face image through a feature image extraction model to obtain the n feature images.

8. The method according to claim 7, wherein the feature image extraction model is obtained by training on face image samples and feature image samples, and the method further comprises:
training a neural network using the feature image samples and corresponding attention matrix samples to obtain the attention model.
9. The method according to any one of claims 1 to 3, wherein the n feature images further comprise the face image.

10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by hardware to implement the method according to any one of claims 1 to 9.

11. An electronic device, comprising a processor and a memory, wherein the processor executes code in the memory to implement the method according to any one of claims 1 to 9.
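As a rough illustration of the attention step in the claims above, the sketch below shows one hypothetical way an attention model could map a feature image to an attention matrix of the same shape, here a 1×1 channel projection followed by a sigmoid gate. The projection weights and the gating choice are illustrative assumptions, not the claimed attention model.

```python
import numpy as np

def attention_model(feature_image, proj):
    """Hypothetical attention model: a 1x1 channel projection followed
    by a sigmoid produces an attention matrix with the same shape as
    the feature image, with every entry gated into (0, 1)."""
    h, w, c = feature_image.shape
    logits = feature_image.reshape(-1, c) @ proj   # (H*W, C) projection
    attn = 1.0 / (1.0 + np.exp(-logits))           # sigmoid gate
    return attn.reshape(h, w, c)

rng = np.random.default_rng(1)
feat = rng.standard_normal((4, 4, 8))
proj = rng.standard_normal((8, 8)) * 0.1           # small illustrative weights
attn = attention_model(feat, proj)
# Element-wise multiplication yields the attribute feature.
attribute_feature = feat * attn
print(attn.shape == feat.shape)
```

Because the gate is applied per location and per channel, the multiplication scales each entry of the feature image by a factor in (0, 1), suppressing regions irrelevant to the target attribute while keeping the tensor shape unchanged for the downstream pooling step.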
TW109133422A 2019-09-30 2020-09-25 Face attribute recognition method, electronic device and computer-readable storage medium TWI753588B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910961060.2A CN111339813B (en) 2019-09-30 2019-09-30 Face attribute recognition method and device, electronic equipment and storage medium
CN201910961060.2 2019-09-30

Publications (2)

Publication Number Publication Date
TW202115622A true TW202115622A (en) 2021-04-16
TWI753588B TWI753588B (en) 2022-01-21

Family

ID=71181957

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109133422A TWI753588B (en) 2019-09-30 2020-09-25 Face attribute recognition method, electronic device and computer-readable storage medium

Country Status (5)

Country Link
JP (1) JP2022547595A (en)
KR (1) KR20220044828A (en)
CN (1) CN111339813B (en)
TW (1) TWI753588B (en)
WO (1) WO2021063056A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339813B (en) * 2019-09-30 2022-09-27 深圳市商汤科技有限公司 Face attribute recognition method and device, electronic equipment and storage medium
CN111753847B (en) * 2020-06-28 2023-04-18 浙江大华技术股份有限公司 Image preprocessing method and device, storage medium and electronic device
CN111723762B (en) * 2020-06-28 2023-05-12 湖南国科微电子股份有限公司 Face attribute identification method and device, electronic equipment and storage medium
CN112163462A (en) * 2020-09-08 2021-01-01 北京数美时代科技有限公司 Face-based juvenile recognition method and device and computer equipment
CN112257503A (en) * 2020-09-16 2021-01-22 深圳微步信息股份有限公司 Sex age identification method, device and storage medium
CN112164102A (en) * 2020-09-24 2021-01-01 北京三快在线科技有限公司 Image processing method and device

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7747551B2 (en) * 2007-02-21 2010-06-29 Neurovista Corporation Reduction of classification error rates and monitoring system using an artificial class
WO2016026063A1 (en) * 2014-08-21 2016-02-25 Xiaoou Tang A method and a system for facial landmark detection based on multi-task
KR102424986B1 (en) * 2014-11-04 2022-07-26 삼성전자주식회사 Electronic device and method for analysis of face information in electronic device
GB201501510D0 (en) * 2015-01-29 2015-03-18 Apical Ltd System
US20180121713A1 (en) * 2016-10-28 2018-05-03 Qualcomm Incorporated Systems and methods for verifying a face
CN106909882A (en) * 2017-01-16 2017-06-30 广东工业大学 A kind of face identification system and method for being applied to security robot
KR102299847B1 (en) * 2017-06-26 2021-09-08 삼성전자주식회사 Face verifying method and apparatus
CN107247947B (en) * 2017-07-07 2021-02-09 智慧眼科技股份有限公司 Face attribute identification method and device
CN108229296B (en) * 2017-09-30 2021-04-02 深圳市商汤科技有限公司 Face skin attribute identification method and device, electronic equipment and storage medium
CN109543606B (en) * 2018-11-22 2022-09-27 中山大学 Human face recognition method with attention mechanism
CN109800737B (en) * 2019-02-02 2021-06-25 深圳市商汤科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN110210474B (en) * 2019-04-30 2021-06-01 北京市商汤科技开发有限公司 Target detection method and device, equipment and storage medium
CN110287836B (en) * 2019-06-14 2021-10-15 北京迈格威科技有限公司 Image classification method and device, computer equipment and storage medium
CN111339813B (en) * 2019-09-30 2022-09-27 深圳市商汤科技有限公司 Face attribute recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
KR20220044828A (en) 2022-04-11
JP2022547595A (en) 2022-11-14
WO2021063056A1 (en) 2021-04-08
CN111339813A (en) 2020-06-26
CN111339813B (en) 2022-09-27
TWI753588B (en) 2022-01-21

Similar Documents

Publication Publication Date Title
TWI753588B (en) Face attribute recognition method, electronic device and computer-readable storage medium
US10679047B2 (en) System and method for pose-aware feature learning
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN110349156B (en) Method and device for identifying lesion characteristics in fundus picture and storage medium
WO2021093468A1 (en) Video classification method and apparatus, model training method and apparatus, device and storage medium
KR102564854B1 (en) Method and apparatus of recognizing facial expression based on normalized expressiveness and learning method of recognizing facial expression
US20200012923A1 (en) Computer device for training a deep neural network
WO2019144892A1 (en) Data processing method, device, storage medium and electronic device
WO2021143101A1 (en) Face recognition method and face recognition device
WO2021073311A1 (en) Image recognition method and apparatus, computer-readable storage medium and chip
EP2923304A1 (en) Method and system for counting people using depth sensor
WO2021103187A1 (en) Image processing method and apparatus, processor, electronic device, and storage medium
CN111368672A (en) Construction method and device for genetic disease facial recognition model
Yang et al. Real-time facial expression recognition based on edge computing
WO2021218238A1 (en) Image processing method and image processing apparatus
WO2022111387A1 (en) Data processing method and related apparatus
WO2023179429A1 (en) Video data processing method and apparatus, electronic device, and storage medium
KR102274581B1 (en) Method for generating personalized hrtf
Liu Human face expression recognition based on deep learning-deep convolutional neural network
KR20170057118A (en) Method and apparatus for recognizing object, and method and apparatus for training recognition model
Gowda Age estimation by LS-SVM regression on facial images
Sharma et al. Face mask detection: A real-time android application based on deep learning modeling
AU2021103468A4 (en) Smart Bus with AI Based Face Mask Detection System in Pandemic Situations Using Raspberry PI
CN111259698A (en) Method and device for acquiring image
KR20220124446A (en) Method and system for providing animal face test service based on machine learning