CN105426850B - Associated information pushing device and method based on face recognition


Info

Publication number
CN105426850B
CN105426850B (application number CN201510819092.0A)
Authority
CN
China
Prior art keywords
face
layer
information
image
associated information
Prior art date
Legal status
Active
Application number
CN201510819092.0A
Other languages
Chinese (zh)
Other versions
CN105426850A (en)
Inventor
张广程
罗予晨
曹强
刘祖希
于志兴
周璐璐
向许波
张果琲
马堃
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN201510819092.0A
Publication of CN105426850A
Application granted
Publication of CN105426850B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The device comprises a housing, a camera device, a display screen, a processor, a memory, a data transmission device, a power supply and the like. The processor further comprises a face recognition unit and an associated information retrieval unit, and the memory comprises a face database and an associated information storage unit. The camera device sends image information captured in real time to the face recognition unit; that unit recognizes face information and its attribute values with a face recognition algorithm and passes the result to the associated information retrieval unit, which obtains the associated information for the face information and attribute values by searching internal and external resources and finally sends the associated information to the display screen for display. The device can thus push products or services that are more accurate and personalized for different users.

Description

Associated information pushing device and method based on face recognition
Technical Field
The application relates to the fields of image processing and information fusion, and in particular to an associated information pushing device and method based on face recognition.
Background
Early face recognition technology could only identify the size and position of a face in a static picture. As the technology developed, it became possible to accurately recognize faces in dynamic video and, further, to recognize and locate the facial features as well as face attributes such as age, gender, expression, whether glasses or a mask are worn, eyebrow length and thickness, hair length, eye size, whether the eyes are open or closed, beard type, and the like.
As the technology matures, the applications of face recognition keep expanding. The human face, as part of human biometric characteristics, is unique and can be used for identity confirmation; like fingerprints and irises, it can serve as biometric information in the information security field. As part of image processing, face recognition is also widely used in cameras and image processing software. In addition, gestures and face detection are now used as input and are widely applied in entertainment fields such as games, where using face detection as input makes game interaction more interesting and natural.
In recent years, there has been a trend toward applying face recognition technology to the media field. Compared with conventional print media, dynamic interactive media realized with face recognition technology makes media positioning more accurate, product delivery more targeted, and the effect more obvious.
Disclosure of Invention
The technical problem to be solved by the application is to provide a device and a method that recognize face information and then retrieve and acquire the associated information corresponding to that information, thereby providing a better user experience and improving the promotion efficiency of media. The device provides a brand-new mode of human-computer interaction: the user merely needs to stand in front of the device, and from the captured facial information the device returns tailored information, associated with the user's facial information, that the user wishes to know.
The associated information pushing device based on face recognition comprises a camera device, a display screen and a processor, wherein the processor further comprises a face recognition unit and an associated information retrieval unit; the camera device is used for acquiring image information and sending the image information to the face recognition unit;
the face recognition unit recognizes at least one piece of face information and an attribute value thereof by detecting the image information output by the camera device, and sends the recognized face information and the attribute value thereof to the associated information retrieval unit;
and the associated information retrieval unit retrieves and acquires the associated information corresponding to the attribute value and sends the associated information to a display screen for display.
In a preferred embodiment, the face recognition unit includes:
the collecting and labeling unit is used for collecting the face pictures and labeling the corresponding categories of the attributes to form a training data set;
the system comprises a detection alignment unit, a comparison unit and a display unit, wherein the detection alignment unit is used for detecting a human face and human face key points and aligning the human face through a plurality of key points;
an encoding unit configured to encode attributes having an order in a category;
a neural network construction unit for constructing a deep neural network;
the neural network training unit is used for training the deep neural network in the neural network construction unit by utilizing the training data set formed in the collecting and labeling unit and deploying the neural network model obtained by training;
and the face attribute recognition unit is used for predicting the face information and the attribute value in the picture by using the neural network model in the neural network training unit.
In another preferred embodiment, the face recognition unit includes:
the calculating unit is used for calculating the pyramid feature layer number and the scaling ratio, namely calculating the pyramid feature layer number n_scales and the scaling ratio S according to the size of an image to be recognized and minimum scaling width and height parameters obtained from face model training, wherein the scaling ratio needs to ensure that the area of the scaled region is maximized;
the extraction unit is used for extracting feature data according to face classification of the pyramid feature layer, dividing the pyramid feature layer into a complete extraction layer and an approximate extraction layer, and performing complete extraction and approximate extraction respectively to obtain feature data according to each layer of face classification;
the complete extraction is to obtain a zoomed image from an image to be recognized based on the current obtained zoom ratio, and calculate feature data according to face classification according to a face model by taking the obtained zoomed image as input;
the approximate extraction is to scale feature data on which the face classification obtained from the complete extraction layer is based;
the classification unit is used for classifying feature data according to face classification of pyramid feature layers and classifying pyramid feature data of each layer by applying a classification algorithm, wherein the pyramid feature data comprise complete extraction layer data and approximate extraction layer data;
the face recognition result unit is used for combining the classification results of each layer from the classification unit to obtain the final face information and its attribute values.
The application also relates to a method for pushing associated information based on face recognition, which comprises the following steps:
step S1: acquiring image information;
step S2: detecting the image information, judging whether face information exists or not, if so, further identifying at least one piece of face information and an attribute value, and if not, waiting for the next input;
step S3: the associated information corresponding to the face information and the attribute value in step S2 is retrieved and acquired.
Preferably, the method further comprises:
step S4: the association information obtained in step S3 is displayed.
In a preferred embodiment, the recognizing the face information and the attribute value further includes:
s21: collecting face pictures and labeling corresponding categories of multiple attributes to form a training data set;
s22: detecting a human face and human face key points, and aligning the human face through a plurality of key points;
s23: encoding attributes in the categories having an order;
s24: constructing a deep neural network;
s25: training the deep neural network designed in the step S24 by using the training data set formed in the step S21, and deploying a neural network model obtained through training;
s26: the face information and the attribute value thereof in the picture are predicted by using the neural network model in step S25.
In another preferred embodiment, the recognizing the face information and the attribute value further includes:
s21, calculating pyramid feature layer number and scaling ratio, and calculating pyramid feature layer number n _ scales and scaling ratio S according to the size of an image to be recognized and a minimum scaling width and height parameter obtained based on face model training, wherein the scaling ratio needs to ensure the area maximization of a scaling region;
s22, extracting feature data of the pyramid feature layer according to the face classification, dividing the pyramid feature layer into a complete extraction layer and an approximate extraction layer, and performing complete extraction and approximate extraction respectively to obtain feature data of each layer of face classification;
the complete extraction is to obtain a zoomed image from an image to be recognized based on the current obtained zoom ratio, and calculate feature data according to face classification according to a face model by taking the obtained zoomed image as input;
the approximate extraction is to scale feature data on which the face classification obtained from the complete extraction layer is based;
s23, classifying feature data of the pyramid feature layers according to the face classification, and classifying the pyramid feature data of each layer by applying a classification algorithm, wherein the pyramid feature data comprise complete extraction layer data and approximate extraction layer data;
and S24, obtaining the face information and the attribute value thereof, and combining the classification result of each layer to obtain the face information and the attribute value thereof.
Preferably, before the step S21, the method further includes: s20, preprocessing the image, converting the color space and the row and column format of the image to be recognized into the color space and the row and column format required by the human face model.
Preferably, the method further comprises: and S25, establishing a corresponding relation between the obtained face information and the attribute value thereof in the corresponding image area.
According to the method and device of the present application, the user's unique face information and its attribute values are obtained using a fast and efficient DSP-based face recognition algorithm in an embedded system; internal and external resources can then be searched using the face information and its attribute values to find the product or service information most relevant to those attribute values and display it on the screen the user is looking at. This provides a good human-computer interaction experience and makes the pushed products and services more accurate, targeted and personalized.
Drawings
Fig. 1 is an overall framework diagram of an associated information pushing device based on face recognition;
FIG. 2 is a functional structure diagram of a face recognition unit based on a depth algorithm;
FIG. 3 is a functional structure diagram of a face recognition unit based on a pyramid algorithm;
FIG. 4 is an entity diagram of an associated information pushing device based on face recognition;
FIG. 5 is a schematic diagram of a deep neural network;
fig. 6 is a flowchart of an association information pushing method based on face recognition.
Detailed Description
In one embodiment, the present invention discloses an associated information pushing device based on face recognition.
The present application will be described in further detail below with reference to fig. 1-5.
As shown in fig. 1 and 4, the associated information pushing device 1 based on face recognition is composed of an outer part 2 and a main body 3, wherein the outer part 2 comprises a housing 21, a camera device 22 and a display screen 23, while a processor 31, a memory 32, a data transmission device 33, a power supply 34 and the like are arranged in the main body 3. The associated information pushing device 1 can be suspended on a wall surface or placed on the ground on a base. The bottom of the housing 21 has an opening for the power supply circuit input. The display screen 23 may be a conventional LCD display, or another conventional display such as a CRT or plasma display, or a conventional touch display such as a resistive, capacitive, infrared or surface acoustic wave touch screen. If the display screen 23 is a touch display screen, then in addition to displaying images and information output by the processor 31, it can receive touch input or gesture control from the outside, thereby realizing human-computer interaction.
A camera device 22, which may be one or more cameras 22, may be located in the middle of the top of the exterior portion 2 or other location that facilitates the acquisition of images appearing in front of the display screen 23, and sends the acquired image or video information to the processor 31.
The processor 31 includes a calculation unit 311, a face recognition unit 312, and an association information retrieval unit 313. The memory includes a face database 321 and an associated information storage unit 322. Wherein the camera 22 sends the image sequence or video stream information in the environment in front of the display 23, acquired in real time, to the face recognition unit 312 of the processor 31.
The unit detects an input image sequence or video stream, judges whether face information exists or not, further identifies the face information and the face attribute if the face information exists, and waits for the next input if the face information does not exist. The face information and its attribute value may be at least one or more, depending on the image processing capability and the presence of several persons.
Furthermore, the device may also be in other structural forms than the outer part 2 and the body 3, for example, a device composed of different components fixed or installed in different positions or areas, as long as the push of the associated information and the interaction with the user are facilitated.
When detecting an image sequence or a video stream, all received images may be detected, or, preferably, only key frames may be detected. When the image sequence is a single frame, the image itself is the key frame; when the image sequence contains multiple frames, the N frames of the best quality are selected from the sequence as key frames. Quality can be judged by scoring indexes such as the sharpness and size of the face picture, whether it is a real face, occlusion, illumination and the like, and the top N frames with the highest scores are selected as key frames. If the input is a video stream, face detection is preferably performed every 6 frames. Preferably, the video stream is decoded into 4:2:0 YUV data and stored in the memory after formatting.
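As an illustration of the key-frame selection just described, the following sketch scores frames by sharpness and face size only; the patent also mentions real-face, occlusion and illumination scores, and the scoring weights and function names here are assumptions.

```python
import cv2

def select_key_frames(frames, face_boxes, n=3, detect_every=6):
    """Pick the N highest-quality frames from an image sequence.

    A minimal sketch of the key-frame selection described above: frames are
    scored by sharpness and face size only; real-face, occlusion and
    illumination scores from the patent are omitted. face_boxes[i] is assumed
    to hold the (w, h) of the face detected in frame i.
    """
    scored = []
    for i in range(0, len(frames), detect_every):        # detect every 6th frame of a video stream
        gray = cv2.cvtColor(frames[i], cv2.COLOR_BGR2GRAY)
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var() # simple sharpness index
        w, h = face_boxes[i]
        scored.append((sharpness + w * h, i))             # crude combined quality score
    scored.sort(reverse=True)
    return [frames[i] for _, i in scored[:n]]             # top-N frames as key frames
```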
When detecting an image, the face recognition unit 312 extracts information such as a face position, a face key point, and a face attribute value in the image, where the face key point may include position information such as an eye corner, an end of an eyebrow, a mouth corner, and a nose tip, and the face attribute value includes an appearance attribute such as a user gender, an age, whether glasses are worn, a hat, a mask, and an expression. The related information in the invention can have one-to-one, many-to-many and many-to-one mapping relation with the appearance attributes such as gender, age, whether wearing glasses, hats, masks and expressions; the mapping may be further adjusted by one or more thresholds or a sequence of thresholds. The face recognition unit 312 sends the extracted face and the attribute value thereof to the association information retrieval unit 313, and sends the extracted face and the attribute value information thereof to the display screen 23 for display and storage in the face database 321 of the memory.
Particularly, in another embodiment, when the associated information is sent to the display screen for display, the original face or a beautified face from the image information is also displayed at the same time, and the face display area is set in the associated information display area. The associated information and the face can be displayed in different areas, where the face may be the face image originally captured by the camera or a face image processed by beautification, image identification or other processing. When displaying, the area showing the face and the area showing the associated information can be separated from each other without overlapping; or the area showing the face can be located inside the area showing the associated information, so that the face is surrounded by the associated information. The former arrangement is favorable for displaying several different pieces of associated information, while the latter highlights that the face lies within the associated information, which helps visual focusing.
The face recognition unit 312 may be a face recognition and detection device known in the art, such as a face recognition device based on conventional machine learning, or a face recognition device based on deep learning.
Referring to fig. 2, in a preferred embodiment, the face recognition unit 312 is a face recognition unit based on deep learning and multi-task learning. The unit specifically includes:
1) the collecting and labeling unit 3120a is configured to collect face pictures and label corresponding categories of multiple attributes to form a training data set.
The classes of the face attributes are composed of local attributes and global attributes. Local attributes include, but are not limited to, hair color, hair length, eyebrow thickness or thinning, eye size, eyes open or closed, nose bridge height, mouth size, mouth open or closed, whether glasses are worn, whether sunglasses are worn, whether a mask is worn, etc. Global attributes include, but are not limited to, race, gender, age, color value, expression, and the like.
And manually marking the corresponding attributes of the collected face pictures, and forming a training data set according to the categories corresponding to the attributes.
2) The detection alignment unit 3121a is configured to detect a face and face key points, and align the face through a plurality of key points.
The key points of the human face comprise position information of canthus, tail ends of eyebrows, corners of mouth, nose tips and the like.
The face is detected using an AdaBoost classifier (adaptive boosting classifier) or a conventional deep learning face detection algorithm. Different face photos have different poses. For each face, its key points are first detected; an affine or similarity transformation from the face to a standard face is calculated based on these key points, and the face is aligned to the standard face.
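By way of illustration, a face can be aligned to the standard face with a similarity transform estimated from its key points, for example with OpenCV; the standard-face coordinates and crop size below are assumptions, not values from the patent.

```python
import cv2
import numpy as np

# Canonical key-point positions of a "standard face" in a 112x112 crop.
# These coordinates are illustrative only.
STANDARD_FACE = np.float32([[38, 46], [74, 46], [56, 66], [42, 86], [70, 86]])

def align_face(image, key_points):
    """Align a detected face to the standard face with a similarity transform.

    key_points: five detected landmarks (eyes/eye corners, nose tip, mouth
    corners) as an array of shape (5, 2).
    """
    pts = np.float32(key_points)
    # Estimate a similarity (rotation + scale + translation) transform;
    # cv2.estimateAffine2D could be used instead for a full affine transform.
    matrix, _ = cv2.estimateAffinePartial2D(pts, STANDARD_FACE)
    return cv2.warpAffine(image, matrix, (112, 112))
```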
3) An encoding unit 3122a is used for encoding the attributes with sequence in the category.
The encoding mode is exemplified by an age attribute, and for the age a, the encoding mode is one of the following forms or a combination of the forms.
(1) Encoding as x1x2 … xi …, where xi is a binary number, xi is 1 if i is less than or equal to a, and xi is 0 if i is greater than a.
(2) Encoded as x1x2 … xi …, where xi is a binary value, xi is 1 if i is equal to a divided by k, otherwise xi is 0. Where k may be any positive integer, either manually defined or randomly selected.
The encoding method can be applied to any attribute, preferably to attributes having an order.
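A minimal sketch of the two encoding forms above, assuming a fixed code length and integer bucket division for form (2):

```python
def encode_ordinal(a, length):
    """Encoding form (1): bit i is 1 while i <= a, and 0 afterwards.

    For age a = 3 and length = 6 this yields [1, 1, 1, 0, 0, 0].
    """
    return [1 if i <= a else 0 for i in range(1, length + 1)]

def encode_bucket(a, k, length):
    """Encoding form (2): a single 1 at the bucket index a // k.

    The integer-division rounding is an assumption; the patent only states
    that bit i is 1 when i equals a divided by k, where k is any positive
    integer chosen manually or at random.
    """
    return [1 if i == a // k else 0 for i in range(length)]
```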
4) The neural network constructing unit 3123a is used for constructing a deep neural network, the front end of which may be any combination of convolutional layers, pooling layers and non-linear layers, and the rear end of which is a loss layer such as softmax or cross-entropy.
The structure of the deep neural network is shown in fig. 5.
A is an input layer, which mainly reads in a face picture, an attribute type and an attribute type code, and outputs an aligned face picture, attribute type or code thereof by preprocessing the face picture. The A-layer input layer outputs the preprocessed human face picture to the B-layer convolution layer. And simultaneously, the A-layer input layer inputs the preprocessed attribute categories and/or the codes thereof to the G-layer loss layer.
The layer B is a convolution layer, the input of which is a preprocessed image or image characteristic, and a new characteristic is output through linear transformation. The output of which is characterized by the input of the C nonlinear layer.
The layer C is a nonlinear layer, and the nonlinear transformation is carried out on the input characteristics through a nonlinear function, so that the output characteristics have stronger expression capability. The output of the non-linear layer C is characterized by the input of the pooling layer D.
D is a pooling layer, which can map multiple values to one value. This layer not only further strengthens the nonlinearity of the learned features, but also makes the spatial size of the output features smaller, and it strengthens the translation invariance of the learned features, i.e. the extracted features remain unchanged when the face is translated. The output features of the pooling layer D may again be used as the input of the convolutional layer B or of the fully connected layer E.
As shown in fig. 5, the large box outside the B, C, D layers indicates that the B, C, D layers may be repeated one or more times, i.e. the convolutional layer B, the combination of the non-linear layer C and the pooling layer D may be repeated one or more times, and each time the output of the pooling layer may be re-input to the convolutional layer B. The multiple combination of the three layers B, C and D can better process the input pictures, so that the characteristics of the pictures have the best expression capability.
And E is a full connection layer, which performs linear transformation on the input of the full connection layer and projects the learned characteristics to a better subspace to facilitate attribute prediction. The output characteristics of the fully connected layer E are used as inputs to the non-linear layer F.
F is a nonlinear layer which, like the nonlinear layer C, performs a nonlinear transformation on the input features. Its output features may be the input of the loss layer G or of another fully connected layer E.
As shown in fig. 5, the large box around the fully connected layer E and the nonlinear layer F indicates that the layers E and F may be repeated one or more times.
G is one or more loss layers responsible for calculating the error between the predicted attribute category or its encoding and the input attribute category or its encoding.
In general, the input layer A is responsible for simple processing of the input. The combination of the convolutional layer B, the nonlinear layer C and the pooling layer D is responsible for feature extraction from the picture. The fully connected layer E and the nonlinear layer F map the features to the attribute categories or their encodings. The loss layer G is responsible for calculating the prediction error. The multilayer design of the deep neural network ensures that the extracted features have rich expressive power, so the attributes are predicted better. Meanwhile, because the multiple attribute categories and encodings are connected to the loss layer G at the same time, multiple tasks can learn simultaneously and share the features learned by the deep network.
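The A-G structure described above can be sketched, for example, in PyTorch; the layer sizes, repeat counts and the two example attribute heads below are illustrative assumptions, since the patent fixes only the layer types and their ordering.

```python
import torch
import torch.nn as nn

class FaceAttributeNet(nn.Module):
    """Minimal multi-task sketch of the A-G structure in Fig. 5 (assumed sizes)."""
    def __init__(self, age_code_len=100):
        super().__init__()
        # B/C/D combination (conv, non-linear, pooling), repeated twice: feature extraction.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # E/F combination: map features toward attribute categories or codes.
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 28 * 28, 256), nn.ReLU())
        # One output head per attribute; all heads share the trunk (multi-task learning).
        self.gender_head = nn.Linear(256, 2)
        self.age_head = nn.Linear(256, age_code_len)

    def forward(self, x):                      # x: aligned 3x112x112 face crops
        shared = self.fc(self.features(x))
        return self.gender_head(shared), self.age_head(shared)

# G: one loss term per task, summed so that all tasks learn simultaneously.
model = FaceAttributeNet()
gender_logits, age_code = model(torch.randn(4, 3, 112, 112))
loss = nn.CrossEntropyLoss()(gender_logits, torch.randint(0, 2, (4,))) \
     + nn.BCEWithLogitsLoss()(age_code, torch.rand(4, 100))
loss.backward()                                # back-propagated gradient descent
```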
5) The neural network training unit 3124a is configured to train a deep neural network in the neural network construction unit by collecting a training data set formed in the labeling unit, and deploy a neural network model obtained through training.
The deep network parameters in the neural network construction unit are obtained by training with a back-propagated gradient descent algorithm. The input layer A is then replaced so that it inputs only pictures, and the loss layer G is replaced so that it only takes the features and the face attributes. A deep neural network model is thereby obtained which takes a face picture as input and simultaneously outputs multiple attributes of the face.
6) And the face attribute recognition unit 3125a is configured to recognize face information and attribute values thereof in the picture through a neural network model in the neural network training unit.
Referring to fig. 3, in another preferred embodiment, the face recognition unit 312 is a DSP-based face recognition unit in an embedded system, and specifically includes:
the calculating unit 3121b is configured to calculate the number of pyramid feature layers and a scaling ratio, specifically calculate the number of pyramid feature layers n _ scales and the scaling ratio S according to the size of an image to be recognized and a minimum scaling width and height parameter obtained based on face model training, where the scaling ratio needs to ensure that an area of a scaling region is maximized;
preferably, the calculation formula of the pyramid feature layer number n _ scales is as follows:
n_scales=num×(log(ratio)/log(2.0)+1)
in the formula: num refers to the pyramid feature layer number obtained based on the face model training, and is mainly used for determining the image zooming times according to the image size, namely a variable for determining the pyramid feature layer number; wherein:
ratio=min(w/min_w,h/min_h)
in the formula: min _ w refers to the minimum image width in the face model, and min _ h refers to the minimum image height in the face model; w denotes: width of image to be recognized, h denotes: the height of the image to be identified.
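For illustration, the two formulas above can be evaluated as follows; the example numbers and the rounding of n_scales to an integer are assumptions.

```python
import math

def pyramid_layers(w, h, min_w, min_h, num):
    """Compute the pyramid layer count n_scales from the formula above.

    w, h: size of the image to be recognized; min_w, min_h: minimum scaling
    width/height from face model training; num: layer-count variable from the
    face model. Rounding to an integer layer count is an assumption here.
    """
    ratio = min(w / min_w, h / min_h)
    n_scales = num * (math.log(ratio) / math.log(2.0) + 1)
    return int(round(n_scales))

# e.g. a 640x480 image with a 40x40 minimum face size and num = 8
print(pyramid_layers(640, 480, 40, 40, 8))
```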
The extracting unit 3122b is configured to extract feature data according to which the face of the pyramid feature layer is classified: dividing the pyramid feature layer into a complete extraction layer and an approximate extraction layer, and respectively performing complete extraction and approximate extraction to obtain feature data according to each layer of face classification;
the complete extraction is to obtain a zoomed image from an image to be recognized based on the current obtained zoom ratio, and calculate feature data according to face classification according to a face model by taking the obtained zoomed image as input;
the approximate extraction is to scale feature data on which the face classification obtained from the complete extraction layer is based;
preferably, the complete extraction comprises:
s201: acquiring a zoomed image from the image to be identified based on the zoom coefficient obtained by the previous step;
s202: and taking the image obtained in the step S201 as input, and calculating feature data according to the face classification according to the face model.
Regardless of the manner in which the full extraction is performed, the approximate extraction is further operated on the basis of the results of the full extraction.
In a preferred embodiment, an approximate extraction operation is provided, namely: the approximate extraction calculates approximate extraction data for λ layers on the basis of a complete extraction layer; that is, if the complete extraction layer is the N-th layer, the approximate extraction calculates the approximate extraction data for layers N+1 to N+λ. Here λ is an acceleration ratio parameter of the face model and is the criterion for dividing the pyramid layers into the two categories of complete extraction layers and approximate extraction layers. The purpose of the approximate extraction is to speed up the extraction of the whole pyramid's feature data and reduce the total amount of computation.
Further, the scaling ratio S is calculated by:
S = (Sa / Sr)^(-βj)
where Sa and Sr are the scaling ratios, relative to the original image size, of the approximate extraction layer and of the complete extraction layer from which it is derived, respectively; βj is the parameter of the j-th feature type on which the face classification is based, and its value is related to the type of feature data used by the particular face classification.
In the above full extraction and approximate extraction, the feature data according to which the face is classified is mentioned, and the contents of the data are: the feature data according to which the face classification is based includes one or more combinations of color channels, gradient magnitudes, and gradient histograms.
In the complete extraction, any of these three feature types may be selected for calculation, i.e. one to three of them may be computed. The more feature types are selected, the more accurate the detection result. For example, in one embodiment, the features on which the face classification is based include color channels, gradient magnitudes and gradient histograms; βj is then a face model parameter with j ranging over 0, 1 and 2. If only the color channels and the gradient magnitudes are computed, j takes the values 0 and 1; if all three are selected, j takes the values 0, 1 and 2. In the calculation, however, the feature types computed for the approximate extraction must be consistent with those of the complete extraction. The choice of specific feature types and their number is a balance between detection accuracy and detection time, which trade off against each other.
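A minimal sketch of the per-layer scaling S = (Sa / Sr)^(-βj) applied during approximate extraction; the βj values and scaling ratios used below are purely illustrative.

```python
def approx_layer_scale(s_approx, s_real, beta):
    """Scale factor applied to a complete-extraction layer's feature data to
    approximate a nearby pyramid layer: S = (Sa / Sr) ** (-beta_j).

    s_approx, s_real: scaling ratios (relative to the original image) of the
    approximate layer and of the complete layer it is derived from; beta is
    the per-feature-type model parameter. All values here are illustrative.
    """
    return (s_approx / s_real) ** (-beta)

# One beta per selected feature type (j = 0, 1, 2): color channels,
# gradient magnitude, gradient histogram - example values only.
betas = [0.0, 1.0, 1.0]
scales = [approx_layer_scale(0.5, 0.707, b) for b in betas]
```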
The classification unit 3123b classifies feature data according to which the face of the pyramid feature layer is classified: classifying the pyramid feature data of each layer by applying a classification algorithm; the pyramid feature data comprises complete extraction layer data and approximate extraction layer data;
in a preferred embodiment, the classification is based on an AdaBoost classification algorithm of a decision tree, and the face model matching of the sliding window is realized through a five-layer full binary tree structure, so as to obtain a matching result meeting a threshold condition.
The number of layers is the optimal solution determined according to the training model result; the face model used for matching is a model that has been trained. The matching result may be represented by a score, for example, each node in the binary tree has a corresponding score, when the image to be detected is matched with a certain node, the score of the image to be detected is increased by the score of the node, and when the image to be detected is not matched, the score is unchanged. After the matching with each node is completed, the high and low of the matching degree are represented by the high and low of the final score. The score of each node can be manually specified in advance, but the preferred mode is that the score is obtained by training through establishing a corresponding model.
Further, the score of each node in the binary tree is obtained by training.
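The node-score accumulation described above can be sketched as follows; the array layout of the tree, the feature/threshold test at each node and the traversal rule are assumptions, since the patent specifies only that matched nodes add their trained scores.

```python
def window_score(feature_window, tree, node=0):
    """Accumulate trained per-node scores along one path of a full binary tree
    (depth 5 in the preferred embodiment) for a single sliding window.

    tree: list of nodes, each assumed to be (feature_index, threshold, score)
    learned during training; feature_window: the window's feature vector.
    """
    score = 0.0
    while node < len(tree):
        feat_idx, threshold, node_score = tree[node]
        matched = feature_window[feat_idx] >= threshold
        if matched:
            score += node_score                     # matched: add this node's score
        # descend left on a match, right otherwise (array layout of a full tree)
        node = 2 * node + (1 if matched else 2)
    return score
```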
Further preferably, the matching result takes the form of a score, and before merging, the window position, score and area information are subjected to an accept-or-reject judgment. That is, before the face recognition result unit 3124b combines the classification results of each layer, it further judges each layer's classification result according to the window position, score and area information: if these values are below the face judgment threshold, the result is discarded. Here the merging includes merging faces at the same position, and the threshold comprises parameters set in the face model. Through the processing in this apparatus embodiment, the final face information and its attribute values R are obtained. R is represented as follows:
and R [ { n, [ (x, y), (w, h), angle, score ] [ ] } wherein n is the number of detected faces, and the content in the following [ ] represents each piece of detected face information, including the coordinate position (x, y) relative to the vertex at the upper left corner of the image, the width and height (w, h) corresponding to the area of the region where the face is located, the face angle and the detection scoring result score.
The face recognition result unit 3124b combines the classification results of each layer from the classification unit to obtain the face information and its attribute values.
The face information and the attribute value thereof comprise face information, face key point information and face attribute value. The face recognition unit 312 can effectively increase the operation speed on the premise of ensuring the detection accuracy by classifying the result on the basis of the rapid multi-scale pyramid feature data extraction. In the calculation unit 3121b, in order to make the scaled image of each layer smoother, fine adjustment is required to make the scaling factor change smoothly between the scaled image sizes and ensure the maximization of the area to be scaled. In the extraction unit 3122b, to speed up the feature extraction process and reduce the total data computation amount, the total pyramid feature layer number is divided into two categories, full extraction and approximate extraction, and the division criterion is the scaling S. In the classification unit 3123b, feature data according to which multiple layers of human faces are classified is obtained, and it is ensured that accurate human face information and attribute values thereof are obtained in the human face recognition result unit 3124 b. In the face recognition result unit 3124b, faces that meet the parameter threshold set in the face model in each layer and are at the same position in the window are merged.
Optionally, the face recognition unit 312 further includes an image preprocessing unit 3120b, which preprocesses the received image sequence or video stream so that it conforms to the format required for image processing.
Optionally, the face recognition unit 312 further includes a face identification unit 3125 b: and establishing a corresponding relation between the obtained face information and the attribute value thereof in a corresponding image area. The corresponding relationship may be established by performing identification of the face information and the attribute value thereof in the corresponding image region, or by other identification or expression methods (including but not limited to storing the corresponding relationship, or identifying the corresponding image region, etc.).
Optionally, the face recognition unit 312 further includes a face model establishing unit 3126b, configured to establish and train a face model. Face models may be built and trained in this unit using face detection algorithms including neural network based, eigenfaces, sample-based learning methods, or other algorithms. Preferably, the convolutional neural network method is used for establishing and training the face model, so that the detection precision is high, and the calculation amount is low, so that the face detection of the device is more sensitive.
The associated information retrieval unit 313 can find associated information from the face information and attribute values input by the face recognition unit 312. Preferably, the associated information retrieval unit 313 further includes an internal resource retrieval unit 3131 and an external resource retrieval unit 3132. The internal resource retrieval unit 3131 is configured to retrieve information related to the face information and attribute values from the associated information storage unit 322 in the memory 32; this may be pre-stored associated information, for example information on products of the same brand, type or style when the attribute value indicates glasses, a mask or sunglasses, or information on vehicles and digital products, or on ladies' bags and clothing, when the attribute value indicates a male or female gender, and so on. The information in the associated information storage unit 322 may also be the result of earlier external searches by the associated information retrieval unit 313. The external resource retrieval unit 3132 is used to search for associated information from external resources, for example from the internet or a local area network. The search may be realized by extracting keywords from the attribute values, or by common search modes such as image search. The external information obtained by the external resource retrieval unit may be stored in the associated information storage unit 322 in the memory 32. The associated information may concern not only products but also services, such as beauty, hairdressing, oral care, massage, sauna and fitness. Based on face recognition, the invention can further determine from the hair, the complexion and whether teeth are exposed a possible need for oral care, and the general body shape and outline are, to a certain extent, related to services such as massage, sauna and fitness.
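As an illustration of the internal-first, external-fallback retrieval just described, a sketch follows; the dictionary-based internal store, the keyword construction and the function names are assumptions rather than interfaces defined by the patent.

```python
def retrieve_associated_info(attributes, internal_store, external_search):
    """Retrieve associated information for a set of face attribute values.

    internal_store stands in for the associated information storage unit 322
    (a dict keyed by attribute keywords here) and external_search for an
    internet/LAN search function; both are hypothetical.
    """
    results = []
    for key, value in attributes.items():
        keyword = f"{key}:{value}"
        hits = internal_store.get(keyword)           # internal resources first
        if not hits:
            hits = external_search(keyword)          # fall back to external search
            internal_store[keyword] = hits           # cache for later retrievals
        results.extend(hits)
    return results

# e.g. attributes = {"gender": "female", "glasses": True}
```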
In a specific embodiment, the face recognition unit 312 detects the face information of the user standing in front of the display screen, finds the attribute value of wearing glasses, searches for other glasses information of the same brand or type or style, and sends the information to the display screen 23 for display. In another specific embodiment, if the face recognition unit 312 detects that the user in front of the display screen 23 is a young woman, the associated information retrieval unit 313 may obtain the information of the sales promotion activity of the current day market through the network or the stored information, and send the information to the display screen 23 for display.
In another specific embodiment:
the association information storage unit 322 is further configured to store: the associated information retrieval unit retrieves and acquires associated information corresponding to the face information and the attribute value thereof to serve as historical associated information corresponding to the face information and the attribute value thereof;
an internal resource retrieving unit 3131, further configured to: based on the historical associated information, retrieving associated information corresponding to the face information and the attribute value thereof in the associated information storage unit;
an external resource retrieving unit 3132, further configured to: and searching the associated information corresponding to the face information and the attribute value thereof from external resources based on the historical associated information.
That is, the historical related information can be used for the retrieval of the related information at present or even in the future.
More specifically, suppose the faces currently recognized within the camera's working range include Zhang San:
(1) In one case, for Zhang San's previous face information and attribute values, the associated information retrieved at that time was brand A glasses:
if Zhang San now appears to be wearing brand A glasses, new brand A glasses and/or glasses of a similar positioning or style to brand A can be taken as the current associated information (one choice, or the most preferred choice), and the priority of brand A glasses is raised within the glasses category; furthermore, other clothing and accessories that match the brand A glasses can be taken as part of the current associated information;
if Zhang San now appears to be wearing brand B glasses, new brand B glasses and/or glasses of a similar positioning or style to brand B can be taken as the current associated information (one choice, or the most preferred choice), and the priority of brand A glasses is lowered within the glasses category; furthermore, other clothing and accessories that match the brand B glasses can be taken as part of the current associated information;
(2) In another case, there is face information with attribute values that match Zhang San's gender and are similar in age, and/or in image, and/or in temperament, in particular relating to one or more celebrities (this may be pre-stored in the system and may form part of the historical associated information):
if Zhang San is currently wearing glasses, the glasses related to the one or more celebrities can be taken as the current associated information (one choice, or the most preferred choice); furthermore, other clothing and accessories related to the one or more celebrities can be taken as part of the current associated information.
In each of the above cases, if the corresponding service consumption can be obtained from the historical consumption records, the historical associated information may also be linked to services that Zhang San or the other celebrities have consumed a certain number of times.
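The brand-priority adjustment in case (1) can be sketched as follows; the function, its parameters and the unit priority steps are hypothetical, since the patent only states that matching brands are raised and a previously pushed brand is lowered when a different brand is worn.

```python
def adjust_glasses_priority(history, current_brand, catalog_priority):
    """Adjust per-brand priority in the glasses category from history.

    history: brands previously pushed to this face (historical associated
    information); current_brand: the brand the user appears to wear now.
    The +1/-1 steps are illustrative assumptions.
    """
    for brand in history:
        step = 1 if brand == current_brand else -1    # raise matches, lower the rest
        catalog_priority[brand] = catalog_priority.get(brand, 0) + step
    # the currently worn brand is always promoted within the glasses category
    catalog_priority[current_brand] = catalog_priority.get(current_brand, 0) + 1
    return catalog_priority
```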
The data transmission device 33 provides an interface for data transmission with other external devices. The device 33 can communicate with external devices both wirelessly and by wire, where the external devices may be, but are not limited to, keyboards, mice, displays, printers, USB devices, mobile devices and the like. In a specific embodiment, the external device is a mobile phone, which can communicate with the associated information pushing device 1 through a wireless or wired connection via the data transmission device 33. By acquiring the face information of the mobile phone user, the associated information pushing device retrieves associated information for that face information, such as shopping mall promotions of interest to the user, recommendations of goods similar to worn accessories, or catering recommendations, and sends the information to the mobile phone user by means of short messages, multimedia messages, following a WeChat official account, and the like.
In another embodiment, the invention discloses a method for pushing associated information based on face recognition.
The method for pushing the associated information based on face recognition in the present application will be described in further detail with reference to fig. 1 to 6.
As shown in the flow chart of fig. 6, the steps of the method are as follows:
step S1: image information is acquired. Image information before the associated information push apparatus 1, which may be a sequence of images or a video stream, is acquired by the camera apparatus 22.
Step S2: and detecting the image information, judging whether face information exists, if so, further identifying the face and the face attribute, and if not, waiting for the next input.
When detecting the image sequence or video stream acquired by the camera device 22, all received images may be detected, or, preferably, only key frames may be detected. When the image sequence is a single frame, the image itself is the key frame; when the image sequence contains multiple frames, the N frames of the best quality are selected from the sequence as key frames. Quality can be judged by scoring indexes such as the sharpness and size of the face picture, whether it is a real face, occlusion, illumination and the like, and the top N frames with the highest scores are selected as key frames. If the input is a video stream, face detection is preferably performed every 6 frames. Preferably, the video stream is decoded into 4:2:0 YUV data and stored in the memory after formatting. When an image is detected, information such as the face position, face key points and face attribute values in the image can be extracted; the face key points may include position information such as the eye corners, the ends of the eyebrows, the mouth corners and the nose tip, and the face attribute values include appearance attributes such as the user's gender, age, whether glasses, a hat or a mask is worn, and expression.
When the face and the attribute value are identified, a face and attribute value identification method based on traditional machine learning known in the art or a face and attribute value identification method based on deep learning can be adopted.
In a preferred embodiment, the method for identifying the face and the attribute value is a face detection method based on deep learning and multi-task learning, and comprises the following specific steps:
step S21 a: and collecting the face pictures and labeling the corresponding classes of the attributes to form a training data set.
The classes of the face attributes are composed of local attributes and global attributes. Local attributes include, but are not limited to, hair color, hair length, eyebrow thickness or thinning, eye size, eyes open or closed, nose bridge height, mouth size, mouth open or closed, whether glasses are worn, whether sunglasses are worn, whether a mask is worn, etc. Global attributes include, but are not limited to, race, gender, age, color value, expression, and the like.
And manually marking the corresponding attributes of the collected face pictures, and forming a training data set according to the categories corresponding to the attributes.
Step S22a: a face and face key points are detected, and the face is aligned through the plurality of key points.
The key points of the human face comprise position information of canthus, tail ends of eyebrows, corners of mouth, nose tips and the like.
The face is detected using an AdaBoost classifier (adaptive boosting classifier) or a conventional deep learning face detection algorithm. Different face photos have different poses. For each face, its key points are first detected; an affine or similarity transformation from the face to a standard face is calculated based on these key points, and the face is aligned to the standard face.
Step S23a, encoding the attributes in the category having order.
The encoding mode is exemplified by an age attribute, and for the age a, the encoding mode is one of the following forms or a combination of the forms.
(1) Encoding as x1x2 … xi …, where xi is a binary number, xi is 1 if i is less than or equal to a, and xi is 0 if i is greater than a.
(2) Encoded as x1x2 … xi …, where xi is a binary value, xi is 1 if i is equal to a divided by k, otherwise xi is 0. Where k may be any positive integer, either manually defined or randomly selected.
The encoding method can be applied to any attribute, preferably to attributes having an order.
Step S24a, constructing a deep neural network, the front end of which may be any combination of convolutional layers, pooling layers and non-linear layers, and the back end of which is a loss layer such as softmax or cross-entropy.
The structure of the deep neural network is shown in fig. 5.
A is an input layer, which mainly reads in a face picture, an attribute type and an attribute type code, and outputs an aligned face picture, attribute type or code thereof by preprocessing the face picture. The A-layer input layer outputs the preprocessed human face picture to the B-layer convolution layer. And simultaneously, the A-layer input layer inputs the preprocessed attribute categories and/or the codes thereof to the G-layer loss layer.
The layer B is a convolution layer, the input of which is a preprocessed image or image characteristic, and a new characteristic is output through linear transformation. The output of which is characterized by the input of the C nonlinear layer.
The layer C is a nonlinear layer, and the nonlinear transformation is carried out on the input characteristics through a nonlinear function, so that the output characteristics have stronger expression capability. The output of the non-linear layer C is characterized by the input of the pooling layer D.
D is a pooling layer, which can map multiple values to one value. This layer not only further strengthens the nonlinearity of the learned features, but also makes the spatial size of the output features smaller, and it strengthens the translation invariance of the learned features, i.e. the extracted features remain unchanged when the face is translated. The output features of the pooling layer D may again be used as the input of the convolutional layer B or of the fully connected layer E.
As shown in fig. 5, the large box outside the B, C, D layers indicates that the B, C, D layers may be repeated one or more times, i.e. the convolutional layer B, the combination of the non-linear layer C and the pooling layer D may be repeated one or more times, and each time the output of the pooling layer may be re-input to the convolutional layer B. The multiple combination of the three layers B, C and D can better process the input pictures, so that the characteristics of the pictures have the best expression capability.
And E is a full connection layer, which performs linear transformation on the input of the full connection layer and projects the learned characteristics to a better subspace to facilitate attribute prediction. The output characteristics of the fully connected layer E are used as inputs to the non-linear layer F.
F is a nonlinear layer which, like the nonlinear layer C, performs a nonlinear transformation on the input features. Its output features may be the input of the loss layer G or of another fully connected layer E.
As shown in fig. 5, the large box around the fully connected layer E and the nonlinear layer F indicates that the layers E and F may be repeated one or more times.
G is one or more loss layers responsible for calculating the error between the predicted attribute category or its encoding and the input attribute category or its encoding.
In general, the input layer A is responsible for simple processing of the input. The combination of the convolutional layer B, the nonlinear layer C and the pooling layer D is responsible for feature extraction from the picture. The fully connected layer E and the nonlinear layer F map the features to the attribute categories or their encodings. The loss layer G is responsible for calculating the prediction error. The multilayer design of the deep neural network ensures that the extracted features have rich expressive power, so the attributes are predicted better. Meanwhile, because the multiple attribute categories and encodings are connected to the loss layer G at the same time, multiple tasks can learn simultaneously and share the features learned by the deep network.
Step S25 a: and (5) training the deep neural network designed in the step S24a by using the training data set formed in the step S21a, and deploying the trained neural network model.
The deep network parameters designed in step S24a are obtained by training with a back-propagated gradient descent algorithm. The input layer A is replaced so that it inputs only pictures, and the loss layer G is replaced so that it only takes the features and the face attributes. A deep neural network model is thereby obtained which takes a face picture as input and simultaneously outputs multiple attributes of the face.
And step S26a, predicting the attribute of the human face in the picture by using the deep neural network model in the step S25 a.
In another preferred embodiment, the method for identifying the face and the attribute value is a face detection method based on an embedded system DSP, and specifically includes the following steps:
s21b, calculating pyramid feature layer number and scaling: calculating pyramid characteristic layer numbers n _ scales and a scaling S according to the size of an image to be recognized and a minimum scaling width and height parameter obtained based on face model training, wherein the scaling needs to ensure the area maximization of a scaling region;
preferably, the calculation formula of the pyramid feature layer number n _ scales is as follows:
n_scales=num×(log(ratio)/log(2.0)+1)
in the formula: num refers to the pyramid feature layer number obtained based on the face model training, and is mainly used for determining the image zooming times according to the image size, namely a variable for determining the pyramid feature layer number; wherein:
ratio=min(w/min_w,h/min_h)
in the formula: min _ w refers to the minimum image width in the face model, and min _ h refers to the minimum image height in the face model; w denotes: width of image to be recognized, h denotes: the height of the image to be identified.
S22b, extracting the feature data on which face classification is based for each pyramid feature layer: dividing the pyramid feature layers into complete extraction layers and approximate extraction layers, and performing complete extraction and approximate extraction respectively to obtain the feature data on which face classification of each layer is based;
in complete extraction, a scaled image is obtained from the image to be recognized based on the currently obtained scaling ratio, and the feature data on which face classification is based is computed from this scaled image according to the face model;
in approximate extraction, the feature data on which face classification is based, obtained from a complete extraction layer, is scaled directly;
Preferably, the complete extraction comprises the following sub-steps:
S221: acquiring a scaled image from the image to be recognized based on the scaling factor obtained in the previous step;
S222: taking the image obtained in step S221 as input, calculating the feature data on which face classification is based according to the face model.
However complete extraction is performed, approximate extraction always operates on its results. In one embodiment, approximate extraction works as follows: starting from the most recently obtained complete extraction layer, the approximate extraction data of the next λ layers is computed; that is, if the complete extraction layer is the N-th layer, approximate extraction computes the data of layers N+1 through N+λ. Here λ is an acceleration ratio parameter of the face model and serves as the criterion for dividing the pyramid layers into the two categories of complete extraction layers and approximate extraction layers.
The purpose of approximate extraction is to accelerate the extraction of the whole pyramid's feature data and to reduce the total amount of computation.
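As an illustration of the division into complete and approximate layers (λ being the acceleration parameter of the face model), the layer schedule might look like the sketch below; the indexing scheme is hypothetical, not the patent's DSP implementation.

```python
def extraction_schedule(n_scales, lam):
    """Mark each pyramid layer as 'full' (complete extraction) or 'approx'.

    Every (lam + 1)-th layer is computed completely from a rescaled image;
    the following lam layers are approximated from that full layer.
    """
    schedule = []
    for i in range(n_scales):
        if i % (lam + 1) == 0:
            schedule.append((i, "full"))    # complete extraction: rescale image, compute features
        else:
            schedule.append((i, "approx"))  # approximate extraction: scale features of last full layer
    return schedule
```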
Further, the scaling ratio S is calculated by:
S = (Sa/Sr)^(-βj)
in the formula: Sa and Sr are the scalings, relative to the original image size, of the approximate extraction layer and of the complete extraction layer from which it is obtained, respectively; βj is the feature type parameter on which the j-th face classification feature is based, and its value depends on the type of feature data used for that face classification.
The feature data on which face classification is based is mentioned in both the complete extraction and the approximate extraction above; one method embodiment specifies this data in detail: it comprises one or more of color channels, gradient magnitudes and gradient histograms.
In complete extraction, any one to three of these feature types may be selected for calculation; the more types selected, the more accurate the detection result. For example, in one embodiment the features on which face classification is based are the color channels, gradient magnitudes and gradient histograms; βj is then a parameter of the face model, with j ranging over 0, 1, 2. If only the color channels and gradient magnitudes are used, j takes the values 0 and 1; if all three are used, j takes 0, 1 and 2. In either case, the number of feature types computed in approximate extraction must match that of complete extraction. The choice of specific feature types and their number is a trade-off between detection accuracy and detection time: higher accuracy comes at the cost of longer detection time.
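Concretely, approximate extraction amounts to resampling each feature channel computed at the nearest complete layer and rescaling its values by the power law S above. The NumPy sketch below is only an illustration under that reading; the β values per feature type are placeholders, not model parameters from the patent.

```python
import numpy as np

# Hypothetical feature-type parameters beta_j (j = 0: color, 1: gradient magnitude, 2: gradient histogram).
BETA = {"color": 0.0, "grad_mag": 0.1, "grad_hist": 0.1}

def approximate_channel(full_channel, s_a, s_r, feature_type):
    """Approximate a feature channel at scale s_a from one computed at scale s_r.

    Follows S = (s_a / s_r) ** (-beta_j): resample the channel spatially, then
    rescale its values by the power law instead of recomputing from the image.
    """
    factor = (s_a / s_r) ** (-BETA[feature_type])
    h, w = full_channel.shape
    new_h, new_w = int(round(h * s_a / s_r)), int(round(w * s_a / s_r))
    # Nearest-neighbour spatial resampling, kept dependency-free for the sketch.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return full_channel[rows][:, cols] * factor
```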
S23b, performing face classification on the feature data of the pyramid feature layers: applying a classification algorithm to the pyramid feature data of each layer, where the pyramid feature data comprise both complete extraction layer data and approximate extraction layer data;
Preferably, the classification algorithm in S23b is a decision-tree-based AdaBoost algorithm that performs sliding-window face model matching through a five-level full binary tree structure, obtaining the matching results that satisfy a threshold condition.
The number of tree levels is the optimum determined from the training results, and the face model used for matching has already been trained. The matching result may be expressed as a score: each node of the binary tree carries a score, and whenever the image to be detected matches a node, that node's score is added to the image's score; if it does not match, the score is unchanged. After all nodes have been evaluated, the final score indicates the degree of matching. The per-node scores could be specified manually in advance, but preferably they are obtained by training the corresponding model.
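The scoring just described can be pictured as follows: each sliding window traverses a small fixed-depth full binary tree per weak classifier, adds the score stored at the leaf it reaches, and the accumulated total is compared against the model threshold. The tree encoding below (parallel arrays of feature indices, split thresholds and leaf scores, plus early rejection) is a hypothetical layout for illustration, not the patent's DSP implementation.

```python
def score_window(features, trees, reject_threshold):
    """Accumulate per-tree scores for one sliding window.

    Each tree is a dict with parallel arrays describing a full binary tree:
      'feat'   - feature index tested at each internal node
      'thresh' - split threshold at each internal node
      'leaf'   - score added when a leaf is reached
    """
    total = 0.0
    for tree in trees:
        node = 0
        while node < len(tree["feat"]):               # descend internal nodes (heap layout)
            go_right = features[tree["feat"][node]] >= tree["thresh"][node]
            node = 2 * node + (2 if go_right else 1)
        total += tree["leaf"][node - len(tree["feat"])]
        if total < reject_threshold:                  # early rejection keeps the per-window cost low
            return None
    return total
```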
S24b, obtaining the face information and its attribute values: merging the classification results of each layer to obtain the face information and its attribute values.
The face information and its attribute values comprise the face information itself, face key point information and face attribute values. In a preferred embodiment, the decision-tree-based AdaBoost classification algorithm performs sliding-window face model matching through a five-level full binary tree, yielding matching results (expressed as scores) that satisfy the threshold condition; before step S24b, the window position, score and area information are subjected to an accept/reject decision. That is, before merging the per-layer classification results, step S24b further includes judging each layer's classification result according to window position, score and area information: if the window position, score and area values fall below the face judgment threshold, they are discarded; merging then combines the faces at the same position; the threshold is a parameter set in the face model. Through this processing, the final face information and its attribute values R are obtained. R is characterized as follows:
R={n,[(x,y),(w,h),angle,score][...]}
wherein n is the number of detected faces and each bracketed group describes one detected face: the coordinate position (x, y) relative to the top-left corner of the image, the width and height (w, h) of the region occupied by the face, the face angle, and the detection score.
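For readability, the result structure R can be mirrored as a small data type; the field names follow the description above, while the container itself is a hypothetical convenience, not part of the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FaceDetection:
    x: int          # left of face region, relative to the top-left corner of the image
    y: int          # top of face region
    w: int          # region width
    h: int          # region height
    angle: float    # face angle
    score: float    # detection score

@dataclass
class DetectionResult:
    faces: List[FaceDetection]

    @property
    def n(self) -> int:   # the n in R = {n, [(x, y), (w, h), angle, score] ...}
        return len(self.faces)
```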
The method classifies on the basis of fast multi-scale pyramid feature extraction and can effectively reduce the amount of computation while preserving detection accuracy. In step S21b, so that the scaled image size changes smoothly from layer to layer, the scaling factor is fine-tuned while still ensuring that the area of the scaled region is maximized. In step S22b, to accelerate feature extraction and reduce the total computation, the pyramid feature layers are divided into the two categories of complete extraction and approximate extraction, the division criterion being the scaling S. In step S23b, the feature data on which the face classification of each layer is based is classified, ensuring that accurate face information and attribute values are obtained in step S24b. Also in step S23b, faces in each layer that satisfy the parameter threshold set in the face model and lie at the same window position are merged.
In one embodiment of the method, image preprocessing is performed on the image to be recognized before face image detection; that is, before step S21b the method further includes:
S20b, image preprocessing: converting the color space and the row/column format of the image to be recognized into the color space and the row/column format required by the face model.
Optionally, a face model is established before image preprocessing. The face model may be built with a neural-network-based face detection algorithm, an eigenface method, a sample-learning method, or other algorithms. Preferably, a convolutional neural network is used to build and train the face model; this approach offers high detection precision at low computational cost, making face detection fast and effective.
Optionally, the format requirements of the face model are that the color space and the row/column format of the image to be recognized must match those required by the face model. Selectable color spaces include gray, rgb, hsv, luv and the like; the images are stored in column format.
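Step S20b might be realized as in the OpenCV sketch below; the target color space and layout are whatever the trained face model expects (gray and column-major storage are assumed here purely for illustration).

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray, target_space: str = "gray") -> np.ndarray:
    """Convert an input frame to the color space / layout the face model requires."""
    conversions = {
        "gray": cv2.COLOR_BGR2GRAY,
        "rgb": cv2.COLOR_BGR2RGB,
        "hsv": cv2.COLOR_BGR2HSV,
        "luv": cv2.COLOR_BGR2Luv,
    }
    converted = cv2.cvtColor(image_bgr, conversions[target_space])
    # Column storage, as mentioned above, corresponds to Fortran order in NumPy.
    return np.asfortranarray(converted)
```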
Step S3: retrieving and acquiring the associated information corresponding to the face and its attribute values from step S2. The associated information retrieval unit 314 uses the obtained face and attribute values to find the information associated with them. In one specific embodiment, the attribute value indicates that the user wears glasses; the associated information retrieval unit may then obtain information on other glasses of the same brand, type or style, either from the information already stored in the associated information storage unit 323 or over a network such as the Internet or a local area network, and display it on the display screen 23. In another specific embodiment, if the attribute value indicates a female user, the associated information retrieval unit may obtain the current shopping mall promotion information over the network or from stored information and send it to the display screen 23 for display.
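Conceptually, the retrieval in step S3 is a lookup keyed by attribute values, falling back to an external query when the local store has nothing. The sketch below only illustrates that flow; the function names and data shapes are assumptions, not the patent's retrieval unit.

```python
def retrieve_associated_info(attributes, local_store, external_lookup=None):
    """Return associated information for recognized face attributes.

    attributes: e.g. {"glasses": True, "gender": "female"}
    local_store: dict mapping an (attribute, value) pair to stored information
    external_lookup: optional callable querying the Internet / LAN when local data is missing
    """
    results = []
    for key, value in attributes.items():
        info = local_store.get((key, value))
        if info is None and external_lookup is not None:
            info = external_lookup(key, value)   # e.g. same-brand glasses, current promotions
        if info:
            results.append(info)
    return results
```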
Step S4: displaying the associated information. The display screen 23 receives the associated information sent by the associated information retrieval unit 314 and displays it below the face information. In a specific embodiment, depending on the obtained attribute values of the user, the associated information may be shopping mall information, recommended goods, or a map of the user's current location.
Optionally, the method further includes:
step S5: establishing historical associated information, wherein the historical associated information is associated information which is retrieved and acquired in the step S3 and corresponds to the face information and the attribute values thereof;
step S6: and retrieving and acquiring the face information and the associated information corresponding to the attribute value thereof based on the historical associated information.
As described above, using the historical associated information allows the associated information for current and subsequent face information and attribute values to be matched and expressed more accurately.
The present disclosure has been described in detail above; its principles and embodiments have been explained with specific examples, which serve only to aid understanding of the method and its core concept.
Meanwhile, those skilled in the art may, following the idea of the present disclosure, vary the specific embodiments and the scope of application; in summary, the contents of this description should not be construed as limiting the present disclosure.

Claims (22)

1. An associated information pushing device based on face recognition, the device comprising a camera device, a display screen and a processor, characterized in that:
the processor further comprises a face recognition unit and an associated information retrieval unit;
the camera device is used for acquiring image information and sending the image information to the face recognition unit;
the face recognition unit recognizes at least one piece of face information and an attribute value thereof by detecting the image information output by the camera device, and sends the recognized face information and the attribute value thereof to the associated information retrieval unit;
the associated information retrieval unit retrieves and acquires the face information and associated information corresponding to the attribute value of the face information, and sends the associated information to a display screen for display;
when the associated information is displayed, simultaneously displaying the original face or the beautified face in the image information;
the face recognition unit includes:
a calculating unit, configured to calculate the pyramid feature layer number and scaling ratio, calculating the pyramid feature layer number n_scales and the scaling ratio S according to the size of the image to be recognized and the minimum width and height parameters obtained from face model training, wherein the scaling ratio must ensure that the area of the scaled region is maximized;
the extraction unit is used for extracting feature data according to face classification of the pyramid feature layer, dividing the pyramid feature layer into a complete extraction layer and an approximate extraction layer, and performing complete extraction and approximate extraction respectively to obtain feature data according to each layer of face classification;
the complete extraction is to obtain a zoomed image from an image to be recognized based on the currently obtained zoom ratio S, and calculate feature data according to face classification according to a face model by taking the obtained zoomed image as input;
the approximate extraction is to scale feature data according to face classification obtained from a complete extraction layer, wherein the approximate extraction is to calculate approximate extraction data of a lambda layer on the basis of the complete extraction layer, and if the complete extraction layer is on an Nth layer, the approximate extraction is to calculate approximate extraction data on an N + 1-N + lambda layer; wherein, λ is an acceleration ratio parameter of the face model, and is a division standard for dividing the pyramid layer number into two categories, namely a complete extraction layer and an approximate extraction layer;
the classification unit is used for classifying feature data according to face classification of pyramid feature layers and classifying pyramid feature data of each layer by applying a classification algorithm, wherein the pyramid feature data comprise complete extraction layer data and approximate extraction layer data;
a face recognition result unit, configured to merge the classification results of each layer from the classification unit to obtain the final face information and its attribute values.
2. The apparatus of claim 1, wherein the apparatus further comprises a memory.
3. The apparatus of claim 2, wherein the memory further comprises a face database and an associated information storage unit;
the face database is used for providing and storing face information and attribute values thereof for the face recognition unit;
the related information storage unit is used for providing and storing related information corresponding to the face information and the attribute values thereof for the related information retrieval unit.
4. The apparatus according to claim 3, wherein the association information retrieving unit further comprises:
an internal resource retrieval unit, configured to retrieve, in the associated information storage unit, associated information corresponding to the face information and the attribute value thereof;
and the external resource retrieval unit is used for searching the associated information corresponding to the face information and the attributes thereof from the external resources.
5. The apparatus of claim 4, wherein:
the associated information storage unit is further configured to store: the associated information retrieval unit retrieves and acquires associated information corresponding to the face information and the attribute value thereof to serve as historical associated information corresponding to the face information and the attribute value thereof;
an internal resource retrieving unit further configured to: based on the historical associated information, retrieving associated information corresponding to the face information and the attribute value thereof in the associated information storage unit;
an external resource retrieval unit further configured to: and searching the associated information corresponding to the face information and the attribute value thereof from external resources based on the historical associated information.
6. The apparatus of claim 1, further comprising a data transmission device for communicating with an external device in both wireless and wired modes, wherein the external device comprises a keyboard, a mouse, a display, a printer, a USB device, and a mobile communication device.
7. The apparatus of claim 1, wherein:
the face recognition unit also comprises an image preprocessing unit, and the image preprocessing unit is used for converting the color space, the row and column format of the image to be recognized into the color space and the row and column format required by the face model.
8. The apparatus of claim 1, wherein the pyramid feature level n _ scales is calculated by the following formula:
n_scales=num×(log(ratio)/log(2.0)+1)
in the formula: num refers to the number of pyramid feature layers obtained based on face model training;
ratio = min(w/min_w, h/min_h), wherein:
min_w denotes the minimum image width in the face model;
min_h denotes the minimum image height in the face model;
w denotes the width of the image to be recognized;
h denotes the height of the image to be recognized.
9. The apparatus of claim 1, wherein:
the calculation formula of the scaling S is as follows:
S = (Sa/Sr)^(-βj)
in the formula: Sa and Sr are the scalings, relative to the original image size, of the approximate extraction layer and of the complete extraction layer from which it is obtained, respectively; βj is the feature type parameter on which the j-th face classification feature is based.
10. The apparatus of claim 1, wherein:
the feature data according to which the face classification is based includes one or more combinations of color channels, gradient magnitudes, and gradient histograms.
11. The device according to claim 1, wherein the classification algorithm is an AdaBoost classification algorithm based on a decision tree, and the matching of the face model of the sliding window is realized through a five-layer full binary tree structure, so as to obtain a matching result meeting a threshold condition.
12. The apparatus of claim 1, wherein the face recognition unit further comprises:
a face identification unit, configured to establish a correspondence between the obtained face information and its attribute values and the corresponding image region.
13. A method for pushing associated information based on face recognition comprises the following steps;
step S1: acquiring image information;
step S2: detecting the image information, judging whether face information exists, if so, further identifying at least one piece of face information and attribute value through a face identification unit, and if not, waiting for the next input;
step S3: retrieving and acquiring face information and associated information corresponding to the attribute values of the face information;
step S4: displaying the associated information obtained in the step S3, and displaying the original face or the beautified face in the image information when displaying the associated information;
the identifying at least one face information and attribute value further comprises:
S21, calculating the pyramid feature layer number and scaling ratio: calculating the pyramid feature layer number n_scales and the scaling ratio S according to the size of the image to be recognized and the minimum width and height parameters obtained from face model training, wherein the scaling ratio must ensure that the area of the scaled region is maximized;
s22, extracting feature data of the pyramid feature layer according to the face classification, dividing the pyramid feature layer into a complete extraction layer and an approximate extraction layer, and performing complete extraction and approximate extraction respectively to obtain feature data of each layer of face classification;
the complete extraction is to obtain a zoomed image from an image to be recognized based on the currently obtained zoom ratio S, and calculate feature data according to face classification according to a face model by taking the obtained zoomed image as input;
the approximate extraction is to scale feature data according to face classification obtained from a complete extraction layer, wherein the approximate extraction is to calculate approximate extraction data of a lambda layer on the basis of the complete extraction layer, and if the complete extraction layer is on an Nth layer, the approximate extraction is to calculate approximate extraction data on an N + 1-N + lambda layer; wherein, λ is an acceleration ratio parameter of the face model, and is a division standard for dividing the pyramid layer number into two categories, namely a complete extraction layer and an approximate extraction layer;
s23, classifying feature data of the pyramid feature layers according to the face classification, and classifying the pyramid feature data of each layer by applying a classification algorithm, wherein the pyramid feature data comprise complete extraction layer data and approximate extraction layer data;
and S24, obtaining the face information and the attribute value thereof, and combining the classification result of each layer to obtain the face information and the attribute value thereof.
14. The method according to claim 13, wherein the image information comprises a single image, or a sequence of images or a video stream; the attribute values include the sex, age, whether the user wears glasses, whether the user wears a hat, whether the user wears a mask, and expression.
15. The method of claim 13, further comprising:
step S5: establishing historical associated information, wherein the historical associated information is associated information which is retrieved and acquired in the step S3 and corresponds to the face information and the attribute values thereof;
step S6: and retrieving and acquiring the face information and the associated information corresponding to the attribute value thereof based on the historical associated information.
16. The method of claim 13, wherein: before step S21, the method further includes:
s20, preprocessing the image, converting the color space and the row and column format of the image to be recognized into the color space and the row and column format required by the human face model.
17. The method of claim 13, wherein the pyramid feature level n _ scales is calculated by the formula:
n_scales=num×(log(ratio)/log(2.0)+1)
in the formula: num refers to the number of pyramid feature layers obtained based on face model training;
ratio = min(w/min_w, h/min_h), wherein:
min_w denotes the minimum image width in the face model;
min_h denotes the minimum image height in the face model;
w denotes the width of the image to be recognized;
h denotes the height of the image to be recognized.
18. The method of claim 13, wherein:
the calculation formula of the scaling S is as follows:
S = (Sa/Sr)^(-βj)
in the formula: Sa and Sr are the scalings, relative to the original image size, of the approximate extraction layer and of the complete extraction layer from which it is obtained, respectively; βj is the feature type parameter on which the j-th face classification feature is based.
19. The method according to any one of claims 13, 16 to 18, wherein:
the feature data according to which the face classification is based includes one or more combinations of color channels, gradient magnitudes, and gradient histograms.
20. The method according to claim 13, wherein the classification algorithm in the step S23 utilizes an AdaBoost classification algorithm based on a decision tree to implement face model matching of a sliding window through a five-level full binary tree, so as to obtain a matching result meeting a threshold condition.
21. The method of claim 14, wherein: when the associated information is displayed, the face display area is set within the associated information display area.
22. The method of claim 16, further comprising:
S25, establishing a correspondence between the obtained face information and its attribute values and the corresponding image region.
CN201510819092.0A 2015-11-23 2015-11-23 Associated information pushing device and method based on face recognition Active CN105426850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510819092.0A CN105426850B (en) 2015-11-23 2015-11-23 Associated information pushing device and method based on face recognition


Publications (2)

Publication Number Publication Date
CN105426850A CN105426850A (en) 2016-03-23
CN105426850B true CN105426850B (en) 2021-08-31

Family

ID=55505049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510819092.0A Active CN105426850B (en) 2015-11-23 2015-11-23 Associated information pushing device and method based on face recognition

Country Status (1)

Country Link
CN (1) CN105426850B (en)

Families Citing this family (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404877A (en) * 2015-12-08 2016-03-16 商汤集团有限公司 Human face attribute prediction method and apparatus based on deep study and multi-task study
CN105894025A (en) * 2016-03-30 2016-08-24 中国科学院自动化研究所 Natural image aesthetic feeling quality assessment method based on multitask deep learning
CN106204780A (en) * 2016-07-04 2016-12-07 武汉理工大学 A kind of based on degree of depth study and the human face identification work-attendance checking system and method for cloud service
CN106204948B (en) * 2016-07-11 2020-12-11 商汤集团有限公司 Storage cabinet management method and storage cabinet management device
US10289825B2 (en) * 2016-07-22 2019-05-14 Nec Corporation Login access control for secure/private data
CN106250840A (en) * 2016-07-27 2016-12-21 中国科学院自动化研究所 Face based on degree of depth study opens closed state detection method
CN107341434A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Processing method, device and the terminal device of video image
CN107341435A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Processing method, device and the terminal device of video image
CN107833199B (en) * 2016-09-12 2020-03-27 南京大学 Method for analyzing quality of copy cartoon image
CN106529494A (en) * 2016-11-24 2017-03-22 深圳市永达电子信息股份有限公司 Human face recognition method based on multi-camera model
CN106599800A (en) * 2016-11-25 2017-04-26 哈尔滨工程大学 Face micro-expression recognition method based on deep learning
CN106534151B (en) * 2016-11-29 2019-12-03 北京旷视科技有限公司 For playing the method and device of video flowing
CN106652025B (en) * 2016-12-20 2019-10-01 五邑大学 A kind of three-dimensional face modeling method and printing equipment based on video flowing Yu face multi-attribute Matching
CN106815557A (en) * 2016-12-20 2017-06-09 北京奇虎科技有限公司 A kind of evaluation method of face features, device and mobile terminal
CN108229269A (en) 2016-12-31 2018-06-29 深圳市商汤科技有限公司 Method for detecting human face, device and electronic equipment
CN106910085B (en) * 2017-01-06 2020-05-22 哈尔滨学院 Intelligent product recommendation method and system based on e-commerce platform
CN106991438A (en) * 2017-03-20 2017-07-28 新智认知数据服务有限公司 One kind is based on the interactive facial image attribute labeling methods of MFC
CN107066983B (en) * 2017-04-20 2022-08-09 腾讯科技(上海)有限公司 Identity verification method and device
CN107145857B (en) * 2017-04-29 2021-05-04 深圳市深网视界科技有限公司 Face attribute recognition method and device and model establishment method
CN107221320A (en) * 2017-05-19 2017-09-29 百度在线网络技术(北京)有限公司 Train method, device, equipment and the computer-readable storage medium of acoustic feature extraction model
CN107480583A (en) * 2017-07-04 2017-12-15 芜湖市振华戎科智能科技有限公司 The recognition of face early warning system of public arena
CN107491725A (en) * 2017-07-04 2017-12-19 芜湖市振华戎科智能科技有限公司 The face identification device of public arena
CN107464136A (en) * 2017-07-25 2017-12-12 苏宁云商集团股份有限公司 A kind of merchandise display method and system
CN107463891A (en) * 2017-07-26 2017-12-12 珠海市魅族科技有限公司 A kind of identity information acquisition methods, device, computer installation and computer-readable recording medium
CN107633575A (en) * 2017-08-29 2018-01-26 芜湖市振华戎科智能科技有限公司 The recognition of face prior-warning device of secure private occasion
CN107516090B (en) 2017-09-11 2021-09-17 北京百度网讯科技有限公司 Integrated face recognition method and system
CN109697623A (en) * 2017-10-23 2019-04-30 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN108038176B (en) * 2017-12-07 2020-09-29 浙江大华技术股份有限公司 Method and device for establishing passerby library, electronic equipment and medium
CN109598249B (en) * 2017-12-07 2021-04-09 深圳市商汤科技有限公司 Clothing detection method and device, electronic equipment and storage medium
CN108062971A (en) * 2017-12-08 2018-05-22 青岛海尔智能技术研发有限公司 The method, apparatus and computer readable storage medium that refrigerator menu is recommended
CN108228742B (en) * 2017-12-15 2021-10-22 深圳市商汤科技有限公司 Face duplicate checking method and device, electronic equipment, medium and program
CN108228792B (en) * 2017-12-29 2020-06-16 深圳云天励飞技术有限公司 Picture retrieval method, electronic device and storage medium
CN108062543A (en) * 2018-01-16 2018-05-22 中车工业研究院有限公司 A kind of face recognition method and device
CN108133533A (en) * 2018-02-23 2018-06-08 王志强 A kind of synthesis door meets management system
CN108647581A (en) * 2018-04-18 2018-10-12 深圳市商汤科技有限公司 Information processing method, device and storage medium
CN108508774A (en) * 2018-04-28 2018-09-07 东莞市华睿电子科技有限公司 A kind of control method that Identification of Images is combined with pressure sensitive
CN108568821A (en) * 2018-04-28 2018-09-25 东莞市华睿电子科技有限公司 A kind of control method of the exhibition room robot arm based on Identification of Images
CN108647668A (en) * 2018-05-21 2018-10-12 北京亮亮视野科技有限公司 The construction method of multiple dimensioned lightweight Face datection model and the method for detecting human face based on the model
CN108776904A (en) * 2018-05-22 2018-11-09 深圳壹账通智能科技有限公司 A kind of methods of exhibiting and its equipment of advertisement information
CN108763423A (en) * 2018-05-24 2018-11-06 哈工大机器人(合肥)国际创新研究院 A kind of jade recommendation method and device based on user picture
CN110019917A (en) * 2018-08-29 2019-07-16 北京旷视科技有限公司 Commodity search method, device and electronic equipment
CN110879846A (en) * 2018-09-05 2020-03-13 深圳云天励飞技术有限公司 Image retrieval method and device, electronic equipment and computer-readable storage medium
CN109409994A (en) * 2018-10-15 2019-03-01 北京京东金融科技控股有限公司 The methods, devices and systems of analog subscriber garments worn ornaments
CN109558535B (en) * 2018-11-05 2020-08-07 重庆中科云从科技有限公司 Personalized article pushing method and system based on face recognition
CN109558904A (en) * 2018-11-21 2019-04-02 咪咕文化科技有限公司 Classification method, device and the storage medium of image local feature
CN109391697B (en) * 2018-11-21 2021-12-28 深圳市商汤科技有限公司 Information pushing method, user terminal, server and access control terminal
CN109377325B (en) * 2018-12-10 2022-03-04 梦工场珠宝企业管理有限公司 Intelligent jewelry recommendation system
CN109727071A (en) * 2018-12-28 2019-05-07 中国科学院半导体研究所 Method and system for advertisement recommendation
CN110110646B (en) * 2019-04-30 2021-05-04 浙江理工大学 Gesture image key frame extraction method based on deep learning
CN110110666A (en) * 2019-05-08 2019-08-09 北京字节跳动网络技术有限公司 Object detection method and device
CN112836549A (en) * 2019-11-22 2021-05-25 虹软科技股份有限公司 User information detection method and system and electronic equipment
CN111009031B (en) * 2019-11-29 2020-11-24 腾讯科技(深圳)有限公司 Face model generation method, model generation method and device
CN111274476B (en) * 2020-01-16 2023-07-28 长春每房科技有限公司 House source matching method, device, equipment and storage medium based on face recognition
CN111741364A (en) * 2020-06-21 2020-10-02 深圳天海宸光科技有限公司 Content accurate pushing method and system based on face recognition
TWI755147B (en) * 2020-11-11 2022-02-11 國立勤益科技大學 Mask wearing identification system
CN112699837A (en) * 2021-01-13 2021-04-23 新大陆数字技术股份有限公司 Gesture recognition method and device based on deep learning
CN113011277B (en) * 2021-02-25 2023-11-21 日立楼宇技术(广州)有限公司 Face recognition-based data processing method, device, equipment and medium
CN113342761B (en) * 2021-08-05 2021-11-02 深圳启程智远网络科技有限公司 Teaching resource sharing system and method based on Internet
CN114617765A (en) * 2022-02-28 2022-06-14 佛山市高明粤华卫生洁具有限公司 Domestic sauna room intelligent management system based on face identification
CN116473520A (en) * 2023-05-18 2023-07-25 深圳市宗匠科技有限公司 Electronic equipment and skin analysis method and device thereof
CN116912721B (en) * 2023-09-14 2023-12-05 众芯汉创(江苏)科技有限公司 Power distribution network equipment body identification method and system based on monocular stereoscopic vision


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003178078A (en) * 2001-12-12 2003-06-27 Matsushita Electric Ind Co Ltd Additional indicator data to image and voice data, and its adding method
JP2007104091A (en) * 2005-09-30 2007-04-19 Fujifilm Corp Image selection apparatus, program, and method
JP2010016482A (en) * 2008-07-01 2010-01-21 Sony Corp Information processing apparatus, and information processing method
US20140310271A1 (en) * 2011-04-11 2014-10-16 Jiqiang Song Personalized program selection system and method
CN104156715B (en) * 2014-09-01 2018-08-28 杭州朗和科技有限公司 A kind of terminal device, information collecting method and device
CN104866868B (en) * 2015-05-22 2018-09-07 杭州朗和科技有限公司 Metal coins recognition methods based on deep neural network and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208088A (en) * 2010-03-31 2011-10-05 索尼公司 Server apparatus, client apparatus, content recommendation method, and program
CN102298709A (en) * 2011-09-07 2011-12-28 江西财经大学 Energy-saving intelligent identification digital signage fused with multiple characteristics in complicated environment
CN103716702A (en) * 2013-12-17 2014-04-09 三星电子(中国)研发中心 Television program recommendation device and method

Also Published As

Publication number Publication date
CN105426850A (en) 2016-03-23

Similar Documents

Publication Publication Date Title
CN105426850B (en) Associated information pushing device and method based on face recognition
Mahmood et al. WHITE STAG model: Wise human interaction tracking and estimation (WHITE) using spatio-temporal and angular-geometric (STAG) descriptors
CN107633207B (en) AU characteristic recognition methods, device and storage medium
Han et al. Demographic estimation from face images: Human vs. machine performance
González-Briones et al. A multi-agent system for the classification of gender and age from images
Naik et al. Streetscore-predicting the perceived safety of one million streetscapes
WO2020114118A1 (en) Facial attribute identification method and device, storage medium and processor
WO2020177673A1 (en) Video sequence selection method, computer device and storage medium
US7684651B2 (en) Image-based face search
Youssif et al. Arabic sign language (arsl) recognition system using hmm
CN110110118A (en) Dressing recommended method, device, storage medium and mobile terminal
KR101835333B1 (en) Method for providing face recognition service in order to find out aging point
Huong et al. Static hand gesture recognition for vietnamese sign language (VSL) using principle components analysis
CN107911643A (en) Show the method and apparatus of scene special effect in a kind of video communication
CN113435335B (en) Microscopic expression recognition method and device, electronic equipment and storage medium
Paul et al. Extraction of facial feature points using cumulative histogram
Kumar et al. A hybrid gesture recognition method for American sign language
Galiyawala et al. Person retrieval in surveillance using textual query: a review
JP2014229129A (en) Combination presentation system and computer program
CN105550642B (en) Gender identification method and system based on multiple dimensioned linear Differential Characteristics low-rank representation
CN109145140A (en) One kind being based on the matched image search method of hand-drawn outline figure and system
CN109359543B (en) Portrait retrieval method and device based on skeletonization
Said et al. Face Recognition System
Jadhav et al. Introducing Celebrities in an Images using HAAR Cascade algorithm
CN112069908A (en) Pedestrian re-identification method based on co-occurrence attribute

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant