CN108596090A - Facial image critical point detection method, apparatus, computer equipment and storage medium - Google Patents

Facial image critical point detection method, apparatus, computer equipment and storage medium

Info

Publication number
CN108596090A
CN108596090A CN201810372872.9A CN201810372872A CN108596090B
Authority
CN
China
Prior art keywords
facial image
convolutional neural
neural networks
networks model
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810372872.9A
Other languages
Chinese (zh)
Other versions
CN108596090B (en)
Inventor
李宣平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kwai Technology Co.,Ltd.
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201810372872.9A priority Critical patent/CN108596090B/en
Publication of CN108596090A publication Critical patent/CN108596090A/en
Application granted Critical
Publication of CN108596090B publication Critical patent/CN108596090B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the invention disclose a facial image key point detection method, apparatus, computer equipment and storage medium, including the following steps: obtaining a facial image to be processed; inputting the facial image into a preset convolutional neural network model, where the convolutional neural network model includes at least two channels; obtaining the classification data output by the convolutional neural network model, and performing content understanding on the facial image according to the classification data, where the classification data is the mean of the output values of the at least two channels. Through the multiple channels of the convolutional neural network model, the classification data produced by different channels for the same facial image are obtained separately, and the average of the multiple classification data is then calculated. Averaging cancels the random errors introduced into the facial image by camera shake and ambient light during shooting, so the whole convolutional neural network model converges better, the precision and stability of the output are enhanced, and the robustness of the model is improved.

Description

Facial image key point detection method, apparatus, computer equipment and storage medium
Technical field
Embodiments of the invention relate to the field of image processing, and in particular to a facial image key point detection method, apparatus, computer equipment and storage medium.
Background technology
With the development of deep learning technology, convolutional neural networks have become a powerful tool for extracting facial features. For a convolutional neural network whose architecture is fixed, the most crucial technique is how to design the loss function so that it can effectively supervise the training of the convolutional neural network, giving the network the ability to extract key point location information from facial images.
Among existing face key point techniques, traditional methods include shape-constraint-based methods and cascaded-regression-based methods. For example, a classical model, the Active Shape Model, obtains statistical information about the distribution of feature points from training image samples, determines the directions in which the feature points are allowed to vary, and uses this to locate the corresponding feature points on a target image.
The inventor found in research that, when detecting face key points in video, the large camera shake during shooting and the influence of ambient light make it difficult to detect face key points accurately, resulting in poor robustness and a low degree of convergence.
Summary of the invention
Embodiments of the invention provide a facial image key point detection method, apparatus, computer equipment and storage medium that improve the degree of convergence through a multi-channel convolutional neural network model.
To solve the above technical problems, the technical solution adopted by the embodiments of the invention is to provide a facial image key point detection method including the following steps:
obtaining a facial image to be processed;
inputting the facial image into a preset convolutional neural network model, where the convolutional neural network model includes at least two channels;
obtaining the classification data output by the convolutional neural network model, and performing content understanding on the facial image according to the classification data, where the classification data is the mean of the output values of the at least two channels.
Optionally, the convolutional neural network model includes a first loss function and a second loss function.
Optionally, the first loss function is used to separately calculate the Euclidean distance between the output values of the at least two channels and preset expected true values; the feature description of the first loss function is:
loss = sum for i = 1 to n of sqrt((x_i - x'_i)^2 + (y_i - y'_i)^2)
where x_i and y_i are the output values of the coordinates of the i-th point of the first-level network, x'_i and y'_i are the expected true values of the coordinates of the i-th point, and n is the number of face key points.
Optionally, the second loss function is used to calculate the Euclidean distance between the image features extracted by the at least two channels; the feature description of the second loss function is:
loss3 = sum over i of (f_1i - ZG(f_2i))^2
where f_1i and f_2i are the features extracted by the two networks, and ZG(.) is a gradient-stop function.
Optionally, the convolutional neural network model is trained through the following steps:
obtaining training sample data marked with classification judgment information;
inputting the training sample data into the convolutional neural network model to obtain classification reference information for the training sample data, where the classification reference information is the mean of the output values of the at least two channels;
comparing whether the model classification reference information of different samples in the training sample data is consistent with the classification judgment information;
when the model classification reference information is inconsistent with the classification judgment information, iteratively and cyclically updating the weights in the convolutional neural network model, ending when the comparison result is consistent with the classification judgment information.
Optionally, after obtaining the training sample data marked with classification judgment information, the following step is also included:
randomly obtaining the initialization weight parameters of each of the at least two channels.
To solve the above technical problems, an embodiment of the invention also provides a facial image understanding method: a facial image to be processed is input so that an intelligent terminal performs content understanding on the facial image according to the facial image key point detection method described above, where the content understanding includes performing gender recognition, age judgment, attractiveness scoring or face similarity comparison on the facial image.
To solve the above technical problems, an embodiment of the invention also provides a facial image key point detection apparatus, including:
an acquisition module for obtaining a facial image to be processed;
a processing module for inputting the facial image into a preset convolutional neural network model, where the convolutional neural network model includes at least two channels;
an execution module for obtaining the classification data output by the convolutional neural network model and performing content understanding on the facial image according to the classification data, where the classification data is the mean of the output values of the at least two channels.
Optionally, the convolutional neural network model includes a first loss function and a second loss function.
Optionally, the first loss function is used to separately calculate the Euclidean distance between the output values of the at least two channels and preset expected true values; the feature description of the first loss function is:
loss = sum for i = 1 to n of sqrt((x_i - x'_i)^2 + (y_i - y'_i)^2)
where x_i and y_i are the output values of the coordinates of the i-th point of the first-level network, x'_i and y'_i are the expected true values of the coordinates of the i-th point, and n is the number of face key points.
Optionally, the second loss function is used to calculate the Euclidean distance between the image features extracted by the at least two channels; the feature description of the second loss function is:
loss3 = sum over i of (f_1i - ZG(f_2i))^2
where f_1i and f_2i are the features extracted by the two networks, and ZG(.) is a gradient-stop function.
Optionally, the facial image key point detection apparatus further includes:
a first acquisition submodule for obtaining training sample data marked with classification judgment information;
a first input submodule for inputting the training sample data into the convolutional neural network model to obtain the classification reference information of the training sample data, where the classification reference information is the mean of the output values of the at least two channels;
a first comparison submodule for comparing whether the model classification reference information of different samples in the training sample data is consistent with the classification judgment information;
a first execution submodule for, when the model classification reference information is inconsistent with the classification judgment information, iteratively and cyclically updating the weights in the convolutional neural network model, ending when the comparison result is consistent with the classification judgment information.
Optionally, the facial image key point detection apparatus further includes:
a second acquisition submodule for randomly obtaining the initialization weight parameters of each of the at least two channels.
To solve the above technical problems, an embodiment of the invention also provides a computer device including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the facial image key point detection method described above, or to perform the steps of the facial image understanding method described above.
To solve the above technical problems, an embodiment of the invention also provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the processors to perform the steps of the facial image key point detection method described above, or to perform the steps of the facial image understanding method described above.
The advantageous effect of the embodiments of the invention is that, through the multiple channels of the convolutional neural network model, the classification data produced by different channels for the same facial image are obtained separately, and the average of the multiple classification data is then calculated. Averaging cancels the random errors introduced into the facial image by camera shake and ambient light during shooting, so the whole convolutional neural network model converges better, the precision and stability of the output are enhanced, and the robustness of the model is improved.
Description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the invention; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of the convolutional neural network model of an embodiment of the invention;
Fig. 2 is a schematic diagram of the basic flow of the facial image key point detection method of an embodiment of the invention;
Fig. 3 is a schematic diagram of the training flow of the convolutional neural network model of an embodiment of the invention;
Fig. 4 is a schematic diagram of the basic structure of the facial image key point detection apparatus of an embodiment of the invention;
Fig. 5 is a block diagram of the basic structure of the computer device of an embodiment of the invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the invention, the technical solutions in the embodiments of the invention are described below in conjunction with the drawings in the embodiments of the invention.
Some flows described in the specification, claims and drawings above contain multiple operations that occur in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation serial numbers such as 101 and 102 are only used to distinguish different operations; the serial numbers themselves do not represent any execution order. In addition, these flows may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they represent neither a sequential order nor a restriction that "first" and "second" be of different types.
The technical solutions in the embodiments of the invention are described clearly and completely below in conjunction with the drawings in the embodiments of the invention. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on the embodiments of the invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the invention.
Embodiment 1
Those skilled in the art will appreciate that the terms "terminal" and "terminal device" as used herein include both devices with only a non-transmitting wireless signal receiver and devices with receiving and transmitting hardware capable of two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, with or without a single-line or multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. The "terminal" or "terminal device" used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable for and/or configured to operate locally and/or in distributed form at any location on the earth and/or in space. The "terminal" or "terminal device" used herein may also be a communication terminal, an Internet terminal or a music/video playback terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart television or a set-top box.
It should be noted that the basic structure of a convolutional neural network includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, and the local feature is extracted; once the local feature is extracted, its positional relationship to the other features is also determined. The second is the feature mapping layer: each computation layer of the network is composed of multiple feature maps, each feature map is a plane, and all neurons in the plane share equal weights. The feature mapping structure uses the sigmoid function, whose influence-function kernel is small, as the activation function of the convolutional network, so that the feature maps are shift-invariant. Furthermore, since the neurons in one mapping plane share weights, the number of free parameters of the network is reduced. Each convolutional layer in the convolutional neural network is followed by a computation layer for local averaging and secondary extraction; this distinctive two-stage feature extraction structure reduces the feature resolution.
Referring specifically to Fig. 1, Fig. 1 is a schematic diagram of the convolutional neural network model of this embodiment.
As shown in Fig. 1, the convolutional neural network model includes an input layer, a first channel and a second channel. The first channel includes a convolutional layer (Conv1) and a fully connected layer (FC1); likewise, the second channel includes a convolutional layer (Conv2) and a fully connected layer (FC2). In some embodiments, the first channel and the second channel each include a convolutional layer, an activation layer, a pooling layer and a fully connected layer, connected in sequence to form the channel.
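As a rough illustration only (the patent contains no code), the dual-channel structure above can be sketched in NumPy, with each channel reduced to a single convolution plus a fully connected regression head; the names `Channel` and `conv2d` are illustrative assumptions, not terms from the patent:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution over a single-channel image (the core of Conv1/Conv2)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

class Channel:
    """One channel of the model: a convolutional layer followed by a
    fully connected layer that regresses n key point (x, y) coordinates."""
    def __init__(self, img_size, ksize, n_points, rng):
        self.kernel = rng.standard_normal((ksize, ksize)) * 0.1
        feat = (img_size - ksize + 1) ** 2
        self.W = rng.standard_normal((2 * n_points, feat)) * 0.01

    def forward(self, img):
        feat = np.maximum(conv2d(img, self.kernel), 0.0).ravel()  # ReLU activation
        return self.W @ feat  # flat (x1, y1, ..., xn, yn) vector

rng = np.random.default_rng(0)
channels = [Channel(img_size=8, ksize=3, n_points=5, rng=rng) for _ in range(2)]
face = rng.standard_normal((8, 8))            # stand-in for an input facial image
outputs = np.stack([c.forward(face) for c in channels])
prediction = outputs.mean(axis=0)             # the model output is the mean of the channels
print(prediction.shape)  # (10,): five (x, y) key points
```

The two channels share the input but have independently initialized weights, matching the model's requirement of at least two channels whose outputs are averaged.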
The input layer preprocesses the raw image data, including mean subtraction, normalization and PCA/whitening. Mean subtraction: every dimension of the input data is centered at 0, pulling the center of the samples back to the coordinate origin. Normalization: the amplitudes are normalized to the same range, reducing the interference caused by differences in the value ranges of different dimensions; for example, if there are two features A and B, A ranging from 0 to 10 and B from 0 to 10000, using them directly is problematic, and a good practice is to normalize them, i.e. scale the data of both A and B to the range 0 to 1. PCA (principal component analysis)/whitening: PCA is used for dimensionality reduction; whitening normalizes the amplitude on each feature axis of the data.
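The input-layer preprocessing described above can be sketched as follows; this is a minimal NumPy illustration under the assumptions stated in the comments, not the patent's implementation, and the `preprocess` helper name is invented:

```python
import numpy as np

def preprocess(X):
    """Input-layer preprocessing sketch: mean subtraction, normalization
    of every feature to the same 0..1 range, then PCA whitening."""
    X = X - X.mean(axis=0)                        # mean subtraction: center each dimension at 0
    span = X.max(axis=0) - X.min(axis=0)
    X = (X - X.min(axis=0)) / np.where(span == 0.0, 1.0, span)  # normalize amplitudes to 0..1
    X = X - X.mean(axis=0)                        # re-center before PCA
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)        # PCA basis of the data
    Xrot = X @ eigvecs                            # decorrelate: rotate onto principal axes
    return Xrot / np.sqrt(eigvals + 1e-5)         # whiten: unit amplitude on each feature axis

rng = np.random.default_rng(1)
# Feature A ranges over 0..10 and feature B over 0..10000, as in the example above.
X = np.column_stack([rng.uniform(0, 10, 500), rng.uniform(0, 10000, 500)])
Xw = preprocess(X)
print(np.round(Xw.std(axis=0), 2))  # [1. 1.]: both axes end up on the same scale
```

After whitening, the wildly different ranges of A and B no longer dominate each other, which is exactly the interference the normalization step is meant to remove.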
The convolutional layers are used to perceive local parts of the facial image, and are usually connected in a cascaded fashion; the later a convolutional layer sits in the cascade, the more globalized the information it can perceive.
The fully connected layer plays the role of a "classifier" in the whole convolutional neural network. If the convolutional layers, pooling layers and activation-function layers map the raw data into a hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" into the sample label space. The fully connected layer is connected at the output of the convolutional layers and can perceive the globalized features of the facial image under detection.
The convolutional neural network model includes a first loss function and a second loss function (loss3). The first loss function is used to separately calculate the Euclidean distance between the output value of each of the two channels and the preset expected true values. The second loss function is used to calculate the Euclidean distance between the image features extracted by the two channels, i.e. the second loss function computes the Euclidean distance between the output values of the convolutional or pooling layers of the first channel and the second channel.
The first loss function includes a first-channel loss function (loss1) and a second-channel loss function (loss2), and the feature descriptions of the first-channel and second-channel loss functions are identical. The first-channel loss function calculates the Euclidean distance between the output value of the first channel and the preset expected true values, and the second-channel loss function calculates the Euclidean distance between the output value of the second channel and the preset expected true values.
The feature description of the first-channel loss function is:
loss1 = sum for i = 1 to n of sqrt((x_i - x'_i)^2 + (y_i - y'_i)^2)
where x_i and y_i are the output values of the coordinates of the i-th point of the first-level network, x'_i and y'_i are the expected true values of the coordinates of the i-th point, and n is the number of face key points.
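Assuming the per-channel loss takes the per-point Euclidean-distance form just described (the patent renders the formula only as an image, so this shape is an assumption), it can be illustrated with the hypothetical helper `channel_loss`:

```python
import numpy as np

def channel_loss(pred, truth):
    """Assumed per-channel loss: sum over the n key points of the Euclidean
    distance between predicted (x_i, y_i) and expected true coordinates."""
    pred = pred.reshape(-1, 2)    # n rows of (x_i, y_i)
    truth = truth.reshape(-1, 2)  # n rows of expected true values
    return np.sqrt(((pred - truth) ** 2).sum(axis=1)).sum()

truth = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])  # invented key points
pred = truth + np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])  # one point off by (3, 4)
print(channel_loss(pred, truth))  # 5.0: the single misplaced point contributes a 3-4-5 distance
```

The same function serves as both loss1 and loss2, since the two channel losses share one feature description.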
The feature description of the second-channel loss function is:
loss2 = sum for i = 1 to n of sqrt((x_i - x'_i)^2 + (y_i - y'_i)^2)
where x_i and y_i are the output values of the coordinates of the i-th point of the first-level network, x'_i and y'_i are the expected true values of the coordinates of the i-th point, and n is the number of face key points.
The feature description of the second loss function is:
loss3 = sum over i of (f_1i - ZG(f_2i))^2
where f_1i and f_2i are the features extracted by the two networks, and ZG(.) is a gradient-stop function. The gradient-stop function is used to implement gradient descent: the new search direction of each iteration is determined using the negative gradient direction, so that each iteration gradually reduces the objective function being optimized. Gradient descent is the steepest descent method under the 2-norm. The gradient-stop function corresponds to different step sizes at different gradients: the closer to the extremum, the smaller the step size, and vice versa.
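A minimal sketch of the second loss follows, assuming a squared-Euclidean form with the gradient stopped on one channel's features; in plain NumPy the stop can only be simulated (autograd frameworks provide it directly, e.g. a detach-style operation), so the gradient is written out by hand:

```python
import numpy as np

def stop_gradient(x):
    """ZG(.): the value passes through unchanged; in an autograd framework
    no gradient would flow back through this branch."""
    return x.copy()  # NumPy stand-in; there is no gradient tape here

def second_loss(f1, f2):
    """Assumed form of loss3: squared Euclidean distance between the two
    channels' features, with the gradient stopped on the second channel."""
    diff = f1 - stop_gradient(f2)
    return (diff ** 2).sum()

def second_loss_grad_f1(f1, f2):
    # d(loss3)/d(f1) = 2 * (f1 - ZG(f2)); d(loss3)/d(f2) is 0 by construction,
    # so only the first channel is pushed toward the other's features.
    return 2.0 * (f1 - stop_gradient(f2))

f1 = np.array([1.0, 2.0, 3.0])   # invented features from channel 1
f2 = np.array([1.0, 2.5, 2.0])   # invented features from channel 2
print(second_loss(f1, f2))          # 1.25
print(second_loss_grad_f1(f1, f2))  # [ 0. -1.  2.]
```

The point of the stop is visible in the hand-written gradient: the consistency constraint pulls one channel toward the other without the two collapsing through a shared gradient path.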
In some embodiments, the convolutional neural network model is not limited to an input layer, a first channel and a second channel. Depending on the specific application environment, the number of channels of the convolutional neural network model also differs; specifically, application environments with higher precision requirements on model convergence and robustness need more channels, and conversely fewer channels suffice.
This embodiment also relates to a facial image key point detection method. Referring specifically to Fig. 2, Fig. 2 is a schematic diagram of the basic flow of the facial image key point detection method of this embodiment.
As shown in Fig. 2, a facial image key point detection method includes the following steps:
S1100, obtaining a facial image to be processed;
Methods for obtaining the facial image include real-time acquisition and extraction from stored image/video data. Real-time acquisition is mainly used for real-time applications on intelligent terminals (mobile phones, tablet computers and monitoring devices), such as judging a user's age, gender, attractiveness and similarity. Extraction from stored image/video data is mainly used for further processing of stored images and video data, and can also be used on intelligent terminals for historical photos.
The facial image can be obtained from a photo, or by extracting a frame picture from video data.
S1200, inputting the facial image into a preset convolutional neural network model, where the convolutional neural network model includes at least two channels;
The obtained facial image is input into a fully trained convolutional neural network model, which has been trained with the selected loss functions.
The convolutional neural network model performs feature extraction on the facial image input into the different channels through its two channels respectively, and outputs the features after nonlinear mapping and overfitting-reduction processing.
S1300, obtaining the classification data output by the convolutional neural network model, and performing content understanding on the facial image according to the classification data, where the classification data is the mean of the output values of the at least two channels.
The fully connected layer of the convolutional neural network model classifies the features output after the overfitting-reduction processing, forming the classification data. The classification data can be used for content understanding of the facial image; content understanding includes, but is not limited to, gender recognition, age judgment, attractiveness scoring or face similarity comparison. The classification data represents the main recognizable features of the facial image; by comparing these features with preset classification criteria, the gender, age and attractiveness of the facial image can be judged.
The classification data is the mean of the classification data of the first channel and the second channel, but it is not limited thereto; in some embodiments, the classification data is calculated according to the number of channels in the convolutional neural network model, the classification data being the mean of the output data of the multiple channels.
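A small numerical illustration of why averaging the channel outputs suppresses random error (all figures here are invented for the example): errors of opposite sign, such as those caused by camera shake, largely cancel in the mean.

```python
import numpy as np

true_coords = np.array([12.0, 34.0, 56.0, 78.0])   # invented "true" key point values
noise = np.array([
    [0.75, -0.5,  0.5,  -1.0],   # channel 1 random error
    [-0.5,  0.25, -0.75,  0.5],  # channel 2 random error
])
channel_outputs = true_coords + noise
classification_data = channel_outputs.mean(axis=0)  # mean over the channels

per_channel_err = np.abs(noise).mean()                          # average error of one channel
averaged_err = np.abs(classification_data - true_coords).mean() # error after averaging
print(per_channel_err, averaged_err)  # 0.59375 0.15625: the mean is closer to the truth
```

With k channels carrying independent zero-mean noise, the variance of the averaged output drops by a factor of k, which is the mechanism behind the improved stability claimed above.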
In the above embodiment, through the multiple channels of the convolutional neural network model, the classification data produced by different channels for the same facial image are obtained separately, and the average of the multiple classification data is then calculated. Averaging cancels the random errors introduced into the facial image by camera shake and ambient light during shooting, so the whole convolutional neural network model converges better, the precision and stability of the output are enhanced, and the robustness of the model is improved.
Referring to Fig. 3, Fig. 3 is a schematic diagram of the training flow of the convolutional neural network model of this embodiment.
As shown in Fig. 3, the training method of the convolutional neural network model is as follows:
S2100, obtaining training sample data marked with classification judgment information;
The training sample data are the constituent units of the whole training set, which is composed of several items of training sample data.
The training sample data consist of face data and the classification judgment information with which the face data are marked.
Classification judgment information is the training target that people set for the convolutional neural network model: the manual judgment of the training sample data made through universal judgment criteria and the true state, i.e. people's expectation for the numerical values output by the convolutional neural network model. For example, in one item of training sample data, the image pixel coordinates of the key point positions of the facial image in the training sample are manually calibrated, and these image pixel coordinates are then the expected target of the classification data output by the convolutional neural network model.
S2200, inputting the training sample data into the convolutional neural network model to obtain model classification reference information for the training sample data;
The training sample set is input into the convolutional neural network model in sequence; through the input layer, each training sample enters the first channel and the second channel respectively, and the first channel and the second channel each perform feature extraction and classification on the facial image, where the data output by the fully connected layers of the first channel and the second channel are the classification reference information.
The model classification reference information is the excitation data output by the convolutional neural network model according to the input facial image. Before the convolutional neural network model is trained to convergence, the classification reference information consists of highly discrete numerical values; once the convolutional neural network model is trained to convergence, the classification reference information is relatively stable data.
S2300, compared in the training sample data by loss function the categories of model of different samples with reference to information with The classification judges whether information is consistent;
Loss function is judged with desired classification with reference to information for detecting category of model in convolutional neural networks model The whether consistent detection function of information.Output result when convolutional neural networks model and the expectation for judging information of classifying As a result it when inconsistent, needs to be corrected the weight in convolutional neural networks model, so that convolutional neural networks model is defeated Go out result and judges that the expected result of information is identical with classification.
In this embodiment, the loss function includes a first loss function and a second loss function, and the first loss function in turn includes a first-channel loss function and a second-channel loss function. The first-channel loss function judges whether the classification data output by the fully connected layer of the first channel is consistent with the manually set expected true value (the manually labeled key point coordinates of the training sample facial image); when the result is inconsistent, the weights in the first channel are adjusted through the back-propagation algorithm.
The second-channel loss function likewise judges whether the classification data output by the fully connected layer of the second channel is consistent with the manually set expected true value (the manually labeled key point coordinates of the training sample facial image); when the result is inconsistent, the weights in the second channel are adjusted through the back-propagation algorithm.
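A minimal numeric sketch of one channel's key point loss, consistent with the description above (a summed squared distance between predicted and manually labeled coordinates; the patent gives the exact formula only as an image, so this form is an assumption):

```python
import numpy as np

def keypoint_loss(pred, truth):
    # pred, truth: (n, 2) arrays of (x, y) key point coordinates.
    # Sum of squared coordinate errors over all n face key points.
    return float(np.sum((pred - truth) ** 2))

pred = np.array([[10.0, 20.0], [30.0, 40.0]])   # a channel's FC output
truth = np.array([[11.0, 20.0], [30.0, 42.0]])  # manually labeled coordinates
loss = keypoint_loss(pred, truth)  # 1 + 0 + 0 + 4 = 5.0
```

In training, back-propagation would drive this per-channel loss toward zero for each sample.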
The second loss function calculates the Euclidean distance between the image features extracted by the two channels, i.e., it judges whether the feature data the first channel and the second channel output for the same training sample is consistent. Since the first channel and the second channel are trained on the same training sample, the features they extract should be consistent. By detecting the consistency of the two channels' feature data and adjusting the weights of the first channel and the second channel whenever that feature data diverges, the second loss function sets up a mutual constraint between the first channel's feature data and the second channel's feature data; the weight adjustments thereby provide a reference for each other, allowing the entire convolutional neural network model to reach convergence faster.
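The second loss can be sketched numerically as the Euclidean distance between the two channels' feature vectors for the same sample. The gradient stop ZG(·) is only noted in a comment, since plain NumPy has no autograd (in an autograd framework such as PyTorch, one side would be detached):

```python
import numpy as np

def feature_consistency_loss(f1, f2):
    # Euclidean distance between the features the two channels extract
    # from the same training sample. In an autograd framework, ZG(.)
    # would stop the gradient on one side so each channel is pulled
    # toward the other's features instead of both drifting together.
    return float(np.sqrt(np.sum((f1 - f2) ** 2)))

f1 = np.array([1.0, 2.0, 2.0])   # first channel's extracted features
f2 = np.array([1.0, 0.0, 2.0])   # second channel's extracted features
distance = feature_consistency_loss(f1, f2)  # sqrt(0 + 4 + 0) = 2.0
```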
S2400: When the model classification reference information is inconsistent with the classification judgment information, iteratively and cyclically update the weights in the convolutional neural network model until the comparison result is consistent with the classification judgment information.
When the output result of the convolutional neural network model is inconsistent with the expected result of the classification judgment information, the weights in the model are corrected so that the model's output matches the expected result. Training on a given sample stops once the classification data output by the first channel and the second channel is consistent with the preset classification judgment information and the Euclidean distance between the feature data extracted by the first channel and the feature data output by the second channel meets the target. Training uses multiple training samples (for example, one million face images); after repeated training and correction, training ends when the agreement between the classification data output by the convolutional neural network model and each training sample's classification reference information reaches (but is not limited to) 99.9%.
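The stopping rule above can be sketched as an outer loop that keeps correcting weights until the agreement rate reaches a target. The model here is a toy stand-in that simply memorizes a label after one correction, standing in for back-propagation; it is not the patent's CNN:

```python
class ToyModel:
    # Hypothetical stand-in for the CNN: "learns" a sample's label
    # after a single correction.
    def __init__(self):
        self.memory = {}

    def predict(self, x):
        return self.memory.get(x)

    def update(self, x, label):
        self.memory[x] = label   # stands in for a back-propagation update

def train_until_agreement(model, samples, target=0.999, max_epochs=100):
    # Iterate weight updates until the model's output agrees with the
    # classification judgment information at the target rate (the patent
    # mentions, without limitation, 99.9%).
    for epoch in range(max_epochs):
        agreed = sum(model.predict(x) == y for x, y in samples)
        if agreed / len(samples) >= target:
            return epoch
        for x, y in samples:
            if model.predict(x) != y:
                model.update(x, y)
    return max_epochs
```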
In some embodiments, to give the multiple channels individual character before training, the initial weights of the first channel and the second channel need to be configured; the specific configuration method is shown in step S2110.
S2110: Randomly obtain the initialization weight parameters of the at least two channels, respectively. After the training sample data is obtained, the weights of the first channel and the second channel are initialized by random generation. After initialization, the probability that the weights of the first channel and the second channel coincide is greatly reduced; the two channels are trained synchronously but from different initial weights, which strengthens the mutual reference between the channels.
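A sketch of the independent random initialization; the weight shape and the normal distribution are illustrative assumptions, and the point is only that the two channels draw different starting weights:

```python
import numpy as np

def init_channel_weights(shape, seed):
    # Each channel gets its own independently drawn initial weights,
    # so coinciding starting points are vanishingly unlikely.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.01, size=shape)

w_first = init_channel_weights((3, 3), seed=1)    # first channel
w_second = init_channel_weights((3, 3), seed=2)   # second channel
weights_differ = not np.allclose(w_first, w_second)
```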
To solve the above technical problem, an embodiment of the present invention further provides a facial image processing apparatus.
Referring to Fig. 4, Fig. 4 is a schematic diagram of the basic structure of the facial image key point detection apparatus of this embodiment.
As shown in Fig. 4, a facial image processing apparatus includes an acquisition module 2100, a processing module 2200, and an execution module 2300. The acquisition module 2100 is configured to obtain a facial image to be processed; the processing module 2200 is configured to input the facial image into a preset convolutional neural network model, wherein the convolutional neural network model includes at least two channels; the execution module 2300 is configured to obtain the classification data output by the convolutional neural network model and perform content understanding on the facial image according to the classification data, wherein the classification data is the mean of the output values of the at least two channels.
Through the multiple channels of the convolutional neural network model, the facial image processing apparatus obtains each channel's classification data for the same facial image and then calculates the average of the multiple classification data. The averaging cancels the random errors introduced into the facial image by camera shake and ambient light during shooting, makes the entire convolutional neural network model converge better, and enhances the precision and stability of the output, improving the robustness of the model.
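The error-cancelling effect of averaging can be illustrated with simulated channel outputs; the noise model (independent Gaussian jitter standing in for shake and lighting error) is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
true_point = np.array([50.0, 80.0])   # the "true" key point position

# Each channel's prediction carries its own independent random error.
channel_outputs = [true_point + rng.normal(0.0, 2.0, size=2) for _ in range(2)]

# The model's classification data: the mean of the channels' output values.
averaged = np.mean(channel_outputs, axis=0)

# Per coordinate, the averaged error never exceeds the worse channel's error,
# and with independent noise it is smaller than either channel's on average.
errors = [np.abs(c - true_point) for c in channel_outputs]
```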
In some embodiments, the convolutional neural network model includes a first loss function and a second loss function.
In some embodiments, the first loss function is used to respectively calculate the Euclidean distance between the output values of the at least two channels and the preset expected true values; the first loss function is characterized as:
where x_i and y_i are the output values of the i-th point coordinate of the first-level network, x̂_i and ŷ_i are the expected true values of the i-th point coordinate, and n is the number of face key points.
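The formula itself appears only as an image in the original text. From the variable definitions above, a reconstruction consistent with the description (a summed squared distance over the n key points, minimized by back-propagation) would plausibly be:

```latex
L_1 = \sum_{i=1}^{n}\Bigl[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2\Bigr]
```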
In some embodiments, the second loss function is used to calculate the Euclidean distance between the image features extracted by the at least two channels; the second loss function is characterized as:
where f_{1i} and f_{2i} are the features extracted by the two networks, and ZG(·) is the gradient stop function.
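This formula is likewise reproduced only as an image in the original. One plausible reconstruction from the description (a Euclidean distance between the two channels' features, with the gradient stop ZG(·) keeping the consistency constraint from collapsing both channels at once) is:

```latex
L_2 = \sum_{i}\Bigl[\bigl(f_{1i} - \mathrm{ZG}(f_{2i})\bigr)^2 + \bigl(\mathrm{ZG}(f_{1i}) - f_{2i}\bigr)^2\Bigr]
```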
In some embodiments, the facial image key point detection apparatus further includes a first acquisition submodule, a first input submodule, a first comparison submodule, and a first execution submodule. The first acquisition submodule is configured to obtain training sample data marked with classification judgment information; the first input submodule is configured to input the training sample data into the convolutional neural network model to obtain the classification reference information of the training sample data, wherein the classification reference information is the mean of the output values of the at least two channels; the first comparison submodule is configured to compare whether the model classification reference information of each sample in the training sample data is consistent with the classification judgment information; the first execution submodule is configured to iteratively and cyclically update the weights in the convolutional neural network model when the model classification reference information is inconsistent with the classification judgment information, until the comparison result is consistent with the classification judgment information.
In some embodiments, the facial image key point detection apparatus further includes a second acquisition submodule configured to randomly obtain the initialization weight parameters of the at least two channels, respectively.
To solve the above technical problem, an embodiment of the present invention further provides a computer device. Referring to Fig. 5, Fig. 5 is a block diagram of the basic structure of the computer device of this embodiment.
As shown in Fig. 5, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store a sequence of control information. When the computer-readable instructions are executed by the processor, the processor implements a facial image key point detection method. The processor of the computer device provides the computing and control capability that supports the operation of the entire device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform the facial image key point detection method. The network interface of the computer device is used to connect and communicate with a terminal. Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the present solution and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
In this embodiment, the processor executes the specific functions of the acquisition module 2100, the processing module 2200, and the execution module 2300 in Fig. 4, and the memory stores the program code and various data needed to execute the above modules. The network interface is used for data transmission to and from a user terminal or server. The memory in this embodiment stores the program code and data needed to execute all the submodules in the facial image key point detection apparatus, and the server can invoke this program code and data to execute the functions of all the submodules.
Through the multiple channels of the convolutional neural network model, the computer device obtains each channel's classification data for the same facial image and then calculates the average of the multiple classification data. The averaging cancels the random errors introduced into the facial image by camera shake and ambient light during shooting, makes the entire convolutional neural network model converge better, and enhances the precision and stability of the output, improving the robustness of the model.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the facial image key point detection method described in any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program; the program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of each of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
It should be understood that although the steps in the flowcharts of the drawings are shown successively in the order indicated by the arrows, these steps are not necessarily executed successively in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least part of the steps in the flowcharts may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and their order of execution is not necessarily sequential; they can be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Embodiment 2
In this embodiment, the facial image understanding method is used in a client terminal; the user performs image understanding on pictures or videos through the client terminal. The content understanding includes, but is not limited to, gender recognition, age estimation, face-value scoring, or face similarity comparison. The classification data mainly indicates the recognizable features in the facial image; by comparing these features with preset classification criteria, the gender, age, and face value of the facial image can be judged.
The client terminal inputs the above image into the convolutional neural network model trained to convergence in Embodiment 1, performs feature extraction on the face key points in the image, and then obtains the corresponding content understanding result for the image.
In this embodiment, the client terminal is a smartphone, a pad terminal, a PC, or a laptop computer.
To solve the above technical problem, an embodiment of the present invention further provides a computer device. Referring to Fig. 5, Fig. 5 is a block diagram of the basic structure of the computer device of this embodiment.
As shown in Fig. 5, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store a sequence of control information. When the computer-readable instructions are executed by the processor, the processor implements a facial image understanding method. The processor of the computer device provides the computing and control capability that supports the operation of the entire device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform the facial image understanding method. The network interface of the computer device is used to connect and communicate with a terminal. Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the present solution and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the facial image understanding method described in any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program; the program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of each of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
It should be understood that although the steps in the flowcharts of the drawings are shown successively in the order indicated by the arrows, these steps are not necessarily executed successively in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least part of the steps in the flowcharts may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and their order of execution is not necessarily sequential; they can be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A facial image key point detection method, characterized by comprising the following steps:
obtaining a facial image to be processed;
inputting the facial image into a preset convolutional neural network model, wherein the convolutional neural network model includes at least two channels;
obtaining classification data output by the convolutional neural network model, and performing content understanding on the facial image according to the classification data, wherein the classification data is the mean of the output values of the at least two channels.
2. The facial image key point detection method according to claim 1, characterized in that the convolutional neural network model includes: a first loss function and a second loss function.
3. The facial image key point detection method according to claim 2, characterized in that the first loss function is used to respectively calculate the Euclidean distance between the output values of the at least two channels and preset expected true values, the first loss function being characterized as:
wherein x_i and y_i are the output values of the i-th point coordinate of the first-level network, x̂_i and ŷ_i are the expected true values of the i-th point coordinate, and n is the number of face key points.
4. The facial image key point detection method according to claim 2, characterized in that the second loss function is used to calculate the Euclidean distance between the image features extracted by the at least two channels, the second loss function being characterized as:
wherein f_{1i} and f_{2i} are the features extracted by the two networks, and ZG(·) is the gradient stop function.
5. The facial image processing method according to claim 1, characterized in that the convolutional neural network model is trained through the following steps:
obtaining training sample data marked with classification judgment information;
inputting the training sample data into the convolutional neural network model to obtain classification reference information of the training sample data, wherein the classification reference information is the mean of the output values of the at least two channels;
comparing whether the model classification reference information of each sample in the training sample data is consistent with the classification judgment information;
when the model classification reference information is inconsistent with the classification judgment information, iteratively and cyclically updating the weights in the convolutional neural network model until the comparison result is consistent with the classification judgment information.
6. The facial image processing method according to claim 5, characterized in that, after obtaining the training sample data marked with classification judgment information, the method further includes the following step:
randomly obtaining the initialization weight parameters of the at least two channels, respectively.
7. A facial image understanding method, characterized in that a facial image to be processed is input, so that an intelligent terminal performs content understanding on the facial image according to the facial image key point detection method of any one of claims 1-6, wherein the content understanding includes: performing gender recognition, age estimation, face-value scoring, or face similarity comparison on the facial image.
8. A facial image key point detection apparatus, characterized by comprising:
an acquisition module, configured to obtain a facial image to be processed;
a processing module, configured to input the facial image into a preset convolutional neural network model, wherein the convolutional neural network model includes at least two channels;
an execution module, configured to obtain classification data output by the convolutional neural network model and perform content understanding on the facial image according to the classification data, wherein the classification data is the mean of the output values of the at least two channels.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the facial image key point detection method of any one of claims 1 to 6, or to perform the steps of the facial image understanding method of claim 7.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the facial image key point detection method of any one of claims 1 to 6, or to perform the steps of the facial image understanding method of claim 7.
CN201810372872.9A 2018-04-24 2018-04-24 Facial image critical point detection method, apparatus, computer equipment and storage medium Active CN108596090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810372872.9A CN108596090B (en) 2018-04-24 2018-04-24 Facial image critical point detection method, apparatus, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN108596090A true CN108596090A (en) 2018-09-28
CN108596090B CN108596090B (en) 2019-08-27

Family

ID=63614937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810372872.9A Active CN108596090B (en) 2018-04-24 2018-04-24 Facial image critical point detection method, apparatus, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108596090B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105938559A (en) * 2015-03-04 2016-09-14 埃森哲环球服务有限公司 Digital image processing using convolutional neural networks
CN106096535A (en) * 2016-06-07 2016-11-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of face verification method based on bilinearity associating CNN
CN106650786A (en) * 2016-11-14 2017-05-10 沈阳工业大学 Image recognition method based on multi-column convolutional neural network fuzzy evaluation
CN106650653A (en) * 2016-12-14 2017-05-10 广东顺德中山大学卡内基梅隆大学国际联合研究院 Method for building deep learning based face recognition and age synthesis joint model
CN107330127A (en) * 2017-07-21 2017-11-07 湘潭大学 A kind of Similar Text detection method retrieved based on textual image


Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145876A (en) * 2018-09-29 2019-01-04 北京达佳互联信息技术有限公司 Image classification method, device, electronic equipment and storage medium
CN110971837A (en) * 2018-09-30 2020-04-07 Tcl集团股份有限公司 ConvNet-based dim light image processing method and terminal equipment
CN109523522A (en) * 2018-10-30 2019-03-26 腾讯科技(深圳)有限公司 Processing method, device, system and the storage medium of endoscopic images
CN109740426A (en) * 2018-11-23 2019-05-10 成都品果科技有限公司 A kind of face critical point detection method based on sampling convolution
CN109558837A (en) * 2018-11-28 2019-04-02 北京达佳互联信息技术有限公司 Face critical point detection method, apparatus and storage medium
CN109558837B (en) * 2018-11-28 2024-03-22 北京达佳互联信息技术有限公司 Face key point detection method, device and storage medium
CN111507467A (en) * 2019-01-31 2020-08-07 北京奇虎科技有限公司 Neural network model training method and device, computer equipment and storage medium
CN109902641A (en) * 2019-03-06 2019-06-18 中国科学院自动化研究所 Face critical point detection method, system, device based on semanteme alignment
CN109902641B (en) * 2019-03-06 2021-03-02 中国科学院自动化研究所 Semantic alignment-based face key point detection method, system and device
CN110163080A (en) * 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment
CN110163080B (en) * 2019-04-02 2024-08-02 腾讯科技(深圳)有限公司 Face key point detection method and device, storage medium and electronic equipment
CN110334599A (en) * 2019-05-31 2019-10-15 北京奇艺世纪科技有限公司 Training method, device, equipment and the storage medium of deep learning network
WO2021102655A1 (en) * 2019-11-25 2021-06-03 深圳市欢太科技有限公司 Network model training method, image property recognition method and apparatus, and electronic device
CN112016503A (en) * 2020-09-04 2020-12-01 平安国际智慧城市科技股份有限公司 Sidewalk detection method and device, computer equipment and storage medium
CN112016503B (en) * 2020-09-04 2024-01-23 平安国际智慧城市科技股份有限公司 Pavement detection method, device, computer equipment and storage medium
CN111901532A (en) * 2020-09-30 2020-11-06 南京理工大学 Video stabilization method based on recurrent neural network iteration strategy
CN113177525A (en) * 2021-05-27 2021-07-27 杭州有赞科技有限公司 AI electronic scale system and weighing method
CN115033924A (en) * 2022-08-10 2022-09-09 华中科技大学同济医学院附属协和医院 Information auditing method and system based on data security
CN115033924B (en) * 2022-08-10 2022-10-28 华中科技大学同济医学院附属协和医院 Information auditing method and system based on data security

Also Published As

Publication number Publication date
CN108596090B (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN108596090B (en) Facial image critical point detection method, apparatus, computer equipment and storage medium
US11715473B2 (en) Intuitive computing methods and systems
US20200226440A1 (en) Two-dimensional code image generation method and apparatus, storage medium and electronic device
EP2559030B1 (en) Intuitive computing methods and systems
CN111325271B (en) Image classification method and device
CN109614989B (en) Training method and device for rapid model, computer equipment and storage medium
CN107993191A (en) A kind of image processing method and device
CN109299315A (en) Multimedia resource classification method, device, computer equipment and storage medium
CN116797684A (en) Image generation method, device, electronic equipment and storage medium
CN110209859A (en) The method and apparatus and electronic equipment of place identification and its model training
CN109785344A (en) The remote sensing image segmentation method of binary channel residual error network based on feature recalibration
CN111489401B (en) Image color constancy processing method, system, device and storage medium
US11966829B2 (en) Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server
CN107665261A (en) Video duplicate checking method and device
CN110516734B (en) Image matching method, device, equipment and storage medium
CN111414953A (en) Point cloud classification method and device
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN113205103A (en) Lightweight tattoo detection method
CN118096922A (en) Method for generating map based on style migration and remote sensing image
Huang et al. An improved YOLOX algorithm for forest insect pest detection
KumarSingh et al. An Enhanced Image Colorization using Modified Generative Adversarial Networks with Pix2Pix Method
WO2024059374A1 (en) User authentication based on three-dimensional face modeling using partial face images
CN114819138A (en) Graph data processing method and device, electronic equipment and storage medium
CN110826726B (en) Target processing method, target processing device, target processing apparatus, and medium
CN116343019A (en) Target detection method for remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20211013

Address after: 101d1-10, floor 1, building 1, No. 6, Shangdi West Road, Haidian District, Beijing 100085

Patentee after: Beijing Kwai Technology Co.,Ltd.

Patentee after: Beijing Dajia Internet Information Technology Co.,Ltd.

Address before: B2201, 20 / F, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100084

Patentee before: Beijing Dajia Internet Information Technology Co.,Ltd.