CN110147776A - Method and apparatus for determining face key point positions - Google Patents

Method and apparatus for determining face key point positions

Info

Publication number
CN110147776A
CN110147776A (application CN201910441092.XA)
Authority
CN
China
Prior art keywords
face
feature
key point
coordinate
point coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910441092.XA
Other languages
Chinese (zh)
Other versions
CN110147776B (en)
Inventor
洪智滨
郭汉奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910441092.XA
Publication of CN110147776A
Application granted
Publication of CN110147776B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide a method and apparatus for determining face key point positions. The method includes: obtaining preliminary face key point coordinates in an image through a coordinate regression network; obtaining position-sensitive features through a position-sensitive feature extraction network based on the preliminary face key point coordinates and low-level face features in the image, the position-sensitive features including the low-level face features corresponding to the preliminary face key point coordinates and semantic face features; and correcting the preliminary face key point coordinates according to the position-sensitive features to determine the final face key point coordinates in the image. By making full use of the preliminary face key point coordinates, which carry spatial position information, and the low-level face features, which carry high-resolution detail, the embodiments obtain position-sensitive features that combine rich semantic features with low-level features, improving the accuracy of the final face key point coordinates obtained through correction.

Description

Method and apparatus for determining face key point positions
Technical field
The present invention relates to the field of neural network technology, and in particular to a method and apparatus for determining face key point positions.
Background art
Face key point detection algorithms fall into two broad classes by technical approach: traditional key point coordinate regression methods, and deep learning methods based on neural networks. The features used by traditional coordinate regression methods are unstable and easily affected by factors such as illumination. Deep learning methods, in some application scenarios where the face occupies only a small part of the image, suffer because the face feature maps required by the deep network shrink substantially and spatial position information is lost, which biases the key point detection results and degrades practical performance.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for determining face key point positions, to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the invention provides a method for determining face key point positions, comprising:
obtaining preliminary face key point coordinates in an image through a coordinate regression network;
obtaining position-sensitive features through a position-sensitive feature extraction network, based on the preliminary face key point coordinates and low-level face features in the image, the position-sensitive features including the low-level face features corresponding to the preliminary face key point coordinates and semantic face features;
correcting the preliminary face key point coordinates according to the position-sensitive features, and determining the final face key point coordinates in the image.
In one embodiment, the low-level face features are obtained through a first feature extraction network;
based on the low-level face features, the semantic face features are obtained through a second feature extraction network.
In one embodiment, obtaining the preliminary face key point coordinates in the image through the coordinate regression network comprises:
obtaining the preliminary face key point coordinates in the image through the coordinate regression network, based on the semantic face features.
In one embodiment, correcting the preliminary face key point coordinates according to the position-sensitive features and determining the final face key point coordinates in the image comprises:
obtaining a coordinate correction value through a coordinate difference regression network, according to the position-sensitive features;
correcting the preliminary face key point coordinates with the coordinate correction value, and determining the final face key point coordinates in the image.
In one embodiment, the position-sensitive feature extraction network includes a third feature extraction network and a convolutional network, and obtaining the position-sensitive features through the position-sensitive feature extraction network based on the preliminary face key point coordinates and the low-level face features in the image comprises:
obtaining face features through the third feature extraction network, based on the preliminary face key point coordinates and the low-level face features, the face features including the low-level face features corresponding to the preliminary face key point coordinates;
obtaining the position-sensitive features through the convolutional network, based on the face features.
In one embodiment, obtaining the face features through the third feature extraction network based on the preliminary face key point coordinates and the low-level face features comprises:
obtaining a position mask for each facial organ, according to the preliminary face key point coordinates that constitute that organ;
obtaining the face features through the third feature extraction network, based on the low-level face features and the position mask of each facial organ.
In a second aspect, an embodiment of the invention provides an apparatus for determining face key point positions, comprising:
a coordinate obtaining module, configured to obtain preliminary face key point coordinates in an image through a coordinate regression network;
a sensitive feature obtaining module, configured to obtain position-sensitive features through a position-sensitive feature extraction network based on the preliminary face key point coordinates and low-level face features in the image, the position-sensitive features including the low-level face features corresponding to the preliminary face key point coordinates and semantic face features;
a coordinate determining module, configured to correct the preliminary face key point coordinates according to the position-sensitive features and determine the final face key point coordinates in the image.
In one embodiment, the apparatus further includes:
a low-level feature obtaining module, configured to obtain the low-level face features through a first feature extraction network;
a semantic feature obtaining module, configured to obtain the semantic face features through a second feature extraction network, based on the low-level face features.
In one embodiment, the coordinate obtaining module includes:
a coordinate obtaining submodule, configured to obtain the preliminary face key point coordinates in the image through the coordinate regression network, based on the semantic face features.
In one embodiment, the coordinate determining module includes:
a correction value obtaining submodule, configured to obtain a coordinate correction value through a coordinate difference regression network, according to the position-sensitive features;
a coordinate determining submodule, configured to correct the preliminary face key point coordinates with the coordinate correction value and determine the final face key point coordinates in the image.
In one embodiment, the sensitive feature obtaining module includes:
a first obtaining submodule, configured to obtain face features through a third feature extraction network based on the preliminary face key point coordinates and the low-level face features, the face features including the low-level face features corresponding to the preliminary face key point coordinates;
a second obtaining submodule, configured to obtain the position-sensitive features through a convolutional network, based on the face features.
In one embodiment, the first obtaining submodule includes:
a first obtaining unit, configured to obtain the position mask of each facial organ according to the preliminary face key point coordinates that constitute that organ;
a second obtaining unit, configured to obtain the face features through the third feature extraction network, based on the low-level face features and the position mask of each facial organ.
In a third aspect, an embodiment of the invention provides a terminal for determining face key point positions. The functions of the terminal may be implemented in hardware, or in hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the terminal includes a processor and a memory, the memory storing a program that supports the terminal in executing the above method for determining face key point positions, and the processor being configured to execute the program stored in the memory. The terminal may further include a communication interface for communicating with other devices or communication networks.
In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium for storing the computer software instructions used by the terminal for determining face key point positions, including the program involved in executing the above method for determining face key point positions.
One of the above technical solutions has the following advantage or beneficial effect: embodiments of the present invention make full use of the preliminary face key point coordinates, which carry spatial position information, and the low-level face features, which carry high-resolution detail, to obtain position-sensitive features. The position-sensitive features thus obtained combine rich semantic features with low-level features, improving the accuracy of the final face key point coordinates obtained through correction.
The above summary is provided for illustration only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Brief description of the drawings
In the drawings, unless otherwise specified, identical reference numerals denote the same or similar components or elements throughout. The drawings are not necessarily to scale. It should be understood that these drawings depict only some embodiments disclosed in accordance with the present invention and should not be taken as limiting its scope.
Fig. 1 shows a flowchart of a method for determining face key point positions according to an embodiment of the present invention.
Fig. 2 shows a schematic diagram of preliminary face key points marked on a face image according to an embodiment of the present invention.
Fig. 3 shows a schematic diagram of the corrected final face key points on a face image according to an embodiment of the present invention.
Fig. 4 shows a flowchart of obtaining low-level face features and semantic face features according to an embodiment of the present invention.
Fig. 5 shows a flowchart of a method for determining face key point positions according to another embodiment of the present invention.
Fig. 6 shows a flowchart of a method for determining face key point positions according to another embodiment of the present invention.
Fig. 7 shows a flowchart of a method for determining face key point positions according to another embodiment of the present invention.
Fig. 8 shows a flowchart of a method for determining face key point positions according to another embodiment of the present invention.
Fig. 9 shows a flowchart of a method for determining face key point positions according to another embodiment of the present invention.
Fig. 10 shows the neural network structure of a method for determining face key point positions according to an embodiment of the present invention.
Fig. 11 shows the structure of a position-sensitive feature extraction network according to an embodiment of the present invention.
Fig. 12 shows a structural block diagram of an apparatus for determining face key point positions according to an embodiment of the present invention.
Fig. 13 shows a structural block diagram of an apparatus for determining face key point positions according to another embodiment of the present invention.
Fig. 14 shows the structure of a coordinate obtaining module according to an embodiment of the present invention.
Fig. 15 shows the structure of a coordinate determining module according to an embodiment of the present invention.
Fig. 16 shows the structure of a sensitive feature obtaining module according to an embodiment of the present invention.
Fig. 17 shows the structure of a first obtaining submodule according to an embodiment of the present invention.
Fig. 18 shows the structure of a second obtaining submodule according to an embodiment of the present invention.
Fig. 19 shows a structural schematic diagram of a terminal for determining face key point positions according to an embodiment of the present invention.
Detailed description of the embodiments
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative rather than restrictive.
Fig. 1 shows a flowchart of determining face key point positions according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S100: obtaining preliminary face key point coordinates in an image through a coordinate regression network.
Preliminary face key points indicate the positions of facial organs in the image. Facial organs include the eyes, eyebrows, nose, ears, mouth, and the face as a whole. Each facial organ may be assigned a different number of preliminary face key point coordinates, and the exact number can be changed by adjusting the parameters of the coordinate regression network. Note that a face key point coordinate may cover one or more pixels of the face image.
In one example, as shown in Fig. 2, after the face image is processed by the coordinate regression network, three preliminary face key points A, B, and C are extracted at an eyebrow. These three points indicate the approximate, but not precise, position of the eyebrow in the face image.
S200: obtaining position-sensitive features through a position-sensitive feature extraction network, based on the preliminary face key point coordinates and the low-level face features in the image. The position-sensitive features include the low-level face features corresponding to the preliminary face key point coordinates and semantic face features.
Semantic face features are high-dimensional vectors representing facial characteristics and can be used to distinguish the class and position of each facial organ. Low-level face features can be high-resolution feature vectors rich in information, and may include one or more of texture, color, shape, and spatial relationships. The low-level face features corresponding to a preliminary face key point coordinate can be understood as the texture, color, shape, and/or spatial-relationship features that are highly relevant to the key point position, within a certain range centered on that coordinate in the image. The size and shape of this range can be adaptively adjusted by the position-sensitive feature extraction network.
In one example, the color feature is a global feature describing the surface properties of the scene corresponding to an image or image region. Color features are generally pixel-based, with every pixel belonging to the image or region contributing. The color histogram is the most common way to represent color features; its advantage is that it is invariant to image rotation and translation and, after normalization, also invariant to changes in image scale.
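As an illustration of the scale invariance obtained by normalization, the following minimal sketch computes a per-channel color histogram with OpenCV and NumPy (the bin count and BGR color space are arbitrary choices for this example, not values from the patent):

```python
import cv2
import numpy as np

def color_histogram(image_bgr: np.ndarray, bins: int = 16) -> np.ndarray:
    """Per-channel color histogram, normalized to be invariant to image scale."""
    hists = []
    for channel in range(3):  # B, G, R channels
        hist = cv2.calcHist([image_bgr], [channel], None, [bins], [0, 256])
        hists.append(hist.flatten())
    hist = np.concatenate(hists)
    return hist / hist.sum()  # normalization removes dependence on pixel count
```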
Texture features are also global features, likewise describing the surface properties of the scene corresponding to an image or image region. Unlike color features, texture features are not pixel-based; they require statistical computation over regions containing multiple pixels. In pattern matching, such region-based features have a clear advantage: matching does not fail because of local deviations. As statistical features, texture features often exhibit rotation invariance and strong robustness to noise.
Shape features have two classes of representation: contour features and region features. Contour features concern the outer boundary of an object, while region features relate to the entire shape area. Spatial relationships may include the mutual spatial positions or relative directional relationships among the multiple targets segmented from an image; these relationships can be classified as connection/adjacency, overlap, and containment, among others. Spatial position information is usually divided into relative and absolute spatial position information: the former mainly expresses the relative arrangement of targets, such as above/below relationships, while the latter mainly expresses the distances and orientations between targets.
S300: correcting the preliminary face key point coordinates according to the position-sensitive features, and determining the final face key point coordinates in the image. Compared with the preliminary coordinates, the final face key point coordinates represent the positions of facial organs in the face image more accurately.
In one example, as shown in Fig. 3, after the preliminary face key point coordinates are corrected, the three final face key point coordinates indicating the eyebrow position are A1, B1, and C1. As Fig. 3 shows, A1, B1, and C1 indicate the position of the eyebrow in the face image more accurately than the three preliminary key points A, B, and C.
In the above embodiment, by making full use of the preliminary face key point coordinates, which carry spatial position information, and the low-level face features, which carry high-resolution detail, the obtained position-sensitive features combine rich semantic features with low-level features, improving the accuracy of the final face key point coordinates obtained through correction.
In one embodiment, as shown in Fig. 4, the method further includes:
S400: obtaining the low-level face features through a first feature extraction network.
In one example, the first feature extraction network may be a neural network composed of convolutional layers and max pooling layers, whose numbers can be adjusted as needed. For example, the first feature extraction network may consist of one convolutional layer and one max pooling layer.
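For illustration, a minimal PyTorch sketch of such a first feature extraction network is given below (the channel counts and kernel size are assumptions for this example; the patent does not fix them):

```python
import torch
import torch.nn as nn

class LowLevelFeatureNet(nn.Module):
    """One convolutional layer plus one max pooling layer, as in the example above."""
    def __init__(self, in_channels: int = 3, out_channels: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # A shallow network keeps high spatial resolution, preserving low-level detail.
        return self.pool(torch.relu(self.conv(image)))
```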
Note that the first feature extraction network may extract low-level face features either from the raw input image or from a preprocessed image. Image preprocessing mainly includes face alignment, face image enhancement, and normalization. Face alignment obtains a face image in which the face is properly oriented; image enhancement improves the quality of the face image, making it not only visually clearer but also more amenable to computer processing and recognition; normalization aims to obtain standardized face images of consistent size and gray value range. The image input to the first feature extraction network may contain only the face, or the face together with background.
S500: obtaining the semantic face features through a second feature extraction network, based on the low-level face features. The second feature extraction network can further extract high-level semantic features on top of the low-level features.
In one example, the second feature extraction network may be a neural network composed of inverted residual blocks and convolutional layers, whose numbers can be adjusted as needed. For example, the second feature extraction network may consist of six inverted residual blocks and one convolutional layer.
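An inverted residual block of the MobileNetV2 style could be sketched as follows (the expansion factor and channel counts are illustrative assumptions):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand -> depthwise conv -> project, with a skip connection (stride 1)."""
    def __init__(self, channels: int = 32, expansion: int = 6):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),             # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden,  # depthwise
                      bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),             # project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual connection

# Second feature extraction network per the example: six blocks plus one conv layer.
semantic_net = nn.Sequential(*[InvertedResidual(32) for _ in range(6)],
                             nn.Conv2d(32, 64, 3, padding=1))
```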
In one embodiment, as shown in Fig. 5, obtaining the preliminary face key point coordinates in the image through the coordinate regression network includes:
S110: obtaining the preliminary face key point coordinates in the image through the coordinate regression network, based on the semantic face features.
In one example, the coordinate regression network may be a neural network composed of fully connected layers. By computing over the input semantic face features, the fully connected layers can directly regress the preliminary face key point coordinates.
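A fully connected regression head of this kind might look like the following sketch (the feature dimension and the number of key points are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CoordRegressor(nn.Module):
    """Regresses an (x, y) pair for each key point from pooled semantic features."""
    def __init__(self, feature_dim: int = 64, num_points: int = 72):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_points * 2)

    def forward(self, semantic_features: torch.Tensor) -> torch.Tensor:
        # Global-average-pool the feature map, then regress all coordinates at once.
        pooled = semantic_features.mean(dim=(2, 3))          # (N, feature_dim)
        return self.fc(pooled).view(pooled.size(0), -1, 2)   # (N, num_points, 2)
```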
In one embodiment, as shown in Fig. 6, correcting the preliminary face key point coordinates according to the position-sensitive features and determining the final face key point coordinates in the image includes:
S310: obtaining a coordinate correction value through a coordinate difference regression network, according to the position-sensitive features. Different correction values can be regressed for different face key point coordinates. Since the position-sensitive features contain both low-level and semantic face features, the coordinate difference regression network can use this multi-dimensional data to generate the coordinate correction values more accurately.
In one example, the coordinate difference regression network may be a neural network composed of fully connected layers.
S320: correcting the preliminary face key point coordinates with the coordinate correction value, and determining the final face key point coordinates in the image. By correcting the coordinate value of each preliminary face key point with the correction value, key points that indicate the position of each facial organ more accurately can be extracted from the image.
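Steps S310 and S320 together amount to regressing a per-key-point offset and adding it to the preliminary coordinates; a minimal sketch follows (names and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class CoordDeltaRegressor(nn.Module):
    """Fully connected network regressing a (dx, dy) correction per key point."""
    def __init__(self, feature_dim: int = 128, num_points: int = 72):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_points * 2)

    def forward(self, position_sensitive: torch.Tensor,
                preliminary: torch.Tensor) -> torch.Tensor:
        pooled = position_sensitive.mean(dim=(2, 3))
        delta = self.fc(pooled).view(pooled.size(0), -1, 2)  # S310: correction value
        return preliminary + delta                           # S320: corrected coords
```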
In one embodiment, as shown in Fig. 7, the position-sensitive feature extraction network includes a third feature extraction network and a convolutional network. Obtaining the position-sensitive features through the position-sensitive feature extraction network based on the preliminary face key point coordinates and the low-level face features in the image includes:
S210: obtaining face features through the third feature extraction network, based on the preliminary face key point coordinates and the low-level face features. The face features include the low-level face features corresponding to the preliminary face key point coordinates, which can be understood as the low-level face features, highly relevant to the preliminary key point positions, covered by a preset range centered on each preliminary face key point coordinate. The size and shape of the preset range can be adaptively adjusted by the third feature extraction network.
In one example, the third feature extraction network may be a neural network composed of convolutional layers, max pooling layers, and linear activation layers, whose numbers can be adjusted and configured as needed. For example, the third feature extraction network may consist of nine convolutional layers, nine max pooling layers, and nine linear activation layers.
S220: obtaining the position-sensitive features through the convolutional network, based on the face features. Note that any existing convolutional network structure may be used, as long as it can further extract features. The number of convolutional networks can also be adjusted as needed; for example, when the final features should contain feature information of multiple different dimensions, several convolutional networks can be set up to extract features of the different dimensions.
In one example, as shown in Fig. 8, obtaining the position-sensitive features through the convolutional network based on the face features includes:
S2210: performing feature extraction on the face features through a first convolutional network and outputting first features. The first features include the low-level face features corresponding to the preliminary face key point coordinates and first semantic face features.
S2220: performing feature extraction on the first features through a second convolutional network and outputting second features. The second features include the low-level face features corresponding to the preliminary face key point coordinates and second semantic face features.
S2230: obtaining the position-sensitive features based on the first features and the second features.
Note that since the second convolutional network performs further feature extraction on the first features output by the first convolutional network, the semantic features contained in the second semantic face features differ from those in the first semantic face features. For example, the first semantic face features may include the class and position information of each facial organ, while the second semantic face features include the relative position information between facial organs. In other words, the first features output by the first convolutional network and the second features output by the second convolutional network contain feature information of different dimensions. In this way, the final position-sensitive features contain both rich low-level features and more semantic features. And since the extraction of the position-sensitive features depends on the preliminary face key point coordinates of the corresponding part, it is very sensitive to spatial position changes and can effectively use spatial position information to help improve key point detection performance.
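One way to realize this two-stage extraction and the combination in S2230 is sketched below; concatenating the two features along the channel dimension is an assumption of this sketch, since the patent does not specify how they are combined:

```python
import torch
import torch.nn as nn

class PositionSensitiveHead(nn.Module):
    """Two cascaded convolutional networks whose outputs are combined."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                   nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                   nn.ReLU(inplace=True))

    def forward(self, face_features: torch.Tensor) -> torch.Tensor:
        first = self.conv1(face_features)   # S2210: e.g. organ class and position
        second = self.conv2(first)          # S2220: e.g. inter-organ relative position
        return torch.cat([first, second], dim=1)  # S2230: combine both dimensions
```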
In one embodiment, as shown in Fig. 9, obtaining the face features through the third feature extraction network based on the preliminary face key point coordinates and the low-level face features includes:
S2110: obtaining the position mask of each facial organ according to the preliminary face key point coordinates that constitute that organ. In one example, this step can be implemented by a position mask generator.
S2120: obtaining the face features through the third feature extraction network, based on the low-level face features and the position mask of each facial organ.
Note that the position masks serve to retain and locate face regions in the image. For example, the face image can be divided by a grid into nine regions; when the nose lies in the fifth, central region, the position mask corresponding to the nose marks only the fifth region. With the position masks, step S2120 can quickly obtain the low-level face features located in the fifth region, enabling fast retrieval of the low-level face features of each facial organ.
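A minimal sketch of such a position mask generator under the nine-region grid example above (the grid size and the rule for marking cells are illustrative assumptions):

```python
import torch

def organ_position_mask(keypoints: torch.Tensor, h: int, w: int,
                        grid: int = 3) -> torch.Tensor:
    """Binary (h, w) mask marking every grid cell containing one of the organ's points.

    keypoints: (num_points, 2) preliminary pixel coordinates (x, y) of one organ.
    """
    mask = torch.zeros(h, w)
    cell_h, cell_w = h // grid, w // grid
    for x, y in keypoints.tolist():
        row = min(int(y) // cell_h, grid - 1)
        col = min(int(x) // cell_w, grid - 1)
        mask[row * cell_h:(row + 1) * cell_h,
             col * cell_w:(col + 1) * cell_w] = 1.0
    return mask  # multiplied with the low-level feature map to keep organ regions
```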
In one embodiment, as shown in Figs. 10 and 11, the neural network for determining face key point positions consists of five parts: a low-level feature extraction network, a high-level feature extraction network, a coordinate regression network, a position-sensitive feature extraction network, and a coordinate difference regression network.
Low-level feature extraction network: composed of one convolutional layer and one max pooling layer; used to extract the low-level face features.
High-level feature extraction network: composed of six inverted residual subnetworks and one convolutional layer in cascade; its role is to further extract the semantic face features (high-level features) on top of the low-level face features.
Coordinate regression network: composed of fully connected layers; computes the preliminary face key point coordinates from the extracted semantic face features.
Position-sensitive feature extraction network: a combination of a position mask generator, a feature extraction subnetwork, and convolutional networks. According to the preliminary face key point coordinates and the low-level face features, this network extracts from the low-level face features the face features corresponding to the preliminary face key point coordinates. Since this extraction process depends on the key point coordinates of the corresponding part, it is very sensitive to spatial position changes and can efficiently use spatial position information, helping to improve key point detection performance.
Coordinate difference regression network: composed of fully connected layers; computes the coordinate correction values from the position-sensitive features and corrects the preliminary face key point coordinate values to obtain the final face key point coordinates, achieving high-precision key point detection.
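Putting the five parts together, the end-to-end forward pass could be wired as sketched below, reusing the illustrative modules from the earlier sketches. The third feature extraction network is reduced to a single conv layer here, and the organ masks (which the full system generates from the preliminary coordinates, step S2110) are passed in precomputed for brevity; all module details remain assumptions:

```python
import torch
import torch.nn as nn

class FaceKeypointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.low_level = LowLevelFeatureNet()            # conv + max pool
        self.high_level = semantic_net                   # 6 inverted residuals + conv
        self.coord_reg = CoordRegressor(feature_dim=64)  # FC regression
        # Stand-in for the nine-layer third feature extraction network:
        self.third_net = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1),
                                       nn.ReLU(inplace=True))
        self.ps_head = PositionSensitiveHead(channels=64)
        self.delta_reg = CoordDeltaRegressor(feature_dim=128)

    def forward(self, image: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
        # masks: (N, 1, H/2, W/2) organ position masks, broadcast over channels.
        low = self.low_level(image)               # low-level face features
        semantic = self.high_level(low)           # semantic face features
        preliminary = self.coord_reg(semantic)    # preliminary key point coordinates
        face_feats = self.third_net(low * masks)  # masked features -> face features
        ps = self.ps_head(face_feats)             # position-sensitive features
        return self.delta_reg(ps, preliminary)    # corrected final coordinates
```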
In one embodiment, in order for the position-sensitive feature extraction network to obtain position-sensitive features more accurately, it is trained on training samples with manually annotated face key point coordinates as the reference, until the preliminary face key point coordinates, after being corrected by the coordinate correction values obtained from the position-sensitive features, approximate or coincide with the manually annotated face key point coordinates.
Since the preliminary face key point coordinates and the low-level face features are used when training the position-sensitive feature extraction network, training convergence can be accelerated.
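A sketch of this training objective follows; the mean-squared-error loss and the Adam optimizer are assumptions of this example, since the patent only states that manually annotated key points serve as the reference:

```python
import torch

model = FaceKeypointNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(image, masks, annotated_keypoints):
    """One step: push the corrected coordinates toward the manual annotations."""
    final_coords = model(image, masks)
    loss = torch.nn.functional.mse_loss(final_coords, annotated_keypoints)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```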
Embodiments of the present invention can improve the precision of face key point detection, reducing the adverse effects on face key point detection caused by the small resolution of high-level neural network feature maps and the loss of spatial position information. This helps the many applications built on face key point detection improve their effect and user experience, and benefits the further rollout of business projects. The method and apparatus can be applied in many scenarios, such as face editing, liveness detection, attribute recognition, and face video stickers, and have wide application in many current businesses.
Fig. 12 shows the structure of the apparatus for determining face key point positions according to an embodiment of the present invention. As shown in Fig. 12, the apparatus includes:
a coordinate obtaining module 10, configured to obtain preliminary face key point coordinates in an image through a coordinate regression network;
a sensitive feature obtaining module 20, configured to obtain position-sensitive features through a position-sensitive feature extraction network based on the preliminary face key point coordinates and low-level face features in the image, the position-sensitive features including the low-level face features corresponding to the preliminary face key point coordinates and semantic face features;
a coordinate determining module 30, configured to correct the preliminary face key point coordinates according to the position-sensitive features and determine the final face key point coordinates in the image.
In one embodiment, as shown in Fig. 13, the apparatus further includes:
a low-level feature obtaining module 40, configured to obtain the low-level face features through a first feature extraction network;
a semantic feature obtaining module 50, configured to obtain the semantic face features through a second feature extraction network, based on the low-level face features.
In one embodiment, as shown in Fig. 14, the coordinate obtaining module 10 includes:
a coordinate obtaining submodule 11, configured to obtain the preliminary face key point coordinates in the image through the coordinate regression network, based on the semantic face features.
In one embodiment, as shown in Fig. 15, the coordinate determining module 30 includes:
a correction value obtaining submodule 31, configured to obtain a coordinate correction value through a coordinate difference regression network, according to the position-sensitive features;
a coordinate determining submodule 32, configured to correct the preliminary face key point coordinates with the coordinate correction value and determine the final face key point coordinates in the image.
In one embodiment, as shown in Fig. 16, the sensitive feature obtaining module 20 includes:
a first obtaining submodule 21, configured to obtain face features through a third feature extraction network based on the preliminary face key point coordinates and the low-level face features, the face features including the low-level face features corresponding to the preliminary face key point coordinates;
a second obtaining submodule 22, configured to obtain the position-sensitive features through a convolutional network, based on the face features.
In one embodiment, as shown in Fig. 17, the first obtaining submodule 21 includes:
a first obtaining unit 211, configured to obtain the position mask of each facial organ according to the preliminary face key point coordinates that constitute that organ;
a second obtaining unit 212, configured to obtain the face features through the third feature extraction network, based on the low-level face features and the position mask of each facial organ.
In one embodiment, as shown in Fig. 18, the second obtaining submodule 22 includes:
a first feature extraction unit 221, which performs feature extraction on the face features through a first convolutional network and outputs first features, the first features including the low-level face features corresponding to the preliminary face key point coordinates and first semantic face features;
a second feature extraction unit 222, which performs feature extraction on the first features through a second convolutional network and outputs second features, the second features including the low-level face features corresponding to the preliminary face key point coordinates and second semantic face features;
a third obtaining unit 223, which obtains the position-sensitive features based on the first features and the second features.
The functions of the modules in the apparatuses of the embodiments of the present invention may refer to the corresponding descriptions in the above method and are not repeated here.
Fig. 19 shows a structural block diagram of a terminal for determining face key point positions according to an embodiment of the present invention. As shown in Fig. 19, the terminal includes a memory 910 and a processor 920, the memory 910 storing a computer program executable on the processor 920. When executing the computer program, the processor 920 implements the method for determining face key point positions in the above embodiments. The number of memories 910 and of processors 920 may each be one or more.
The terminal further includes:
a communication interface 930, configured to communicate with external devices for data transmission in determining face key point positions.
The memory 910 may include high-speed RAM, and may also include non-volatile memory, such as at least one magnetic disk memory.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they may be connected to each other by a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in Fig. 19, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920, and the communication interface 930 are integrated on one chip, they may communicate with one another through internal interfaces.
An embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the methods in the above embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where there is no mutual contradiction, those skilled in the art may combine the features of the different embodiments or examples described in this specification.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means two or more, unless explicitly and specifically limited otherwise.
Any process or method description in the flowcharts or otherwise described herein may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations, in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wirings, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or combinations thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit with logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the above embodiment methods can be completed by a program instructing related hardware; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist physically separately, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various changes or substitutions within the technical scope disclosed by the present invention, and these should all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method for determining face key point positions, comprising:
obtaining preliminary face key point coordinates in an image through a coordinate regression network;
obtaining position-sensitive features through a position-sensitive feature extraction network based on the preliminary face key point coordinates and low-level face features in the image, the position-sensitive features including the low-level face features corresponding to the preliminary face key point coordinates and semantic face features;
correcting the preliminary face key point coordinates according to the position-sensitive features, and determining the final face key point coordinates in the image.
2. The method according to claim 1, further comprising:
obtaining the low-level face features through a first feature extraction network;
obtaining the semantic face features through a second feature extraction network, based on the low-level face features.
3. The method according to claim 2, wherein obtaining the preliminary face key point coordinates in the image through the coordinate regression network comprises:
obtaining the preliminary face key point coordinates in the image through the coordinate regression network, based on the semantic face features.
4. The method according to claim 1, wherein correcting the preliminary face key point coordinates according to the position-sensitive features and determining the final face key point coordinates in the image comprises:
obtaining a coordinate correction value through a coordinate difference regression network, according to the position-sensitive features;
correcting the preliminary face key point coordinates with the coordinate correction value, and determining the final face key point coordinates in the image.
5. The method according to claim 1, wherein the position-sensitive feature extraction network includes a third feature extraction network and a convolutional network, and obtaining the position-sensitive features through the position-sensitive feature extraction network based on the preliminary face key point coordinates and the low-level face features in the image comprises:
obtaining face features through the third feature extraction network based on the preliminary face key point coordinates and the low-level face features, the face features including the low-level face features corresponding to the preliminary face key point coordinates;
obtaining the position-sensitive features through the convolutional network, based on the face features.
6. The method according to claim 5, wherein obtaining the face features through the third feature extraction network based on the preliminary face key point coordinates and the low-level face features comprises:
obtaining a position mask for each facial organ according to the preliminary face key point coordinates that constitute that organ;
obtaining the face features through the third feature extraction network, based on the low-level face features and the position mask of each facial organ.
7. An apparatus for determining face key point positions, comprising:
a coordinate obtaining module, configured to obtain preliminary face key point coordinates in an image through a coordinate regression network;
a sensitive feature obtaining module, configured to obtain position-sensitive features through a position-sensitive feature extraction network based on the preliminary face key point coordinates and low-level face features in the image, the position-sensitive features including the low-level face features corresponding to the preliminary face key point coordinates and semantic face features;
a coordinate determining module, configured to correct the preliminary face key point coordinates according to the position-sensitive features and determine the final face key point coordinates in the image.
8. The apparatus according to claim 7, further comprising:
a low-level feature obtaining module, configured to obtain the low-level face features through a first feature extraction network;
a semantic feature obtaining module, configured to obtain the semantic face features through a second feature extraction network, based on the low-level face features.
9. The apparatus according to claim 8, wherein the coordinate obtaining module includes:
a coordinate obtaining submodule, configured to obtain the preliminary face key point coordinates in the image through the coordinate regression network, based on the semantic face features.
10. The apparatus according to claim 7, wherein the coordinate determining module includes:
a correction value obtaining submodule, configured to obtain a coordinate correction value through a coordinate difference regression network, according to the position-sensitive features;
a coordinate determining submodule, configured to correct the preliminary face key point coordinates with the coordinate correction value and determine the final face key point coordinates in the image.
11. The apparatus according to claim 7, wherein the sensitive feature obtaining module includes:
a first obtaining submodule, configured to obtain face features through a third feature extraction network based on the preliminary face key point coordinates and the low-level face features, the face features including the low-level face features corresponding to the preliminary face key point coordinates;
a second obtaining submodule, configured to obtain the position-sensitive features through a convolutional network, based on the face features.
12. The apparatus according to claim 11, wherein the first obtaining submodule includes:
a first obtaining unit, configured to obtain the position mask of each facial organ according to the preliminary face key point coordinates that constitute that organ;
a second obtaining unit, configured to obtain the face features through the third feature extraction network, based on the low-level face features and the position mask of each facial organ.
13. A terminal for determining face key point positions, comprising:
one or more processors;
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.
14. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN201910441092.XA 2019-05-24 2019-05-24 Method and device for determining positions of key points of human face Active CN110147776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910441092.XA CN110147776B (en) 2019-05-24 2019-05-24 Method and device for determining positions of key points of human face

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910441092.XA CN110147776B (en) 2019-05-24 2019-05-24 Method and device for determining positions of key points of human face

Publications (2)

Publication Number / Publication Date
CN110147776A (en) 2019-08-20
CN110147776B (en) 2021-06-11

Family

ID=67593239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910441092.XA Active CN110147776B (en) 2019-05-24 2019-05-24 Method and device for determining positions of key points of human face

Country Status (1)

Country Link
CN (1) CN110147776B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340708A (en) * 2020-03-02 2020-06-26 北京理工大学 Method for rapidly generating high-resolution complete face image according to prior information
CN111507354A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium
CN111524062A (en) * 2020-04-22 2020-08-11 北京百度网讯科技有限公司 Image generation method and device
CN111539271A (en) * 2020-04-10 2020-08-14 哈尔滨新光光电科技股份有限公司 Face recognition method based on wearable device and wearable face detection device for frontier defense
CN111695519A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Key point positioning method, device, equipment and storage medium
CN112464809A (en) * 2020-11-26 2021-03-09 北京奇艺世纪科技有限公司 Face key point detection method and device, electronic equipment and storage medium
CN113256797A (en) * 2021-06-03 2021-08-13 广州虎牙科技有限公司 Semantic point determining method and device, electronic equipment and computer-readable storage medium

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842033A (en) * 2012-08-17 2012-12-26 苏州两江科技有限公司 Human facial expression and emotion semantic recognition method based on face recognition
CN103824052A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Multilevel semantic feature-based face feature extraction method and recognition method
CN106909870A (en) * 2015-12-22 2017-06-30 中兴通讯股份有限公司 Facial image search method and device
CN105678248A (en) * 2015-12-31 2016-06-15 上海科技大学 Face key point alignment algorithm based on deep learning
CN107451965A (en) * 2017-07-24 2017-12-08 深圳市智美达科技股份有限公司 Distorted face image correction method, device, computer equipment and storage medium
CN107506693A (en) * 2017-07-24 2017-12-22 深圳市智美达科技股份有限公司 Distorted face image correction method, device, computer equipment and storage medium
CN107609519A (en) * 2017-09-15 2018-01-19 维沃移动通信有限公司 Facial feature point localization method and device
CN107704813A (en) * 2017-09-19 2018-02-16 北京飞搜科技有限公司 Face liveness detection method and system
CN107679490A (en) * 2017-09-29 2018-02-09 百度在线网络技术(北京)有限公司 Method and apparatus for detecting image quality
CN108304765A (en) * 2017-12-11 2018-07-20 中国科学院自动化研究所 Multitask detection device for face key point location and semantic segmentation
CN108764048A (en) * 2018-04-28 2018-11-06 中国科学院自动化研究所 Face key point detection method and device
CN109002758A (en) * 2018-06-06 2018-12-14 武汉理工大学 Facial feature point positioning method, device, equipment and storage medium
CN109325996A (en) * 2018-09-21 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109376684A (en) * 2018-11-13 2019-02-22 广州市百果园信息技术有限公司 Face key point detection method, apparatus, computer equipment and storage medium
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face key point detection method, system and storage medium based on convolutional network
CN109685740A (en) * 2018-12-25 2019-04-26 努比亚技术有限公司 Face normalization method and device, mobile terminal and computer-readable storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
YI SUN et al.: "Deep Convolutional Network Cascade for Facial Point Detection", 2013 IEEE Conference on Computer Vision and Pattern Recognition *
YUE WU et al.: "Facial Landmark Detection with Tweaked Convolutional Neural Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence *
WU Kai et al.: "Multi-pose face alignment by cascaded regression" (in Chinese), Journal of Image and Graphics *
ZHANG Ming: "Research on semantic-feature-based face feature extraction methods" (in Chinese), China Masters' Theses Full-text Database, Information Science and Technology *
ZHANG Xiaomeng: "Face localization algorithm based on semantic and texture features" (in Chinese), China Masters' Theses Full-text Database, Information Science and Technology *
ZHAO Xin: "Video-based human action recognition" (in Chinese), China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340708A (en) * 2020-03-02 2020-06-26 北京理工大学 Method for rapidly generating high-resolution complete face image according to prior information
CN111539271A (en) * 2020-04-10 2020-08-14 哈尔滨新光光电科技股份有限公司 Face recognition method based on wearable device and wearable face detection device for frontier defense
CN111539271B (en) * 2020-04-10 2023-05-02 哈尔滨新光光电科技股份有限公司 Face recognition method based on wearable equipment and wearable face detection equipment for frontier defense
CN111507354A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium
CN111507354B (en) * 2020-04-17 2023-12-12 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium
CN111524062A (en) * 2020-04-22 2020-08-11 北京百度网讯科技有限公司 Image generation method and device
CN111524062B (en) * 2020-04-22 2023-11-24 北京百度网讯科技有限公司 Image generation method and device
CN111695519B (en) * 2020-06-12 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for positioning key point
CN111695519A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Key point positioning method, device, equipment and storage medium
US11610389B2 (en) 2020-06-12 2023-03-21 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for positioning key point, device, and storage medium
CN112464809B (en) * 2020-11-26 2023-06-06 北京奇艺世纪科技有限公司 Face key point detection method and device, electronic equipment and storage medium
CN112464809A (en) * 2020-11-26 2021-03-09 北京奇艺世纪科技有限公司 Face key point detection method and device, electronic equipment and storage medium
CN113256797A (en) * 2021-06-03 2021-08-13 广州虎牙科技有限公司 Semantic point determining method and device, electronic equipment and computer-readable storage medium

Also Published As

Publication number Publication date
CN110147776B (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN110147776A (en) The method and apparatus for determining face key point position
CN111354079B (en) Three-dimensional face reconstruction network training and virtual face image generation method and device
Dolhansky et al. Eye in-painting with exemplar generative adversarial networks
Zhao et al. Automatic semantic style transfer using deep convolutional neural networks and soft masks
CN106778928B (en) Image processing method and device
Shen et al. Automatic portrait segmentation for image stylization
CN109952594B (en) Image processing method, device, terminal and storage medium
CN104915972B (en) Image processing apparatus, image processing method and program
CN110288614A (en) Image processing method, device, equipment and storage medium
CN109741438B (en) Three-dimensional face modeling method, device, equipment and medium
CN108182232A Character display method based on e-book, electronic equipment and computer storage medium
CN110415184B (en) Multi-modal image enhancement method based on orthogonal element space
CN111626284A (en) Method and device for removing handwritten fonts, electronic equipment and storage medium
CN111739027A (en) Image processing method, device and equipment and readable storage medium
CN108564120A Feature point extraction method based on deep neural network
CN108388905A Illuminant estimation method based on convolutional neural networks and neighborhood context
CN111127309A (en) Portrait style transfer model training method, portrait style transfer method and device
CN111126127A (en) High-resolution remote sensing image classification method guided by multi-level spatial context characteristics
CN111985458A (en) Method for detecting multiple targets, electronic equipment and storage medium
WO2021099003A1 (en) Methods and system for generating 3d virtual objects
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN106709431A (en) Iris recognition method and device
Kim et al. Robust facial landmark extraction scheme using multiple convolutional neural networks
CN111080754A (en) Character animation production method and device for connecting characteristic points of head and limbs
Li et al. PLDGAN: portrait line drawing generation with prior knowledge and conditioning target

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant