CN109241835A - Image processing method and device, electronic equipment and storage medium - Google Patents
Info
- Publication number
- Publication number: CN109241835A (application CN201810842970.4A)
- Authority
- CN
- China
- Prior art keywords
- target object
- feature
- motor unit
- image
- characteristic information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure relates to an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining feature information of a target object in an image to be processed and key point features of the target object; determining, according to the key point features, a positioning result of the key points of the target object in the image to be processed; and determining, according to the feature information of the target object, the positioning result, and the key point features, a target detection result of the action units of the target object. According to embodiments of the present disclosure, the accuracy of the positioning result and of the target detection result can be improved, thereby improving the accuracy of object analysis performed on the target object in the image to be processed.
Description
Technical field
The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, an electronic device, and a storage medium.
Background technique
With the rapid development of Internet technology, computer vision techniques have been applied in many fields. For example, they can be used in analysis tasks for various types of objects (e.g., object analysis such as face registration and facial action unit detection). However, in the related art, the accuracy of the results of such object analysis tasks still needs to be improved.
Summary of the invention
In view of this, the present disclosure proposes an image processing technical solution.
According to one aspect of the present disclosure, an image processing method is provided, the method comprising:
obtaining feature information of a target object in an image to be processed and key point features of the target object;
determining, according to the key point features, a positioning result of the key points of the target object in the image to be processed;
determining, according to the feature information of the target object, the positioning result, and the key point features, a target detection result of the action units of the target object.
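As a rough illustration of these three steps, a minimal sketch is given below. It is NumPy-only; the shapes, function names, and the tiny random "networks" are hypothetical placeholders, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    """Stand-in for the feature extraction step: image (H, W) -> feature info (C, H', W')."""
    return rng.standard_normal((8, 16, 16))

def extract_keypoint_features(feature_info):
    """Stand-in for key point feature extraction: collapse channels to an (H', W') map."""
    return feature_info.mean(axis=0)

def locate_keypoints(kp_features, num_kp=5):
    """Stand-in for key point positioning: take the num_kp strongest responses as (x, y)."""
    flat = np.argsort(kp_features.ravel())[-num_kp:]
    ys, xs = np.unravel_index(flat, kp_features.shape)
    return np.stack([xs, ys], axis=1).astype(float)

def detect_action_units(feature_info, positions, kp_features, num_au=3):
    """Stand-in for action unit detection: one presence probability per action unit."""
    pooled = feature_info.mean(axis=(1, 2))
    logits = pooled[:num_au] + 0.01 * kp_features.mean()
    return 1.0 / (1.0 + np.exp(-logits))

image = rng.standard_normal((64, 64))
feats = extract_features(image)                 # feature information
kp_feats = extract_keypoint_features(feats)     # key point features
positions = locate_keypoints(kp_feats)          # positioning result
au_probs = detect_action_units(feats, positions, kp_feats)  # target detection result
```

The point of the sketch is only the data flow: the positioning result and the feature information both feed the final action unit detection step.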
In one possible implementation, determining the target detection result of the action units of the target object according to the feature information of the target object, the positioning result, and the key point features comprises:
determining local features of the action units of the target object according to the feature information of the target object and the positioning result;
determining the target detection result of the action units according to the local features and the key point features.
In this way, the local features of the action units, obtained from the feature information of the target object and the positioning result, have high accuracy, so that the target detection result of the action units determined from the local features and the key point features is also highly accurate.
In one possible implementation, obtaining the feature information of the target object in the image to be processed and the key point features of the target object comprises:
performing feature extraction on the image to be processed to obtain the feature information of the target object in the image to be processed;
performing key point feature extraction on the feature information of the target object to obtain the key point features of the target object.
In this way, the feature information of the target object and the key point features of the target object can be obtained accurately.
In one possible implementation, determining the local features of the action units of the target object according to the feature information of the target object and the positioning result comprises:
determining an initial attention feature map of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object;
performing convolution processing on the initial attention feature map to obtain a processed attention feature map;
determining the local features of the action units according to the processed attention feature map and the feature information of the target object.
In this way, the attention distribution of each action unit is learned in an adaptive attention-learning manner, and combining this attention distribution with the feature information yields more accurate local features for the action units.
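One plausible way to build the initial attention feature map described above is to place a 2-D weight bump at each action unit's center, the center itself being derived from the positioned key points. The Gaussian form, the center rule, and all parameters below are illustrative assumptions; the disclosure only states that the map follows from the positional relationship between the action unit center and the key points:

```python
import numpy as np

def initial_attention_map(keypoints, au_keypoint_idx, size=16, sigma=3.0):
    """Initial attention feature map for one action unit.

    keypoints: (N, 2) positioned key points (x, y) on a size x size grid.
    au_keypoint_idx: indices of the key points whose mean defines the AU center
    (an assumed center rule, for illustration only).
    """
    cx, cy = keypoints[au_keypoint_idx].mean(axis=0)
    ys, xs = np.mgrid[0:size, 0:size]
    att = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return att / att.max()  # peak attention weight 1 at the AU center

kps = np.array([[4.0, 4.0], [12.0, 4.0], [8.0, 10.0]])
att = initial_attention_map(kps, [0, 1])  # AU centered between the first two key points
```

In the patented scheme this initial map would then pass through convolution layers that refine it into the processed attention feature map.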
In one possible implementation, determining the local features of the action units according to the processed attention feature map and the feature information of the target object comprises:
obtaining attention content according to the processed attention feature map and the feature information of the target object;
performing feature extraction processing on the attention content to obtain the local features.
In this way, less spatial information is lost and the model parameters remain simple, the extracted local features of the action units are more accurate, and the detection accuracy of the target detection result of the action units is improved.
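A minimal sketch of how such "attention content" could be formed and then reduced to a local feature, assuming the feature information is a (C, H, W) tensor and the processed attention map an (H, W) weight grid (both shapes, and the pooling used as the "feature extraction processing", are assumptions):

```python
import numpy as np

def local_feature(att_map, feature_info):
    """Weight the feature information by the attention map, then extract a compact local feature.

    att_map: (H, W) processed attention feature map.
    feature_info: (C, H, W) feature information of the target object.
    """
    attention_content = feature_info * att_map[None, :, :]  # element-wise attention weighting
    # Global average pooling stands in for the feature extraction processing.
    return attention_content.mean(axis=(1, 2))              # (C,) local feature

feats = np.ones((8, 4, 4))
att = np.zeros((4, 4))
att[1, 1] = 1.0                 # attend to a single spatial location
lf = local_feature(att, feats)
```

Because the attention weighting is applied directly on the spatial feature map, spatial information is only discarded at the final pooling step, which matches the "less spatial loss, simple parameters" motivation above.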
In one possible implementation, the target object includes multiple action units,
wherein determining the target detection result of the action units according to the local features and the key point features comprises:
performing fusion processing on the local features of the multiple action units to obtain fused local features;
determining the target detection result of the multiple action units according to the fused local features and the key point features of the target object.
In this way, the extracted spatial features are better preserved, improving the accuracy of the target detection result.
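Concatenation followed by a linear scoring head is one simple, assumed realization of this fusion step; the disclosure does not fix the fusion operation:

```python
import numpy as np

rng = np.random.default_rng(0)

def detect_aus(local_feats, kp_feats, weights, bias):
    """Fuse per-AU local features with the key point features, then score each AU.

    local_feats: (num_au, C) one local feature per action unit.
    kp_feats: (K,) key point features.
    weights, bias: parameters of a stand-in linear detection head.
    """
    fused = local_feats.reshape(-1)          # fusion by concatenation (assumed)
    x = np.concatenate([fused, kp_feats])    # combine with key point features
    logits = weights @ x + bias
    return 1.0 / (1.0 + np.exp(-logits))     # per-AU presence probability

num_au, C, K = 3, 8, 10
local_feats = rng.standard_normal((num_au, C))
kp_feats = rng.standard_normal(K)
W = 0.1 * rng.standard_normal((num_au, num_au * C + K))
probs = detect_aus(local_feats, kp_feats, W, np.zeros(num_au))
```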
In one possible implementation, the method is implemented by a neural network,
wherein performing feature extraction on the image to be processed to obtain the feature information of the target object in the image to be processed comprises:
inputting the image to be processed into a feature extraction network of the neural network for feature extraction, to obtain the feature information of the target object in the image to be processed.
In one possible implementation, the method is implemented by a neural network, and the neural network is obtained by training on images to be processed.
By training on images to be processed, an end-to-end neural network that synchronously predicts the positioning result and the target detection result is obtained, improving the accuracy and intelligence of the image processing.
In one possible implementation, the step of training the neural network on the images to be processed comprises:
inputting the image to be processed into the feature extraction network and the key point feature extraction network of the neural network respectively for processing, to obtain the feature information of the target object in the image to be processed and the key point features of the target object;
inputting the key point features into a first detection network of the neural network for processing, to determine the positioning result of the key points of the target object;
inputting the feature information of the target object, the positioning result, and the key point features into a second detection network of the neural network for processing, to determine the target detection result of the action units of the target object;
determining a model loss of the neural network according to the positioning result, annotation information of the positioning result, the target detection result, and annotation information of the target detection result;
adjusting network parameter values of the neural network according to the model loss.
In this way, a neural network can be trained that accurately produces both the positioning result of the key points and the target detection result of the action units.
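The joint model loss in the training step above could, for instance, combine a key point regression term with an action unit classification term. The MSE + binary cross-entropy combination and the equal weighting below are illustrative assumptions (the disclosure later also folds attention weights into the loss):

```python
import numpy as np

def model_loss(pred_kp, gt_kp, pred_au, gt_au, lam=1.0):
    """Joint loss: key point positioning loss + action unit detection loss."""
    # Positioning result vs its annotation information: mean squared error.
    loc_loss = np.mean((pred_kp - gt_kp) ** 2)
    # Target detection result vs its annotation information: binary cross-entropy.
    eps = 1e-7
    p = np.clip(pred_au, eps, 1 - eps)
    det_loss = -np.mean(gt_au * np.log(p) + (1 - gt_au) * np.log(1 - p))
    return loc_loss + lam * det_loss

pred_kp = np.array([[1.0, 2.0], [3.0, 4.0]])
gt_kp = np.array([[1.0, 2.0], [3.0, 4.0]])   # perfect positioning -> zero location loss
pred_au = np.array([0.9, 0.1])
gt_au = np.array([1.0, 0.0])
loss = model_loss(pred_kp, gt_kp, pred_au, gt_au)
```

Because both terms share the same backbone in the end-to-end network, minimizing this joint loss is what couples the two tasks together.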
In one possible implementation, inputting the feature information of the target object, the positioning result, and the key point features into the second detection network of the neural network for processing, to determine the target detection result of the action units of the target object, comprises:
inputting the feature information of the target object and the positioning result into a local feature extraction network of the neural network for processing, to determine the local features of the action units of the target object;
inputting the local features and the key point features into the second detection network of the neural network for processing, to determine the target detection result of the action units.
In this way, the local features of the action units can be obtained accurately, improving the detection accuracy of the target detection result of the action units.
In one possible implementation, inputting the feature information of the target object and the positioning result into the local feature extraction network of the neural network for processing, to determine the local features of the action units of the target object, comprises:
inputting the positioning result into an initial attention generation network of the neural network for processing, and determining the initial attention feature map of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object;
inputting the initial attention feature map into a reference feature extraction network of the neural network for convolution processing, to obtain the processed attention feature map;
inputting the processed attention feature map and the feature information of the target object into the local feature extraction network of the neural network for processing, to determine the local features of the action units of the target object,
wherein determining the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the target detection result, and the annotation information of the target detection result comprises:
determining the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the target detection result, the annotation information of the target detection result, the initial attention weight of each element in the initial attention feature map, and the attention weight of each element in the processed attention feature map.
Through the adaptive attention learning of the neural network, the local features obtained for each action unit are more accurate, which improves the robustness of the image processing method and the accuracy of the target detection result of the action units.
In one possible implementation, the feature extraction network includes at least one convolution group, each convolution group including at least one convolutional layer and at least one convolution subgroup, the convolution subgroup including multiple convolution sublayers, each convolution sublayer being divided into a different number of subregions, with the convolution kernel parameters of the different subregions of each convolution sublayer being different.
In this way, dividing a convolution sublayer into multiple subregions allows local features to be extracted more effectively. Because each convolution sublayer is divided into a different number of subregions, local features of different sizes can be extracted, adapting to action units of different sizes, so that finer and more complete features are obtained and the accuracy of the subsequent target detection result of the action units is improved. At the same time, this residual structure reduces the probability of the vanishing-gradient problem during training, improving the stability and accuracy of the network.
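A sketch of one such partitioned convolution sublayer with a residual connection is given below. The partition count, the 3x3 kernel size, and the height-wise split are assumptions; the disclosure does not fix these details, and a real implementation would use a deep-learning framework rather than explicit loops:

```python
import numpy as np

def partitioned_conv_sublayer(x, kernels):
    """Apply a separate 3x3 kernel to each horizontal subregion of x, plus a residual add.

    x: (H, W) feature map; kernels: list of (3, 3) arrays, one per subregion.
    Different sublayers would use different numbers of subregions (len(kernels)).
    """
    H, W = x.shape
    n = len(kernels)
    bounds = np.linspace(0, H, n + 1).astype(int)  # row boundaries of the subregions
    padded = np.pad(x, 1)                          # zero padding keeps the output H x W
    out = np.zeros_like(x)
    for r, k in enumerate(kernels):
        for i in range(bounds[r], bounds[r + 1]):
            for j in range(W):
                out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return x + out                                 # residual structure

x = np.ones((8, 8))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                               # identity kernel for a simple check
y = partitioned_conv_sublayer(x, [identity, identity])  # two subregions, distinct params allowed
```

With identity kernels the residual add simply doubles the input, which makes the residual path easy to verify; in practice each subregion would learn its own kernel parameters.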
According to another aspect of the present disclosure, an image processing apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain feature information of a target object in an image to be processed and key point features of the target object;
a positioning result determining module, configured to determine, according to the key point features, a positioning result of the key points of the target object in the image to be processed;
a target detection result determining module, configured to determine, according to the feature information of the target object, the positioning result, and the key point features, a target detection result of the action units of the target object.
In one possible implementation, the target detection result determining module includes:
a first determining submodule, configured to determine the local features of the action units of the target object according to the feature information of the target object and the positioning result;
a second determining submodule, configured to determine the target detection result of the action units according to the local features and the key point features.
In one possible implementation, the obtaining module includes:
a first obtaining submodule, configured to perform feature extraction on the image to be processed to obtain the feature information of the target object in the image to be processed;
a second obtaining submodule, configured to perform key point feature extraction on the feature information of the target object to obtain the key point features of the target object.
In one possible implementation, the first determining submodule includes:
a third determining submodule, configured to determine the initial attention feature map of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object;
a third obtaining submodule, configured to perform convolution processing on the initial attention feature map to obtain the processed attention feature map;
a fourth determining submodule, configured to determine the local features of the action units according to the processed attention feature map and the feature information of the target object.
In one possible implementation, the fourth determining submodule includes:
a fourth obtaining submodule, configured to obtain attention content according to the processed attention feature map and the feature information of the target object;
a fifth obtaining submodule, configured to perform feature extraction processing on the attention content to obtain the local features.
In one possible implementation, the target object includes multiple action units,
wherein the second determining submodule includes:
a sixth obtaining submodule, configured to perform fusion processing on the local features of the multiple action units to obtain fused local features;
a fifth determining submodule, configured to determine the target detection result of the multiple action units according to the fused local features and the key point features of the target object.
In one possible implementation, the apparatus is implemented by a neural network,
wherein the first obtaining submodule includes:
a seventh obtaining submodule, configured to input the image to be processed into the feature extraction network of the neural network for feature extraction, to obtain the feature information of the target object in the image to be processed.
In one possible implementation, the apparatus is implemented by a neural network, and the neural network is obtained by training on images to be processed.
In one possible implementation, the apparatus includes:
a feature obtaining module, configured to input the image to be processed into the feature extraction network and the key point feature extraction network of the neural network respectively for processing, to obtain the feature information of the target object in the image to be processed and the key point features of the target object;
a first determining module, configured to input the key point features into the first detection network of the neural network for processing, to determine the positioning result of the key points of the target object;
a second determining module, configured to input the feature information of the target object, the positioning result, and the key point features into the second detection network of the neural network for processing, to determine the target detection result of the action units of the target object;
a third determining module, configured to determine the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the target detection result, and the annotation information of the target detection result;
a parameter adjustment module, configured to adjust the network parameter values of the neural network according to the model loss.
In one possible implementation, the second determining module includes:
a sixth determining submodule, configured to input the feature information of the target object and the positioning result into the local feature extraction network of the neural network for processing, to determine the local features of the action units of the target object;
a seventh determining submodule, configured to input the local features and the key point features into the second detection network of the neural network for processing, to determine the target detection result of the action units.
In one possible implementation, the sixth determining submodule includes:
an eighth determining submodule, configured to input the positioning result into the initial attention generation network of the neural network for processing, and determine the initial attention feature map of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object;
an eighth obtaining submodule, configured to input the initial attention feature map into the reference feature extraction network of the neural network for convolution processing, to obtain the processed attention feature map;
a ninth determining submodule, configured to input the processed attention feature map and the feature information of the target object into the local feature extraction network of the neural network for processing, to determine the local features of the action units of the target object,
wherein the third determining module includes:
a tenth determining submodule, configured to determine the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the target detection result, the annotation information of the target detection result, the initial attention weight of each element in the initial attention feature map, and the attention weight of each element in the processed attention feature map.
In one possible implementation, the feature extraction network includes at least one convolution group, each convolution group including at least one convolutional layer and at least one convolution subgroup, the convolution subgroup including multiple convolution sublayers, each convolution sublayer being divided into a different number of subregions, with the convolution kernel parameters of the different subregions of each convolution sublayer being different.
According to another aspect of the present disclosure, an electronic device is provided, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the above image processing method.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, the computer program instructions, when executed by a processor, implementing the above image processing method.
According to embodiments of the present disclosure, the target detection result of the action units of the target object is determined in combination with the positioning result of the key points of the target object. By exploiting the correlation between the positioning result and the target detection result, the accuracy of both can be improved, thereby improving the accuracy of object analysis performed on the target object in the image to be processed.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Detailed description of the invention
The accompanying drawings, which are incorporated into and form part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the disclosure.
Fig. 1 shows the flow chart of the image processing method according to the embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a neural network of the image processing method according to an embodiment of the present disclosure.
Fig. 3 shows the schematic diagram of the convolution group according to the image processing method of the embodiment of the present disclosure.
Fig. 4 shows the schematic diagram of the convolution group according to the image processing method of the embodiment of the present disclosure.
Fig. 5 shows the schematic diagram of the application scenarios of the image processing method according to the embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of the initial attention feature map and the processed attention feature map of an action unit in the image processing method according to an embodiment of the present disclosure.
Fig. 7 shows the schematic diagram of the application scenarios of the image processing method according to the embodiment of the present disclosure.
Fig. 8 shows the flow chart of training neural network in the image processing method according to the embodiment of the present disclosure.
Fig. 9 shows the block diagram of the image processing apparatus according to the embodiment of the present disclosure.
Figure 10 shows the block diagram of the image processing apparatus according to the embodiment of the present disclosure.
Figure 11 shows the block diagram of the electronic equipment according to the embodiment of the present disclosure.
Figure 12 shows the block diagram of the electronic equipment according to the embodiment of the present disclosure.
Specific embodiment
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise specified.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein indicates any one of multiple items, or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description to better illustrate the present disclosure. Those skilled in the art will appreciate that the present disclosure can equally be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present disclosure.
Fig. 1 shows a flowchart of the image processing method according to an embodiment of the present disclosure. The method can be applied to an electronic device, which may be provided as a terminal, a server, or a device in another form. The terminal may be user equipment (User Equipment, UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.; the present disclosure does not limit this. In some possible implementations, the image processing method can be implemented by a processor calling computer-readable instructions stored in a memory. As shown in Fig. 1, the image processing method according to an embodiment of the present disclosure includes:
In step S101, obtaining feature information of a target object in an image to be processed and key point features of the target object;
In step S102, determining, according to the key point features, a positioning result of the key points of the target object in the image to be processed;
In step S103, determining, according to the feature information of the target object, the positioning result, and the key point features, a target detection result of the action units of the target object.
According to embodiments of the present disclosure, the target detection result of the action units of the target object is determined in combination with the positioning result of the key points of the target object. By exploiting the correlation between the positioning result and the target detection result, the accuracy of both can be improved, thereby improving the accuracy of object analysis performed on the target object in the image to be processed.
The image to be processed may be a real image, for example, an original image or a processed image. The target object may be an object in a certain region of the image to be processed. For example, the image to be processed may be an image obtained by cropping an original image, e.g., an image of the target object cropped out according to a certain rule. The image to be processed may also be an original image that includes the target object; the present disclosure does not limit this. A key point may be a point of special significance on the target object, which can be used to define the shape, appearance, etc. of the target object. For example, the key points of a face may be points at certain specific positions on the face (for example, the eyes, the eyebrows, etc.), which can be used to define the face shape, the expression appearance, etc. An action unit (Action Unit, AU) can be used to represent muscle movement on the target object. For example, a facial action unit may represent the muscle movement at a certain position of the face and can be used to describe a facial expression accurately and objectively. The present disclosure does not limit the number and definition rules of the key points of the target object, or the number and definition rules of the action units.
In one possible implementation, the target object in the image to be processed may be an object of any category, for example, a face, an animal face, an article, etc.; the disclosure places no limitation on this. For ease of understanding, the following description takes a face as the target object and a facial image as the image to be processed.
For example, the original image may be cropped according to a certain rule to obtain the image to be processed. For example, the face location in the original image may be detected in various ways, such as by face detection code or face registration code, to determine multiple feature points (key points) of the face. It should be understood that the key points of the face may be located in many ways, for example, by manual annotation, face detection code, face registration code, etc.; the disclosure places no limitation on this.
In one possible implementation, according to multiple key points of the face (for example, the left pupil, the right pupil, the nose tip, the left mouth corner, and the right mouth corner), the original image may be cropped by a similarity transformation (for example, rotation, translation, uniform scaling, etc.) without changing the face shape or expression, to obtain the image to be processed, which may be a facial image. For example, the image may be rotated so that the two pupils are kept horizontal, and the bounding rectangle of these five key points may be enlarged and cropped to obtain a facial image (the image to be processed) of a target size (for example, L × L).
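A minimal NumPy sketch of this alignment step, under stated assumptions: the function name is illustrative, the five key points are given as (x, y) pixel coordinates with the two pupils in rows 0 and 1, and only the transform of the key points is shown (applying the same similarity transform to the image pixels is omitted).

```python
import numpy as np

def align_keypoints(pts, L):
    """Similarity transform (rotation + uniform scale + translation) that levels
    the pupils and maps the bounding square of the key points into L x L.
    pts: (5, 2) array of (x, y); rows 0 and 1 are the left and right pupil."""
    dx, dy = pts[1] - pts[0]
    angle = np.arctan2(dy, dx)                 # tilt of the pupil line
    c, s = np.cos(-angle), np.sin(-angle)      # rotate by -angle to level pupils
    R = np.array([[c, -s], [s, c]])
    rotated = pts @ R.T
    lo, hi = rotated.min(axis=0), rotated.max(axis=0)
    scale = L / max(hi - lo)                   # uniform scaling, shape preserved
    return (rotated - lo) * scale
```

Because the transform is a similarity (rotation, translation, uniform scaling only), relative face shape and expression are unchanged, as the paragraph above requires.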
As shown in Figure 1, in step S101, the characteristic information of the target object in the image to be processed and the key point features of the target object are obtained.
For example, feature extraction may be performed on the image to be processed (the facial image) to obtain shared features. The shared features may serve as the characteristic information of the target object. The shared features may also be used to determine the key point features of the target object; for example, key point feature extraction may be performed on the shared features to obtain the key point features.
In some alternative embodiments, feature extraction may also be performed on the image to be processed separately to obtain the characteristic information of the target object in the image to be processed and the key point features of the target object. For example, feature extraction may be performed on the image to be processed separately according to the categories of features desired. For example, feature extraction may be performed on the image to be processed to obtain the characteristic information of the target object, and key point feature extraction may be performed on the image to be processed to obtain the key point features. The disclosure places no limitation on the manner of obtaining the characteristic information of the target object and the key point features of the target object in the image to be processed.
In one possible implementation, step S101 may include:
performing feature extraction on the image to be processed to obtain the characteristic information of the target object in the image to be processed;
performing key point feature extraction on the characteristic information of the target object to obtain the key point features of the target object.
For example, this may be based on a deep learning algorithm; for example, a neural network may be trained on images to be processed. Feature extraction may be performed on the image to be processed using the feature extraction network of the trained neural network to obtain the characteristic information (the shared features) of the target object in the image to be processed. Key point feature extraction is then performed on the characteristic information (the shared features) of the target object to obtain the key point features of the target object. In this way, the characteristic information of the target object and the key point features of the target object can be obtained more quickly and accurately.
Fig. 2 shows a schematic diagram of the neural network of the image processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 2, the neural network includes a feature extraction network, which may perform feature extraction on the image to be processed to obtain the characteristic information (the shared features) of the target object in the image to be processed. The neural network may also include a key point feature extraction network; the shared features are input into the key point feature extraction network for feature extraction, and the key point features of the target object can be obtained.
In one possible implementation, the step of performing feature extraction on the image to be processed to obtain the characteristic information of the target object in the image to be processed may include:
inputting the image to be processed into the feature extraction network of the neural network for feature extraction to obtain the characteristic information of the target object in the image to be processed.
The feature extraction network may be any network structure capable of performing feature extraction on the image to be processed. For example, the feature extraction network may include convolutional layers, and the characteristic information is obtained by performing convolution processing on the image to be processed. The convolutional layers included in the feature extraction network may have any form and structure. For example, the feature extraction network may include at least one convolution group, and each convolution group may include at least one convolutional layer; each level may include one or more convolutional layers, etc.; the disclosure places no limitation on this.
In one possible implementation, the feature extraction network includes at least one convolution group, each convolution group includes at least one convolutional layer and at least one convolution subgroup, and each convolution subgroup includes multiple convolution sublayers. Each convolution sublayer is divided into a different number of subregions, and the convolution kernel parameters of different subregions of each convolution sublayer are different.
For example, the feature extraction network may include two convolution groups, for example, MR1 (L, L, c) and MR2 (L/2, L/2, 2c) respectively, the two convolution groups being connected in series for performing feature extraction on the image to be processed. Here, c represents the number of filters of a convolutional layer, that is, the number of channels of the generated feature map. MR1 may be used to extract relatively local features of the image to be processed (for example, corners, edges, etc.), and MR2 may be used to extract higher-level features of the image to be processed (for example, small patches, etc.); with more filters, MR2 can effectively improve the feature extraction effect. A max pooling layer is connected after each convolution group; the max pooling layer may be used to downsample the extracted features and reduce the feature dimensions. Batch normalization and a rectified linear unit may be applied to each convolutional layer, and the feature extraction network outputs the characteristic information of the target object in the image to be processed.
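As a consistency check on the sizes above, the following sketch (plain Python, names illustrative) traces the feature map shape through the two serial convolution groups, each followed by 2 × 2 max pooling; the final spatial size L/4 × L/4 matches the size later given for the attention feature maps.

```python
def feature_extractor_shapes(L, c):
    """Trace (height, width, channels) through MR1 -> pool -> MR2 -> pool."""
    shapes = [("MR1", (L, L, c))]                    # first group keeps spatial size
    shapes.append(("pool1", (L // 2, L // 2, c)))    # 2x2 max pooling halves H and W
    shapes.append(("MR2", (L // 2, L // 2, 2 * c)))  # second group doubles the filters
    shapes.append(("pool2", (L // 4, L // 4, 2 * c)))
    return shapes
```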
Fig. 3 shows a schematic diagram of a convolution group of the image processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 3, a convolution group includes one convolutional layer and one convolution subgroup, and the convolution subgroup includes four convolution sublayers. The four convolution sublayers are evenly divided into different numbers of subregions, for example, evenly divided into 1 × 1, 2 × 2, 4 × 4, and 8 × 8 subregions respectively. Each subregion may share one convolution kernel, and the convolution kernel parameters of different subregions of each convolution sublayer are different (for example, the values of some weights of the convolution kernels differ).
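A toy NumPy sketch of one such convolution sublayer on a single-channel map, under the assumption that "each subregion shares one convolution kernel" means each of the g × g evenly divided blocks of the input is filtered with its own 3 × 3 kernel (the disclosure fixes neither the kernel size nor these helper names; the sliding window here is correlation, as convolution is commonly implemented in deep learning frameworks).

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 'same' convolution with zero padding; k has odd size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def partitioned_conv(x, kernels, g):
    """Convolution sublayer evenly divided into g x g subregions: subregion
    (r, c) uses kernels[r * g + c], its own convolution kernel parameters."""
    h, w = x.shape
    sh, sw = h // g, w // g
    out = np.zeros_like(x, dtype=float)
    for r in range(g):
        for c in range(g):
            full = conv2d_same(x, kernels[r * g + c])
            out[r * sh:(r + 1) * sh, c * sw:(c + 1) * sw] = \
                full[r * sh:(r + 1) * sh, c * sw:(c + 1) * sw]
    return out
```

With g = 1 this reduces to an ordinary shared-kernel convolution; larger g gives each local block its own parameters, matching the 1 × 1 through 8 × 8 sublayers described above.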
For example, the image to be processed is input into the feature extraction network, and the convolutional layer of the first convolution group performs convolution processing on the image to be processed to obtain intermediate features. The intermediate features are input into the convolution subgroup, where the four convolution sublayers each perform convolution processing on the intermediate features, and the output results of the four convolution sublayers are superposed to obtain first superposed features (for example, C shown in Fig. 3 indicates the superposition yielding the first superposed features). The first superposed features and the intermediate features are added element-wise (that is, corresponding elements are added) to obtain the characteristic information produced by the convolution processing of the first convolution group. It should be understood that the characteristic information produced by the first convolution group is then processed by the second convolution group (whose processing may be, for example, the same as that of the first convolution group), and the output is the characteristic information of the target object.
In this way, dividing a convolution sublayer into multiple subregions allows local features to be better extracted. Since each convolution sublayer is divided into a different number of subregions, local features of different sizes can be extracted, adapting to action units of different sizes, so that finer and more comprehensive features can be extracted, thereby improving the accuracy of the subsequent detection results of the action units. Meanwhile, it should be understood that the element-wise addition of the first superposed features and the intermediate features may constitute a residual structure, which can reduce the probability of gradient vanishing problems during training and improve the stability and accuracy of the network.
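The superposition-plus-residual step above can be sketched as follows, under the assumption that "superposition" denotes element-wise summation of the sublayer outputs (the figure's "C" could also denote channel concatenation followed by a projection; this sketch takes the simpler reading, and the names are illustrative).

```python
import numpy as np

def conv_group_combine(intermediate, sublayer_outputs):
    """Superpose the convolution sublayer outputs and add the result
    element-wise to the intermediate features (a residual connection).
    intermediate: (H, W); sublayer_outputs: (n_sublayers, H, W)."""
    superposed = np.sum(sublayer_outputs, axis=0)   # first superposed features
    return superposed + intermediate                # element-wise addition
```

When all sublayer outputs are zero the group reduces to the identity on the intermediate features, which is exactly the property that makes residual structures easy to train.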
It should be noted that a convolution group may take various forms. As noted above, the multiple convolution sublayers may be convolution sublayers of multiple sizes (divided into different numbers of subregions) at the same level (in the same convolution subgroup). The multiple convolution sublayers may also be convolution sublayers of multiple sizes at different levels.
Fig. 4 shows a schematic diagram of a convolution group of the image processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 4, the convolution subgroup includes 3 convolution sublayers, for example, convolution sublayer 1, convolution sublayer 2, and convolution sublayer 3 respectively. The 3 convolution sublayers are evenly divided into 2 × 2, 4 × 4, and 8 × 8 subregions respectively (the 3 convolution sublayers have multiple sizes). As shown in Fig. 4, these 3 convolution sublayers are at different levels. For example, the intermediate features output by the first convolutional layer are input into convolution sublayer 1 for processing, the features output by convolution sublayer 1 are input into convolution sublayer 2 for processing, and the features output by convolution sublayer 2 are input into convolution sublayer 3 for processing. The outputs of convolution sublayer 1, convolution sublayer 2, and convolution sublayer 3 are superposed, and the superposed result is added element-wise to the output results of the other convolutional layers of the convolution group.
In this way, the hierarchical structure of the convolution subgroup can effectively enlarge the receptive field of the convolution kernels, which is conducive to extracting more comprehensive characteristic information. As long as feature extraction can be performed on the image to be processed, the disclosure places no limitation on the structure of the feature extraction network, the number of convolution groups it includes, the structure of each convolution group, the level and structure of each convolution subgroup, the number of convolution sublayers, the number of subregions each convolution sublayer includes and the manner of division, the convolution kernel parameters of the different subregions of each convolution sublayer, etc.
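The receptive-field enlargement from stacking sublayers in series can be quantified with a standard formula (not stated in the disclosure): for stride-1 layers, each k × k layer adds k − 1 to the effective receptive field.

```python
def stacked_receptive_field(kernel_sizes):
    """Effective receptive field of stride-1 convolutional layers in series."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1   # each k x k layer widens the field by k - 1 pixels
    return rf
```

For example, three serial 3 × 3 sublayers see a 7 × 7 input window, versus 3 × 3 for a single layer, which is why the hierarchical arrangement of Fig. 4 captures more context.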
In one possible implementation, key point feature extraction is performed on the characteristic information of the target object to obtain the key point features of the target object.
For example, as shown in Fig. 2, the neural network may also include a key point feature extraction network, where the key point feature extraction network, like the aforementioned feature extraction network, may be any network structure that includes convolutional layers; the disclosure places no limitation on this.
In some alternative embodiments, the key point feature extraction network may include 5 convolutional layers, for example, connected in series, with a global average pooling layer connected after the 5th convolutional layer; the global average pooling layer may perform average pooling over the entire spatial domain of the output of the 5th convolutional layer.
For example, the characteristic information of the target object is input into the key point feature extraction network for feature extraction, undergoing convolution processing through the 5 convolutional layers in turn; the output of the 5th convolutional layer is globally average pooled to obtain the key point features of the target object.
In this way, the structure of the key point feature extraction network is simple, and global average pooling can better preserve the spatial features of the image to be processed. The positioning result of the key points of the target object and the detection results of the action units of the target object are closely related to the spatial features of the target object, so the key point features extracted in this way can improve the prediction accuracy of the key point positioning result and of the detection results of the action units.
It should be understood that the key point feature extraction network may also take other network structures; for example, it may include 3 convolution groups connected in series, each convolution group including 2 convolutional layers with a max pooling layer connected after each convolution group, etc. As long as the key point features of the target object can be extracted, the disclosure places no limitation on the structure of the key point feature extraction network, the convolution groups it includes, the number of convolutional layers, the structure of each convolution group, the structure of the convolutional layers, the category of the pooling layers, etc.
As shown in Figure 1, in step S102, the positioning result of the key points of the target object in the image to be processed is determined according to the key point features.
For example, the obtained key point features of the target object may be input into a fully connected layer (for example, a fully connected layer whose dimension is 2 × the number of key points) for processing, to obtain the positioning result of the key points of the target object. For example, the positioning result may be the horizontal and vertical coordinates of the key points, etc.; the disclosure places no limitation on this. The positioning result is obtained, for example, as shown in Fig. 2.
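A sketch of this positioning head, assuming the key point features arrive as a flat vector and the layer weights are given (names and shapes are illustrative): the 2 × number-of-key-points output is reshaped into one (x, y) pair per key point.

```python
import numpy as np

def locate_keypoints(kp_feat, W, b, n_points):
    """Fully connected layer of dimension 2 * n_points, decoded to coordinates.
    kp_feat: (D,); W: (2 * n_points, D); b: (2 * n_points,)."""
    out = W @ kp_feat + b
    return out.reshape(n_points, 2)   # positioning result: one (x, y) per key point
```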
As shown in Figure 1, in step S103, the detection results of the action units of the target object are determined according to the characteristic information of the target object, the positioning result, and the key point features.
For example, the previously determined positioning result of the key points may be applied to the characteristic information of the target object; for example, the regions of interest (Region of Interest, ROI) of the action units of the target object may be cropped out according to the positioning result of the key points. Then, based on the ROIs of the action units and the key point features, the detection results of the action units of the target object are determined, etc. The disclosure places no limitation on the manner of determining the detection results of the action units of the target object according to the characteristic information of the target object, the positioning result, and the key point features.
In one possible implementation, step S103 may include:
determining local features of the action units of the target object according to the characteristic information of the target object and the positioning result;
determining the detection results of the action units according to the local features and the key point features.
For example, as noted above, the ROIs of the action units may be determined according to the characteristic information of the target object and the positioning result. The local features of the action units may be determined based on the ROIs of the action units; for example, feature extraction is performed on the cropped ROI of each action unit of the target object to obtain the local features of the action units. The detection results of the action units may then be determined according to the local features and the key point features.
In this way, the local features of the action units determined through the characteristic information of the target object and the positioning result have high accuracy, so that the detection results of the action units determined according to the local features and the key point features also have high accuracy. The disclosure places no limitation on the manner of determining the local features of the action units of the target object according to the characteristic information of the target object and the positioning result, or on the manner of determining the detection results of the action units according to the local features and the key point features.
The target object may include one or more key points, and likewise one or more action units. The ROI of an action unit may be a region with identical attention distribution (for example, every pixel has the same attention weight), or a region with differing attention distribution (for example, the attention weights of the pixels are not all the same). When the target object has multiple action units, the ROIs of the action units may be of a fixed size, or of different sizes (for example, the ROI of each action unit may have an irregular shape); the disclosure places no limitation on this.
In one possible implementation, the step of determining the local features of the action units of the target object according to the characteristic information of the target object and the positioning result may include:
determining initial attention feature maps of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object;
performing convolution processing on the initial attention feature maps to obtain processed attention feature maps;
determining the local features of the action units according to the processed attention feature maps and the characteristic information of the target object.
For example, as shown in Fig. 2, the neural network further includes an initial attention generation network. The initial attention generation network may determine the initial attention feature maps of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object.
An illustrative positional relationship between the center points of action units and the key points of the target object according to an embodiment of the present disclosure is given below, as shown in Table 1:
Table 1
| Action unit number | Action unit name | Action unit center |
| --- | --- | --- |
| 7 | Lid tightener | Eye center |
| 10 | Upper lip raiser | Upper lip center |
| 12 | Lip corner puller | Mouth corner |
| 14 | Dimpler | Mouth corner |
| 15 | Lip corner depressor | Mouth corner |
Now taking action unit 12 as an example: as shown in Table 1, the center of action unit 12 is a mouth corner. The key point positioning result includes, for example, the positioning result (for example, the coordinates) of the mouth corner key point. The initial attention feature map of action unit 12 can then be defined according to the determined coordinates of the mouth corner and the positional relationship between the mouth corner and action unit 12.
In some alternative embodiments, every element of the attention feature map of an action unit may be initialized to 0, and the center of the action unit is determined according to the positioning result of the corresponding key point. The ROI of the action unit can be defined according to its center. The ROI of the action unit may include two symmetrical subregions; the initial attention weight of each element in the two subregions of each action unit can be determined, the elements whose attention weights need updating are obtained, and the initial attention feature map of each action unit is generated. The size of the initial attention feature map of each action unit may be L/4 × L/4 × 1.
The disclosure places no limitation on the specific manner of determining the initial attention feature maps of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object, on the size of the initial attention feature maps, etc.
An illustrative formula (1) according to an embodiment of the present disclosure for determining the attention weight of the k-th element in a subregion of the i-th action unit is given below:

v_ik = 1 - ξ·d_ik / (ζ·(L/4))    (1)

In formula (1), v_ik indicates the attention weight of the k-th element in the subregion of the i-th action unit, and d_ik indicates the Manhattan distance from the k-th element to the center of the subregion of the action unit. ζ indicates the ratio of the width of the subregion to the width of the attention feature map, and ξ is a coefficient, ξ ≥ 0. n_au indicates the number of action units, and i is a variable taking values between 1 and n_au. ζ and ξ are preset hyperparameters.
ζ may be used to determine the size of the ROI of each action unit; for example, the size of the ROI of each action unit can be determined according to ζ, the center of the subregion of the action unit, and the width of the attention feature map.
It should be noted that if an element belongs to the overlapping part of the two subregions of an ROI, two attention weights may be obtained for it according to formula (1) respectively, and the larger of the two attention weights is taken as the attention weight of the element. The attention weights of elements outside the ROI of the action unit may be defined as 0.
In this way, initial attention feature maps with differing attention distributions can be obtained. The disclosure places no limitation on the manner of determining the initial attention feature maps of the action units of the target object according to the positioning result and the positional relationship between the center point of each action unit and the key points of the target object, on the size of the initial attention feature maps, the manner of determining the attention weights, the ratio of the subregion width to the attention feature map width, the value of the coefficient ξ, etc.
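A NumPy sketch of predefining one initial attention feature map, under stated assumptions: the linear fall-off with Manhattan distance follows the description of formula (1) (whose exact form is a reconstruction), each subregion is taken as a square of width ζ × map width around its center, weights outside the ROI are 0, and overlapping subregions keep the larger weight. Names are illustrative.

```python
import numpy as np

def initial_attention_map(centers, size, zeta, xi):
    """Predefined attention map (size x size) for one action unit whose ROI is
    formed by subregions centered at `centers` (e.g. two symmetrical points)."""
    att = np.zeros((size, size))
    half = zeta * size / 2.0                        # half-width of a subregion
    ys, xs = np.mgrid[0:size, 0:size]
    for cy, cx in centers:
        d = np.abs(ys - cy) + np.abs(xs - cx)       # Manhattan distance d_ik
        w = np.clip(1.0 - xi * d / (zeta * size), 0.0, 1.0)
        inside = (np.abs(ys - cy) <= half) & (np.abs(xs - cx) <= half)
        att = np.maximum(att, np.where(inside, w, 0.0))  # overlap keeps the larger
    return att
```

For a 176 × 176 face crop the map would be 44 × 44 (L/4 × L/4), with weight 1 at each subregion center decaying toward the ROI boundary.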
Fig. 5 shows a schematic diagram of an application scenario of the image processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 5, the target object (a face) includes multiple action units. According to the determined positioning result (for example, the coordinates of multiple key points of the face) and the positional relationship between the center point of each action unit and the key points of the target object, the initial attention feature maps of the action units of the face can be determined; for example, as shown in Fig. 5, the initial attention feature maps 51 of multiple action units are determined.
In one possible implementation, the step of determining the local features of the action units of the target object according to the characteristic information of the target object and the positioning result may include:
performing convolution processing on the initial attention feature maps to obtain the processed attention feature maps.
For example, as shown in Fig. 2, the neural network further includes a reference feature extraction network, which may perform attention optimization processing on the initial attention feature maps. For example, the reference feature extraction network, like the aforementioned feature extraction network, may be any network structure that includes convolutional layers. For example, it may include one convolution group as described above (for example, the convolution group shown in Fig. 3), which may perform convolution processing on the initial attention feature map and output the processed attention feature map through a convolutional layer with one filter (one channel).
In this way, since the multiple convolution sublayers of the convolution subgroup include different numbers of subregions, different attention optimization transformations can be applied in different local regions, suiting action units of different sizes and improving the optimization effect of the processed attention feature maps. The reference feature extraction network may also take other forms; for example, it may include multiple convolutional layers (for example, 3 convolutional layers in series), etc. The disclosure places no limitation on the manner of performing convolution processing on the initial attention feature maps, the structure and form of the convolutional layers, etc. For example, as shown in Fig. 5, convolution processing is performed on the initial attention feature maps 51 of multiple action units to obtain multiple processed attention feature maps 52.
Fig. 6 shows a schematic diagram of the initial attention feature maps and the processed attention feature maps of action units according to the image processing method of an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 6, the first row shows the initial attention feature maps of 6 action units respectively, and the second row shows the processed attention feature maps obtained from these 6 initial attention feature maps by convolution processing respectively. As shown in Fig. 6, each processed attention feature map adaptively adjusts the size and attention weights of the ROI according to the position of the corresponding action unit; the shape of the ROI of each action unit is irregular, and its edges transition smoothly into the surrounding region.
In one possible implementation, the step of determining the local features of the action units of the target object according to the characteristic information of the target object and the positioning result may include:
determining the local features of the action units according to the processed attention feature maps and the characteristic information of the target object.
For example, as shown in Fig. 2, the neural network may include a local feature extraction network. The processed attention feature maps and the characteristic information of the target object may be input into the local feature extraction network to determine the local features of the action units.
As noted above, there may be multiple action units, and the local features of each action unit may be determined respectively according to the processed attention feature map of that action unit and the characteristic information of the target object. In this way, the attention distribution of each action unit is learned in an adaptive attention learning manner and combined with the characteristic information, so the determined local features of the action units have higher accuracy. The disclosure places no limitation on the manner of determining the local features of the action units according to the processed attention feature maps and the characteristic information of the target object.
In one possible implementation, the step of determining the local features of the action units according to the processed attention feature maps and the characteristic information of the target object may include:
obtaining attention content according to the processed attention feature maps and the characteristic information of the target object;
performing feature extraction processing on the attention content to obtain the local features.
For example, a processed attention feature map and the characteristic information of the target object may be multiplied element-wise (that is, corresponding elements are multiplied) to obtain the attention content. Feature extraction processing may then be performed on the attention content to obtain the local features; for example, the attention content may be passed through a feature extraction structure including 5 convolutional layers in series to obtain the local features. When there are multiple action units, the local features of each action unit are obtained respectively. For example, as shown in Fig. 5, the local features 53 of multiple action units are determined respectively according to each processed attention feature map and the characteristic information of the target object (for example, the characteristic information extracted by the feature extraction network).
In this way, the local features of the action units can be extracted with less loss of spatial information and simple model parameters, and more accurately, thereby improving the detection accuracy of the detection results of the action units. The disclosure places no limitation on the manner of performing feature extraction processing on the attention content to obtain the local features.
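The element-wise combination above reduces to a broadcasted multiplication; a minimal NumPy sketch (names illustrative, and the subsequent 5-layer feature extraction abbreviated to a comment):

```python
import numpy as np

def attention_content(att_map, feat):
    """att_map: (H, W) processed attention feature map; feat: (H, W, C)
    characteristic information of the target object."""
    # Corresponding elements are multiplied, broadcast across channels; the
    # result would then pass through 5 serial convolutional layers to give
    # the local features of the action unit.
    return att_map[:, :, None] * feat
```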
In one possible implementation, the detection results of the action units are determined according to the local features and the key point features.
For example, as shown in Fig. 2, the detection results of the action units may be determined by combining the local features of the action units with the key point features of the target object. The disclosure places no limitation on the manner of determining the detection results of the action units according to the local features and the key point features.
In one possible implementation, the target object includes multiple motor units, and the step of determining the object detection results of the motor units according to the local feature and the key point feature may include:
performing fusion processing on the local features of the multiple motor units to obtain a fused local feature; and
determining the object detection results of the multiple motor units according to the fused local feature and the key point feature of the target object.
For example, the obtained local features of the multiple motor units may be subjected to fusion processing, for example, element-wise addition (corresponding pixels added together), to obtain the fused local feature. For example, as shown in Fig. 5, the local features of the multiple motor units are added element-wise to obtain a fused local feature 54, which can be used to determine the final object detection results.
For example, the object detection results of the multiple motor units may be determined according to the fused local feature and the key point feature of the target object. For example, the key point feature of the target object and the fused local feature are added element-wise, global average pooling is performed through a pooling layer, and the result output by the pooling layer is input into a fully connected layer of dimension n_au (the number of motor units) for processing, so as to obtain the object detection results of the multiple motor units, for example, a set of multi-label binary classification results.
In this way, the extracted spatial features can be better preserved, thereby improving the accuracy of the object detection results. The present disclosure does not limit the manner of fusion processing, the form of the local feature, the form of the object detection results, or the manner of determining the object detection results of the multiple motor units according to the fused local feature and the key point feature of the target object.
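The fusion and classification head described above (element-wise addition, global average pooling, a fully connected layer of dimension n_au, per-unit binary outputs) can be sketched as follows. A single random linear layer stands in for the trained network, and the sigmoid multi-label output and all names are illustrative assumptions.

```python
import numpy as np

def fuse_local_features(local_feats):
    # Fusion processing: element-wise (per-position) addition over the
    # n_au local features, giving one fused local feature map.
    return np.sum(local_feats, axis=0)

def au_head(fused, keypoint_feat, W, b):
    x = fused + keypoint_feat             # element-wise addition with the key point feature
    pooled = x.mean(axis=(1, 2))          # global average pooling -> (C,)
    logits = W @ pooled + b               # fully connected layer with n_au outputs
    return 1.0 / (1.0 + np.exp(-logits))  # per-unit sigmoid: multi-label binary results

rng = np.random.default_rng(1)
n_au, C, H, Wd = 12, 8, 4, 4
local_feats = rng.standard_normal((n_au, C, H, Wd))
keypoint_feat = rng.standard_normal((C, H, Wd))
W = rng.standard_normal((n_au, C))
b = np.zeros(n_au)
probs = au_head(fuse_local_features(local_feats), keypoint_feat, W, b)
print(probs.shape)  # (12,)
```

Each of the 12 outputs is an independent occurrence probability for one motor unit, matching the multi-label binary classification described above.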
It should be understood that the above method is applicable to scenarios in which object detection results are determined using a trained neural network, and is also applicable to the process of training the neural network; the embodiments of the present disclosure do not limit this. In one possible implementation, before the object detection results are determined using the trained neural network, a step of training the neural network according to the image to be processed may be included.
Fig. 7 shows a schematic diagram of an application scenario of the image processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 7, data preprocessing may be performed both before training the neural network according to the image to be processed and before determining object detection results through the trained neural network. The following description takes the data preprocessing performed before training the neural network according to the image to be processed as an example.
For example, a facial action unit database may be established. For example, 20 male and 20 female subjects are recruited, and each subject is induced to spontaneously produce different expressions in 8 different tasks; 2D videos are captured by a camera, and 500 frames are obtained from each video by screening, giving a total of 40 × 8 × 500 = 160000 pictures. Each face picture is annotated with multiple motor units (for example, 12). For example, if a motor unit appears in a face picture, the annotation information of that motor unit is 1; if the motor unit does not appear, its annotation information is 0.
In addition, face detection and key point localization are performed on each face picture, and multiple facial key points (for example, 49) are annotated. For example, the horizontal and vertical coordinates of the multiple facial key points are annotated. The 160000 pictures can then be processed: for example, as described above, the face location is first detected, and 5 facial key points are located (the left pupil, the right pupil, the nose tip, the left mouth corner, and the right mouth corner); a face similarity transformation involving rotation, translation, and uniform scaling is performed to normalize the face without changing the face shape or expression; finally, the face image is randomly flipped horizontally and cropped to L × L to obtain the image to be processed, which is not described in detail here.
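The normalization step above (rotation, translation, and uniform scaling estimated from the 5 located key points) can be illustrated with a least-squares similarity transform. The Umeyama-style estimator below is one standard way to realize such a transform and is an assumption, since the embodiment does not specify the estimation method; the 5 reference positions are invented for the example.

```python
import numpy as np

def similarity_transform(src, dst):
    # Least-squares similarity transform (rotation + uniform scale +
    # translation, no shear) mapping src landmarks onto dst.
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])                      # guard against reflections
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean
    return scale, R, t

# Hypothetical reference positions for the 5 key points (left pupil,
# right pupil, nose tip, left mouth corner, right mouth corner) in an
# L x L crop; the actual template is not given by the embodiment.
L = 176
template = np.array([[60, 70], [116, 70], [88, 100], [66, 130], [110, 130]], float)
detected = template * 0.5 + np.array([40.0, 25.0])   # a smaller, shifted face
s, R, t = similarity_transform(detected, template)
aligned = (s * (R @ detected.T)).T + t
print(np.abs(aligned - template).max() < 1e-6)  # True
```

Because the transform is restricted to rotation, uniform scaling, and translation, the face shape and expression are preserved, as the embodiment requires.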
Fig. 8 shows a flowchart of training the neural network in the image processing method according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 8, the step of training the neural network according to the image to be processed includes:
In step S104, the image to be processed is separately input into a feature extraction network and a key point feature extraction network in the neural network for processing, to obtain the characteristic information of the target object in the image to be processed and the key point feature of the target object;
In step S105, the key point feature is input into a first detection network in the neural network for processing, to determine the positioning result of the key points of the target object;
In step S106, the characteristic information of the target object, the positioning result, and the key point feature are input into a second detection network in the neural network for processing, to determine the object detection results of the motor units of the target object;
In step S107, a model loss of the neural network is determined according to the positioning result, the annotation information of the positioning result, the object detection results, and the annotation information of the object detection results;
In step S108, network parameter values of the neural network are adjusted according to the model loss.
For example, the image to be processed may be separately input into the feature extraction network and the key point feature extraction network in the neural network for processing, to obtain the characteristic information of the target object in the image to be processed and the key point feature of the target object. For example, as shown in Fig. 7, the image to be processed is input into the feature extraction network to learn multi-scale shared features, and the multi-scale shared features are input into the key point feature extraction network to learn the key point feature of the face.
The key point feature may be input into the first detection network in the neural network for processing, to determine the positioning result of the key points of the target object. For example, as shown in Fig. 7, the positioning results of multiple key points of the face are determined.
The characteristic information of the target object, the positioning result, and the key point feature may be input into the second detection network in the neural network for processing, to determine the object detection results of the motor units of the target object.
In one possible implementation, step S106 may include:
inputting the characteristic information of the target object and the positioning result into a local feature extraction network in the neural network for processing, to determine the local feature of the motor unit of the target object; and
inputting the local feature and the key point feature into the second detection network in the neural network for processing, to determine the object detection results of the motor unit.
For example, the characteristic information of the target object and the positioning result may be input into the local feature extraction network in the neural network for processing to determine the local feature of the motor unit of the target object, and the local feature and the key point feature may then be input into the second detection network in the neural network for processing to determine the object detection results of the motor unit. For example, as shown in Fig. 7, the object detection results are determined according to the local features of the motor units and the key point feature. In this way, the local features of the motor units can be obtained accurately, improving the detection accuracy of the object detection results of the motor units.
In one possible implementation, the model loss of the neural network is determined according to the positioning result, the annotation information of the positioning result, the object detection results, and the annotation information of the object detection results, and the network parameter values of the neural network are adjusted according to the model loss.
For example, the model loss of the neural network may be determined according to the positioning result, the annotation information of the positioning result, the object detection results, the annotation information of the object detection results, and a loss function. The present disclosure does not limit the form of the loss function. The network parameter values of the neural network may be adjusted according to the model loss, for example, using back propagation combined with a gradient descent algorithm. It should be understood that any suitable manner of adjusting the network parameter values of the neural network may be used; the present disclosure does not limit this.
After multiple rounds of adjustment, if a preset training condition is satisfied, for example, the number of adjustments reaches a preset training count threshold, or the model loss becomes less than or equal to a preset loss threshold, the current neural network can be taken as the final neural network, thereby completing the training process of the neural network. It should be understood that those skilled in the art can set the training condition and the loss threshold according to actual conditions; the present disclosure does not limit this.
In this way, a neural network can be trained that accurately obtains both the positioning results of the key points and the object detection results of the motor units.
In one possible implementation, the step of inputting the characteristic information of the target object and the positioning result into the local feature extraction network in the neural network for processing to determine the local feature of the motor unit of the target object may include:
inputting the positioning result into an initial attention generation network in the neural network for processing, and determining the initial attention feature map of the motor unit of the target object according to the positioning result and the positional relationship between the center point of the motor unit and the key points of the target object;
inputting the initial attention feature map into a reference feature extraction network in the neural network for convolution processing, to obtain a processed attention feature map; and
inputting the processed attention feature map and the characteristic information of the target object into the local feature extraction network in the neural network for processing, to determine the local feature of the motor unit of the target object.
In one possible implementation, step S107 may include:
determining the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the object detection results, the annotation information of the object detection results, the initial attention weight of each element in the initial attention feature map, and the attention weight of each element in the processed attention feature map.
For example, during training of the neural network, the initial attention feature map of each motor unit may be determined, and optimization processing may be performed on the initial attention feature map of each motor unit to assist in determining the local feature of each motor unit. For example, as shown in Fig. 7, the positioning results of the facial key points are input into the initial attention generation network in the neural network for processing to determine the initial attention feature maps of the motor units of the face. Optimization processing (for example, convolution processing) is performed on the initial attention feature maps to obtain processed attention feature maps. The local features of the facial motor units are learned according to the processed attention feature maps and the characteristic information of the target object. Through the adaptive attention learning of the neural network, the local feature obtained for each motor unit is more accurate, thereby improving the robustness of the image processing method and the accuracy of the object detection results of the motor units.
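A possible sketch of generating an initial attention feature map from the positioning result: the motor unit's centre is derived from key point positions, and the attention weight decays with distance from that centre. The linear decay law and the radius parameter are assumptions; the embodiment only states that the map follows the positional relationship between the motor unit's centre point and the key points.

```python
import numpy as np

def initial_attention_map(center, size, radius):
    # Weight 1 at the motor unit's centre, decaying linearly with
    # Euclidean distance and clipped to [0, 1]; the decay law is an
    # illustrative assumption.
    ys, xs = np.mgrid[0:size, 0:size]
    d = np.hypot(ys - center[1], xs - center[0])
    return np.clip(1.0 - d / radius, 0.0, 1.0)

# e.g. a motor unit centred midway between two (hypothetical) key points
kp_a, kp_b = np.array([10.0, 8.0]), np.array([14.0, 8.0])
center = (kp_a + kp_b) / 2.0
att = initial_attention_map(center, size=24, radius=6.0)
print(att.max(), att[int(center[1]), int(center[0])])  # 1.0 1.0
```

Such a map would then be refined by the reference feature extraction network (convolution processing) before weighting the shared features.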
In one possible implementation, the model loss of the neural network may be determined according to the positioning result, the annotation information of the positioning result, the object detection results, the annotation information of the object detection results, the initial attention weight of each element in the initial attention feature map, and the attention weight of each element in the processed attention feature map.
It should be understood that the model loss of the neural network can be determined from the annotation information and the detection results. In determining the model loss of the neural network, the loss function may take various forms; the present disclosure does not limit this.
The embodiments of the present disclosure illustratively give the following formula (2) for determining the loss according to the positioning result and the annotation information of the positioning result:

E_align = (1 / (2 d_o)) Σ_{j=1}^{n_align} [(y_{2j-1} − ŷ_{2j-1})² + (y_{2j} − ŷ_{2j})²]   (2)

In formula (2), E_align denotes the loss determined according to the key point positioning result and the annotation information of the positioning result. y_{2j-1} and y_{2j} respectively denote the annotation information of the j-th point (the ground-truth x and y coordinates), where j ranges from 1 to n_align and n_align denotes the number of key points. d_o is the ground-truth interocular (pupil-to-pupil) distance. ŷ_{2j-1} and ŷ_{2j} respectively denote the positioning result of the j-th point (the predicted x and y coordinates of the point).
The embodiments of the present disclosure illustratively give the following formula (3) for determining the loss according to the object detection results and the annotation information of the object detection results:

E_au = −(1 / n_au) Σ_{i=1}^{n_au} w_i [p_i log p̂_i + (1 − p_i) log(1 − p̂_i)]   (3)

In formula (3), E_au denotes the loss determined according to the object detection results and the annotation information of the object detection results. n_au denotes the dimension of the fully connected layer, i.e. the number of motor units. p_i denotes the ground-truth probability that the i-th motor unit occurs (the annotation information of the object detection results), being 1 if it occurs and 0 if it is absent. p̂_i denotes the predicted probability (the predicted object detection result of the motor unit). The weight w_i is used to overcome the problem of data imbalance. In most motor unit detection databases, different motor units occur with unbalanced frequencies, and the motor units are not mutually independent. By weighting, the contribution of each motor unit to the loss can be made roughly the same, reducing the data imbalance problem and ensuring prediction accuracy. Statistical processing may be performed on the training set, and the weight may be determined, for example, by an inverse-frequency formula such as w_i = (1 / r_i) / Σ_{j=1}^{n_au} (1 / r_j), where r_i denotes the frequency with which the i-th motor unit occurs in the training set.
It should be understood that formula (3) is a weighted multi-label sigmoid cross-entropy loss function; in this way the dimension of the fully connected layer can be kept small, simplifying the structure while ensuring the prediction effect. In addition, the loss may also be determined according to the object detection results and the annotation information of the object detection results through other types of loss functions, for example, a weighted multi-label softmax loss; the present disclosure does not limit this.
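Formula (3) and the inverse-frequency weighting can be sketched numerically as follows. The exact normalization of w_i is an assumption: here the weights are normalized to sum to 1, which absorbs the 1/n_au averaging of formula (3).

```python
import numpy as np

def au_weights(freqs):
    # Inverse-frequency weights normalized to sum to 1: rare motor
    # units receive larger weights, counteracting data imbalance.
    inv = 1.0 / np.asarray(freqs, float)
    return inv / inv.sum()

def weighted_multilabel_sigmoid_ce(p_true, p_pred, w, eps=1e-12):
    # Weighted sum over motor units of the binary cross entropy between
    # ground-truth occurrence p_i and predicted probability p̂_i.
    p_pred = np.clip(p_pred, eps, 1 - eps)
    per_au = -(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred))
    return float(np.sum(w * per_au))

freqs = [0.4, 0.1, 0.5]              # r_i: occurrence frequency of each unit
w = au_weights(freqs)                # the rarest unit gets the largest weight
loss = weighted_multilabel_sigmoid_ce(np.array([1, 0, 1]),
                                      np.array([0.9, 0.2, 0.8]), w)
print(np.round(w, 3), round(loss, 4))
```

With confident, mostly correct predictions the loss is small; the weight vector shows the second (rarest) unit dominating the loss.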
The embodiments of the present disclosure illustratively give the following formula (4) for determining the loss according to the initial attention weight of each element in the initial attention feature map and the attention weight of each element in the processed attention feature map:

E_r = −(1 / n_element) Σ_{i=1}^{n_au} Σ_{k=1}^{n_element} [v_ik log v̂_ik + (1 − v_ik) log(1 − v̂_ik)]   (4)

In formula (4), E_r denotes the loss determined according to the initial attention weight of each element in the initial attention feature map and the attention weight of each element in the processed attention feature map, and measures the sigmoid cross entropy between the initial values of the attention feature map and the values after optimization. v_ik denotes the initial attention weight of the k-th element of the i-th attention feature map, v̂_ik denotes the processed attention weight of the k-th element of the i-th attention feature map, and n_element is the number of elements in each attention feature map.
In this way, the probability that the processed attention feature map differs greatly from the initial attention feature map can be reduced.
The embodiments of the present disclosure illustratively give the following formula (5) for determining the overall model loss:
E = E_au + λ1 E_align + λ2 E_r   (5)
where E denotes the overall model loss, and λ1 and λ2 are coefficients balancing the importance of the terms, set in advance as hyperparameters.
In this way, during training of the neural network, the overall model loss can be calculated according to formulas (2), (3), (4), and (5), and the network parameter values of the neural network can be adjusted according to the model loss, as described above, which is not repeated here.
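The component losses and their combination in formula (5) can be sketched as follows. E_align and E_r are implemented directly from their definitions above, E_au is passed in as a precomputed value, and all numbers are illustrative.

```python
import numpy as np

def e_align(y_true, y_pred, d_o):
    # Formula (2): squared landmark error normalized by twice the
    # interocular distance d_o; the arrays hold (x, y) per key point.
    return float(np.sum((y_true - y_pred) ** 2) / (2.0 * d_o))

def e_r(v_init, v_proc, eps=1e-12):
    # Formula (4): sigmoid cross entropy between initial and processed
    # attention weights, averaged over elements.
    v_proc = np.clip(v_proc, eps, 1 - eps)
    ce = -(v_init * np.log(v_proc) + (1 - v_init) * np.log(1 - v_proc))
    return float(ce.mean())

def total_loss(e_au_val, e_align_val, e_r_val, lam1, lam2):
    # Formula (5): E = E_au + λ1·E_align + λ2·E_r.
    return e_au_val + lam1 * e_align_val + lam2 * e_r_val

y_t = np.array([[10.0, 20.0], [30.0, 22.0]])   # two ground-truth key points
y_p = np.array([[11.0, 20.0], [30.0, 24.0]])   # their predicted positions
E_align = e_align(y_t, y_p, d_o=40.0)
E_r = e_r(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
E = total_loss(0.2, E_align, E_r, lam1=0.5, lam2=0.1)
print(round(E_align, 4), round(E, 4))  # 0.0625 0.2418
```

The hyperparameters lam1 and lam2 trade off the alignment and attention terms against the motor unit detection loss, exactly as λ1 and λ2 do in formula (5).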
In this way, a neural network can be trained that accurately obtains both the positioning results of the key points and the object detection results of the motor units. By jointly training key point localization and motor unit detection, the correlation between the two tasks can be exploited to improve the accuracy of the detection results of both tasks. Through adaptive attention learning, the method can adapt to the detection of diverse, non-rigid motor units of various classes, improving the accuracy of the object detection results of the motor units.
It should be understood that the positioning results of the key points of the target object and the object detection results of the motor units of the target object obtained according to the embodiments of the present disclosure can be applied in various object analysis tasks. For example, the determined positioning results of the facial key points and the object detection results of the facial action units can be used for face analysis of a person, and can be applied in fields such as facial expression recognition, face verification, and security. The present disclosure does not limit the applicable scenarios of the object detection results and the positioning results.
Those skilled in the art can understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Fig. 9 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 9, the apparatus includes:
an obtaining module 201, configured to obtain the characteristic information of the target object in the image to be processed and the key point feature of the target object;
a positioning result determining module 202, configured to determine the positioning result of the key points of the target object in the image to be processed according to the key point feature; and
an object detection result determining module 203, configured to determine the object detection results of the motor units of the target object according to the characteristic information of the target object, the positioning result, and the key point feature.
In one possible implementation, the object detection result determining module 203 includes:
a first determining submodule, configured to determine the local feature of the motor unit of the target object according to the characteristic information of the target object and the positioning result; and
a second determining submodule, configured to determine the object detection results of the motor unit according to the local feature and the key point feature.
In one possible implementation, the obtaining module 201 includes:
a first obtaining submodule, configured to perform feature extraction on the image to be processed to obtain the characteristic information of the target object in the image to be processed; and
a second obtaining submodule, configured to perform key point feature extraction on the characteristic information of the target object to obtain the key point feature of the target object.
In one possible implementation, the first determining submodule includes:
a third determining submodule, configured to determine the initial attention feature map of the motor unit of the target object according to the positioning result and the positional relationship between the center point of the motor unit and the key points of the target object;
a third obtaining submodule, configured to perform convolution processing on the initial attention feature map to obtain the processed attention feature map; and
a fourth determining submodule, configured to determine the local feature of the motor unit according to the processed attention feature map and the characteristic information of the target object.
In one possible implementation, the fourth determining submodule includes:
a fourth obtaining submodule, configured to obtain the attention content according to the processed attention feature map and the characteristic information of the target object; and
a fifth obtaining submodule, configured to perform feature extraction processing on the attention content to obtain the local feature.
In one possible implementation, the target object includes multiple motor units, and the second determining submodule includes:
a sixth obtaining submodule, configured to perform fusion processing on the local features of the multiple motor units to obtain the fused local feature; and
a fifth determining submodule, configured to determine the object detection results of the multiple motor units according to the fused local feature and the key point feature of the target object.
In one possible implementation, the apparatus is implemented using a neural network, and the first obtaining submodule includes:
a seventh obtaining submodule, configured to input the image to be processed into the feature extraction network of the neural network for feature extraction to obtain the characteristic information of the target object in the image to be processed.
In one possible implementation, the apparatus is implemented using a neural network, and the neural network is obtained by training according to the image to be processed.
Fig. 10 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 10, in one possible implementation, the apparatus includes:
a feature obtaining module 204, configured to separately input the image to be processed into the feature extraction network and the key point feature extraction network in the neural network for processing, to obtain the characteristic information of the target object in the image to be processed and the key point feature of the target object;
a first determining module 205, configured to input the key point feature into the first detection network in the neural network for processing, to determine the positioning result of the key points of the target object;
a second determining module 206, configured to input the characteristic information of the target object, the positioning result, and the key point feature into the second detection network in the neural network for processing, to determine the object detection results of the motor units of the target object;
a third determining module 207, configured to determine the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the object detection results, and the annotation information of the object detection results; and
a parameter adjusting module 208, configured to adjust the network parameter values of the neural network according to the model loss.
In one possible implementation, the second determining module 206 includes:
a sixth determining submodule, configured to input the characteristic information of the target object and the positioning result into the local feature extraction network in the neural network for processing, to determine the local feature of the motor unit of the target object; and
a seventh determining submodule, configured to input the local feature and the key point feature into the second detection network in the neural network for processing, to determine the object detection results of the motor unit.
In one possible implementation, the sixth determining submodule includes:
an eighth determining submodule, configured to input the positioning result into the initial attention generation network in the neural network for processing, and determine the initial attention feature map of the motor unit of the target object according to the positioning result and the positional relationship between the center point of the motor unit and the key points of the target object;
an eighth obtaining submodule, configured to input the initial attention feature map into the reference feature extraction network in the neural network for convolution processing, to obtain the processed attention feature map; and
a ninth determining submodule, configured to input the processed attention feature map and the characteristic information of the target object into the local feature extraction network in the neural network for processing, to determine the local feature of the motor unit of the target object.
In one possible implementation, the third determining module 207 includes:
a tenth determining submodule, configured to determine the model loss of the neural network according to the positioning result, the annotation information of the positioning result, the object detection results, the annotation information of the object detection results, the initial attention weight of each element in the initial attention feature map, and the attention weight of each element in the processed attention feature map.
In one possible implementation, the feature extraction network includes at least one convolution group; each convolution group includes at least one convolutional layer and at least one convolution subgroup; each convolution subgroup includes multiple convolution sublayers; each convolution sublayer includes a different number of subregions, and the convolution kernel parameters of the different subregions of each convolution sublayer are different.
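A toy illustration of a convolution sublayer whose subregions have different kernel parameters: the input map is split into subregions and each is convolved with its own kernel. The 2×2 partition, single channel, and scaled-identity kernels are assumptions chosen so the effect is easy to verify; the embodiment's actual partitioning and kernel shapes are not specified here.

```python
import numpy as np

def conv2d_same(x, k):
    # Naive zero-padded "same" 2-D convolution, single channel, 3x3 kernel.
    H, W = x.shape
    pad = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out

def partitioned_conv(x, kernels):
    # Split the map into 2x2 subregions, convolve each with its own
    # kernel parameters, and stitch the results back together.
    H, W = x.shape
    h, w = H // 2, W // 2
    out = np.empty_like(x)
    for bi in range(2):
        for bj in range(2):
            sub = x[bi * h:(bi + 1) * h, bj * w:(bj + 1) * w]
            out[bi * h:(bi + 1) * h, bj * w:(bj + 1) * w] = \
                conv2d_same(sub, kernels[bi][bj])
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
kernels = [[identity, 2 * identity], [3 * identity, 4 * identity]]
y = partitioned_conv(x, kernels)
print(np.allclose(y[:4, :4], x[:4, :4]))  # True: top-left uses the identity kernel
```

Because each subregion has its own kernel parameters, the layer can specialize to different facial regions, which is the motivation for region-partitioned convolution.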
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for their specific implementation, reference may be made to the description of the method embodiments above, which, for brevity, is not repeated here.
The embodiments of the present disclosure also propose a computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
The embodiments of the present disclosure also propose an electronic device, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to perform the above method.
Fig. 11 shows a block diagram of an electronic device according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, or a personal digital assistant.
Referring to Fig. 11, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor component 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, there is also provided a non-volatile computer-readable storage medium, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to perform the above method.
Figure 12 shows a block diagram of an electronic device according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Figure 12, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 1922 is configured to execute the instructions to perform the above method.
The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, there is also provided a non-volatile computer-readable storage medium, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to perform the above method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions; the electronic circuit may execute the computer-readable program instructions in order to implement aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device, so as to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technological improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. An image processing method, characterized in that the method comprises:
obtaining characteristic information of a target object in an image to be processed and a key point feature of the target object;
determining, according to the key point feature, a localization result of key points of the target object in the image to be processed; and
determining, according to the characteristic information of the target object, the localization result, and the key point feature, a target detection result of an action unit of the target object.
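Claim 1 describes a three-step flow: extract features from the image, locate key points from those features, then detect action units from the features together with the localization result. The sketch below is a minimal, hypothetical NumPy rendering of that flow; every name and every stand-in operation (average pooling as "feature extraction", strongest-response selection as "key point localization", a thresholded score as "action unit detection") is an illustrative assumption, not the patent's actual model.

```python
import numpy as np

def extract_features(image):
    """Stand-in feature extractor: 2x2 average pooling of a grayscale image."""
    h, w = image.shape
    return image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def locate_keypoints(features, num_keypoints=2):
    """Stand-in key-point locator: take the num_keypoints strongest responses."""
    flat = np.argsort(features.ravel())[::-1][:num_keypoints]
    return np.stack(np.unravel_index(flat, features.shape), axis=1)  # rows of (y, x)

def detect_action_units(features, keypoints, threshold=0.5):
    """Stand-in detector: an action unit counts as 'active' when the mean
    feature value at its key points exceeds a threshold."""
    scores = features[keypoints[:, 0], keypoints[:, 1]]
    return scores.mean() > threshold

image = np.zeros((8, 8))
image[2, 2] = image[5, 6] = 1.0              # two bright "key point" regions
feats = extract_features(image)              # step 1: characteristic information
kps = locate_keypoints(feats)                # step 2: key point localization
print(detect_action_units(feats, kps, threshold=0.2))  # step 3: prints True
```

In a real system, each stand-in would be a learned network stage, but the data flow between the three steps is the structure the claim recites.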
2. The method according to claim 1, characterized in that determining the target detection result of the action unit of the target object according to the characteristic information of the target object, the localization result, and the key point feature comprises:
determining a local feature of the action unit of the target object according to the characteristic information of the target object and the localization result; and
determining the target detection result of the action unit according to the local feature and the key point feature.
3. The method according to claim 1 or 2, characterized in that obtaining the characteristic information of the target object in the image to be processed and the key point feature of the target object comprises:
performing feature extraction on the image to be processed to obtain the characteristic information of the target object in the image to be processed; and
performing key point feature extraction on the characteristic information of the target object to obtain the key point feature of the target object.
4. The method according to claim 2, characterized in that determining the local feature of the action unit of the target object according to the characteristic information of the target object and the localization result comprises:
determining an initial attention feature map of the action unit of the target object according to the localization result and a positional relationship between a center point of the action unit and the key points of the target object;
performing convolution processing on the initial attention feature map to obtain a processed attention feature map; and
determining the local feature of the action unit according to the processed attention feature map and the characteristic information of the target object.
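As one purely illustrative reading of the steps in claim 4, the sketch below builds an initial attention map as a Gaussian centered on an action-unit center derived from the key-point positions, applies a small box-filter convolution as the "convolution processing", and weights the characteristic information with the processed map to obtain a local feature. The Gaussian and box-filter choices, and all names, are assumptions made for demonstration, not the claimed implementation.

```python
import numpy as np

def initial_attention_map(shape, keypoints, sigma=1.5):
    """Gaussian attention centered on the midpoint of the key points
    (a simple stand-in for the claimed center/key-point positional relation)."""
    cy, cx = np.mean(keypoints, axis=0)
    ys, xs = np.mgrid[:shape[0], :shape[1]]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def box_convolve(att, k=3):
    """k x k box-filter 'convolution processing' with zero padding."""
    p = k // 2
    padded = np.pad(att, p)
    out = np.zeros_like(att)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + att.shape[0], dx:dx + att.shape[1]]
    return out / (k * k)

features = np.ones((8, 8))                                   # characteristic information
att = initial_attention_map(features.shape, np.array([[2, 2], [4, 6]]))
local_feature = features * box_convolve(att)                 # attention-weighted local feature
peak = np.unravel_index(local_feature.argmax(), local_feature.shape)
print(tuple(int(v) for v in peak))                           # prints (3, 4), the key-point midpoint
```

Weighting the full feature map by the smoothed attention map keeps responses near the action-unit center and suppresses the rest, which is the intuitive role of the "local feature" in the claim.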
5. An image processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain characteristic information of a target object in an image to be processed and a key point feature of the target object;
a localization result determining module, configured to determine, according to the key point feature, a localization result of key points of the target object in the image to be processed; and
a target detection result determining module, configured to determine, according to the characteristic information of the target object, the localization result, and the key point feature, a target detection result of an action unit of the target object.
6. The apparatus according to claim 5, characterized in that the target detection result determining module comprises:
a first determining submodule, configured to determine a local feature of the action unit of the target object according to the characteristic information of the target object and the localization result; and
a second determining submodule, configured to determine the target detection result of the action unit according to the local feature and the key point feature.
7. The apparatus according to claim 5 or 6, characterized in that the obtaining module comprises:
a first obtaining submodule, configured to perform feature extraction on the image to be processed to obtain the characteristic information of the target object in the image to be processed; and
a second obtaining submodule, configured to perform key point feature extraction on the characteristic information of the target object to obtain the key point feature of the target object.
8. The apparatus according to claim 6, characterized in that the first determining submodule comprises:
a third determining submodule, configured to determine an initial attention feature map of the action unit of the target object according to the localization result and a positional relationship between a center point of the action unit and the key points of the target object;
a third obtaining submodule, configured to perform convolution processing on the initial attention feature map to obtain a processed attention feature map; and
a fourth determining submodule, configured to determine the local feature of the action unit according to the processed attention feature map and the characteristic information of the target object.
9. An electronic device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method according to any one of claims 1 to 4.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810842970.4A CN109241835A (en) | 2018-07-27 | 2018-07-27 | Image processing method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109241835A true CN109241835A (en) | 2019-01-18 |
Family
ID=65073111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810842970.4A Pending CN109241835A (en) | 2018-07-27 | 2018-07-27 | Image processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109241835A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685041A (en) * | 2019-01-23 | 2019-04-26 | 北京市商汤科技开发有限公司 | Image analysis method and device, electronic equipment and storage medium |
CN109815924A (en) * | 2019-01-29 | 2019-05-28 | 成都旷视金智科技有限公司 | Expression recognition method, apparatus and system |
CN109886335A (en) * | 2019-02-21 | 2019-06-14 | 厦门美图之家科技有限公司 | Disaggregated model training method and device |
CN109902631A (en) * | 2019-03-01 | 2019-06-18 | 北京视甄智能科技有限公司 | A kind of fast face detecting method based on image pyramid |
CN110147717A (en) * | 2019-04-03 | 2019-08-20 | 平安科技(深圳)有限公司 | A kind of recognition methods and equipment of human action |
CN110321849A (en) * | 2019-07-05 | 2019-10-11 | 腾讯科技(深圳)有限公司 | Image processing method, device and computer readable storage medium |
CN110530372A (en) * | 2019-09-26 | 2019-12-03 | 上海商汤智能科技有限公司 | Localization method, determining method of path, device, robot and storage medium |
CN110992406A (en) * | 2019-12-10 | 2020-04-10 | 张家港赛提菲克医疗器械有限公司 | Radiotherapy patient positioning rigid body registration algorithm based on region of interest |
CN111104925A (en) * | 2019-12-30 | 2020-05-05 | 上海商汤临港智能科技有限公司 | Image processing method, image processing apparatus, storage medium, and electronic device |
CN111144313A (en) * | 2019-12-27 | 2020-05-12 | 创新奇智(青岛)科技有限公司 | Face detection method and system based on multi-receptive-field dynamic combination |
CN111291804A (en) * | 2020-01-22 | 2020-06-16 | 杭州电子科技大学 | Multi-sensor time series analysis model based on attention mechanism |
CN111680646A (en) * | 2020-06-11 | 2020-09-18 | 北京市商汤科技开发有限公司 | Motion detection method and device, electronic device and storage medium |
CN111739097A (en) * | 2020-06-30 | 2020-10-02 | 上海商汤智能科技有限公司 | Distance measuring method and device, electronic equipment and storage medium |
CN111783724A (en) * | 2020-07-14 | 2020-10-16 | 上海依图网络科技有限公司 | Target object identification method and device |
CN111832338A (en) * | 2019-04-16 | 2020-10-27 | 北京市商汤科技开发有限公司 | Object detection method and device, electronic equipment and storage medium |
CN112036487A (en) * | 2020-08-31 | 2020-12-04 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112364773A (en) * | 2020-11-12 | 2021-02-12 | 西安电子科技大学 | Hyperspectral target detection method based on L1 regular constraint depth multi-instance learning |
CN113261011A (en) * | 2019-12-30 | 2021-08-13 | 商汤国际私人有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113692563A (en) * | 2019-06-27 | 2021-11-23 | 苹果公司 | Modifying existing content based on target audience |
WO2022179412A1 (en) * | 2021-02-26 | 2022-09-01 | 华为技术有限公司 | Recognition method and electronic device |
WO2022213761A1 (en) * | 2021-04-08 | 2022-10-13 | 腾讯科技(深圳)有限公司 | Image processing method and apparatus, electronic device, and storage medium |
CN115311542A (en) * | 2022-08-25 | 2022-11-08 | 杭州恒胜电子科技有限公司 | Target detection method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106295566A (en) * | 2016-08-10 | 2017-01-04 | 北京小米移动软件有限公司 | Facial expression recognizing method and device |
CN107729835A (en) * | 2017-10-10 | 2018-02-23 | 浙江大学 | A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features |
CN108268885A (en) * | 2017-01-03 | 2018-07-10 | 京东方科技集团股份有限公司 | Feature point detecting method, equipment and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
ZHIWEN SHAO et al.: "Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment", arXiv *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109241835A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109829501A (en) | Image processing method and device, electronic equipment and storage medium | |
CN106339680B (en) | Face key independent positioning method and device | |
CN110084775A (en) | Image processing method and device, electronic equipment and storage medium | |
CN105631403B (en) | Face identification method and device | |
CN110210535A (en) | Neural network training method and device and image processing method and device | |
CN109871883A (en) | Neural network training method and device, electronic equipment and storage medium | |
CN109522910A (en) | Critical point detection method and device, electronic equipment and storage medium | |
CN110348537A (en) | Image processing method and device, electronic equipment and storage medium | |
CN106548468B (en) | The method of discrimination and device of image definition | |
CN105608425B (en) | The method and device of classification storage is carried out to photo | |
CN109166107A (en) | A kind of medical image cutting method and device, electronic equipment and storage medium | |
CN106469302A (en) | A kind of face skin quality detection method based on artificial neural network | |
CN109784255A (en) | Neural network training method and device and recognition methods and device | |
CN109614613A (en) | The descriptive statement localization method and device of image, electronic equipment and storage medium | |
CN106295515B (en) | Determine the method and device of the human face region in image | |
CN109816764A (en) | Image generating method and device, electronic equipment and storage medium | |
CN110503023A (en) | Biopsy method and device, electronic equipment and storage medium | |
CN110443280A (en) | Training method, device and the storage medium of image detection model | |
CN109544560A (en) | Image processing method and device, electronic equipment and storage medium | |
CN109919300A (en) | Neural network training method and device and image processing method and device | |
CN108921117A (en) | Image processing method and device, electronic equipment and storage medium | |
CN110458218A (en) | Image classification method and device, sorter network training method and device | |
CN108010060A (en) | Object detection method and device | |
CN110532956A (en) | Image processing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190118 |