CN106845432A - Method and apparatus for joint detection of face and human body - Google Patents
Method and apparatus for joint detection of face and human body
- Publication number
- CN106845432A CN106845432A CN201710067572.5A CN201710067572A CN106845432A CN 106845432 A CN106845432 A CN 106845432A CN 201710067572 A CN201710067572 A CN 201710067572A CN 106845432 A CN106845432 A CN 106845432A
- Authority
- CN
- China
- Prior art keywords
- face
- frame
- information
- loss function
- pedestrian
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Software Systems (AREA)
- Traffic Control Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and apparatus for joint detection of face and human body. The method includes: obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian; correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN; and performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian. This achieves efficient simultaneous recognition, saves resources, improves practicality, and preserves the correspondence between face and body.
Description
Technical field
The present invention relates to the field of recognition, and more particularly to a method and apparatus for joint detection of face and human body.
Background technology
In the prior art, face recognition and human-body recognition are performed separately, and each detector can only handle a single target type; detecting targets of multiple types requires multiple independent models. For example, to detect both faces and pedestrians, a face detector and a human-body detector are both needed, and running two independent detectors consumes more resources.
In addition, the detection of targets and the correction of the detection results are carried out separately, so candidate boxes must be repeatedly filtered by category and their positions and sizes repeatedly corrected; the large amount of repeated computation reduces the practicality of the system.
In the subsequent real-time structuring of information, independently detected faces and bodies lack a correspondence relation, yet in actual demand both the face and the body must be attributed to a specific individual target; based on existing face-detection and body-detection results, extra computation must be spent on matching them.
Content of the invention
To address the defects of the prior art, the present invention proposes a method and apparatus for joint detection of face and human body.
The present invention proposes the following embodiments in detail.
An embodiment of the present invention proposes a method for joint detection of face and human body, including:
obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian;
correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN;
performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian.
In a specific embodiment, obtaining the standard data includes:
obtaining surveillance videos under different scenes;
identifying each pedestrian in the images of the surveillance videos;
labeling, for each identified pedestrian, the face position and the body position as distinct position boxes, to generate the position-box information;
integrating the recognized images and the position-box information therein to generate the standard data.
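The acquisition steps above can be sketched as a minimal data structure. The record layout, the field names, and the (x, y, w, h) box format are illustrative assumptions; the patent does not fix a concrete format.

```python
# Hypothetical sketch of one "standard data" record: an image reference plus
# the labeled position boxes (face and body) for each pedestrian in it.
# The (x, y, w, h) box format and all field names are assumptions.

def make_record(image_path, pedestrians):
    """pedestrians: list of dicts, each with 'face_box' and 'body_box'."""
    return {
        "image": image_path,
        "annotations": [
            {"pedestrian_id": i,
             "face_box": p["face_box"],
             "body_box": p["body_box"]}
            for i, p in enumerate(pedestrians)
        ],
    }

record = make_record(
    "scene1/frame_000.jpg",
    [{"face_box": (120, 40, 32, 40), "body_box": (100, 30, 80, 220)}],
)
print(len(record["annotations"]))  # 1
```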
In a specific embodiment, correcting the joint recognition model with the position-box information in the standard data includes:
step A, randomly initializing the parameters of the joint recognition model, where the parameters include the parameters of the corresponding loss functions, and the loss functions are used to correct the joint recognition model;
step B, passing the position-box information in the standard data sequentially through the initialized joint recognition model for computation, to obtain the loss of each corresponding loss function;
step C, differentiating each loss function to obtain gradients, and back-propagating them through the chain rule, to obtain the updated parameters;
step D, repeating step B and step C until the loss no longer declines, to obtain the final parameters;
step E, completing the correction of the joint recognition model with the final parameters.
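Steps A through E describe a standard gradient-descent training loop. A toy sketch on a one-parameter stand-in model (the quadratic loss, learning rate, and stopping tolerance are illustrative assumptions, not the model's actual loss functions):

```python
import random

# Toy sketch of steps A-E: random initialization, compute the loss per
# sample, differentiate to get the gradient, update, and stop once the
# loss no longer declines. The quadratic loss is a stand-in for the
# joint model's real loss functions.

def train(samples, lr=0.1, tol=1e-6, max_steps=1000):
    w = random.uniform(-1.0, 1.0)       # step A: random initialization
    prev_loss = float("inf")
    for _ in range(max_steps):
        loss = sum((w - s) ** 2 for s in samples) / len(samples)  # step B
        grad = sum(2 * (w - s) for s in samples) / len(samples)   # step C
        w -= lr * grad                                            # update
        if prev_loss - loss < tol:      # step D: loss no longer declines
            break
        prev_loss = loss
    return w                            # step E: final parameter

w = train([1.0, 3.0])
print(round(w, 2))  # 2.0, the minimizer of the mean squared loss
```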
In a specific embodiment, the loss functions include a classification loss function, a position regression loss function, and a relative position constraint loss function;
where the classification loss function classifies each candidate box in the image as face, pedestrian, or background, so as to obtain face boxes and pedestrian boxes and discard the background candidate boxes;
the position regression loss function corrects the positions and sizes of the face boxes and pedestrian boxes, improving localization accuracy;
the relative position constraint loss function keeps the relative position of a face box and its pedestrian box normal.
In a specific embodiment, performing joint detection of face and human body on images in the video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, includes:
decoding the video to be recognized, to obtain every frame of the video;
if the processing capability of the joint recognition model exceeds a preset value, performing joint detection of face and human body on every frame, to generate in real time structured messages that simultaneously contain the face information, body information, and auxiliary information of each pedestrian;
where the auxiliary information includes the point position of the camera corresponding to the video to be recognized and the timestamp of the video to be recognized.
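One real-time structured message might take the following shape; every field name, including the auxiliary camera-point and timestamp fields, is an illustrative assumption rather than a format from the patent.

```python
import json

# Hypothetical shape of one real-time structured message: face and body
# information per pedestrian plus the auxiliary information named above
# (camera point position and video timestamp).

def build_message(camera_point, frame_ts, detections):
    return {
        "camera_point": camera_point,
        "timestamp": frame_ts,
        "pedestrians": [
            {"face_box": d["face_box"], "body_box": d["body_box"]}
            for d in detections
        ],
    }

msg = build_message("gate-03", 1700000000.0,
                    [{"face_box": (10, 5, 8, 10), "body_box": (6, 4, 20, 60)}])
wire = json.dumps(msg)            # serialized for real-time output
print(len(msg["pedestrians"]))    # 1
```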
An embodiment of the present invention also proposes an apparatus for joint detection of face and human body, including:
an acquisition module, for obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian;
a correction module, for correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN;
a detection module, for performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian.
In a specific embodiment, the acquisition module is used to:
obtain surveillance videos under different scenes;
identify each pedestrian in the images of the surveillance videos;
label, for each identified pedestrian, the face position and the body position as distinct position boxes, to generate the position-box information;
integrate the recognized images and the position-box information therein to generate the standard data.
In a specific embodiment, the correction module is used to perform the following operations:
step A, randomly initializing the parameters of the joint recognition model, where the parameters include the parameters of the corresponding loss functions, and the loss functions are used to correct the joint recognition model;
step B, passing the position-box information in the standard data sequentially through the initialized joint recognition model for computation, to obtain the loss of each corresponding loss function;
step C, differentiating each loss function to obtain gradients, and back-propagating them through the chain rule, to obtain the updated parameters;
step D, repeating step B and step C until the loss no longer declines, to obtain the final parameters;
step E, completing the correction of the joint recognition model with the final parameters.
In a specific embodiment, the loss functions include a classification loss function, a position regression loss function, and a relative position constraint loss function;
where the classification loss function classifies each candidate box in the image as face, pedestrian, or background, so as to obtain face boxes and pedestrian boxes and discard the background candidate boxes;
the position regression loss function corrects the positions and sizes of the face boxes and pedestrian boxes, improving localization accuracy;
the relative position constraint loss function keeps the relative position of a face box and its pedestrian box normal.
In a specific embodiment, the detection module is used to:
decode the video to be recognized, to obtain every frame of the video;
if the processing capability of the joint recognition model exceeds a preset value, perform joint detection of face and human body on every frame, to generate in real time structured messages that simultaneously contain the face information, body information, and auxiliary information of each pedestrian;
where the auxiliary information includes the point position of the camera corresponding to the video to be recognized and the timestamp of the video to be recognized.
Thus, the invention discloses a method and apparatus for joint detection of face and human body, where the method includes: obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian; correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN; and performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian. This achieves efficient simultaneous recognition, saves resources, improves practicality, and preserves the correspondence between face and body.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly described below. It should be appreciated that the following drawings illustrate only certain embodiments of the present invention and are therefore not to be construed as limiting its scope; for those of ordinary skill in the art, other related drawings can be derived from these drawings without creative work.
Fig. 1 is a flow diagram of the method for joint detection of face and human body proposed by an embodiment of the present invention;
Fig. 2 is a diagram of calibrated position boxes in the method for joint detection of face and human body proposed by an embodiment of the present invention;
Fig. 3 is a diagram of a structured message in the method for joint detection of face and human body proposed by an embodiment of the present invention;
Fig. 4 is a structural diagram of the apparatus for joint detection of face and human body proposed by an embodiment of the present invention.
Specific embodiment
Hereinafter, various embodiments of the disclosure are described more fully. The disclosure can have various embodiments, and adjustments and changes can be made therein. It should be understood, however, that there is no intent to limit the various embodiments of the disclosure to the particular embodiments disclosed herein; rather, the disclosure should be interpreted to cover all adjustments, equivalents, and/or alternatives falling within the spirit and scope of its various embodiments.
Hereinafter, the terms "include" or "may include" used in the various embodiments of the disclosure indicate the presence of disclosed functions, operations, or elements, and do not limit the addition of one or more functions, operations, or elements. Further, as used in the various embodiments of the disclosure, the terms "include", "have", and their cognates are intended only to denote a particular feature, number, step, operation, element, component, or combination of the foregoing, and are not to be understood as excluding the presence, or the possible addition, of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
In the various embodiments of the disclosure, the expression "or" or "at least one of A or/and B" includes any and all combinations of the words listed with it. For example, the expression "A or B" or "at least one of A or/and B" may include A, may include B, or may include both A and B.
Expressions such as "first" and "second" used in the various embodiments of the disclosure may modify various elements in the various embodiments, but do not limit the corresponding elements; they are not intended to limit the order and/or importance of the elements and are used only to distinguish one element from another. For example, a first user device and a second user device indicate different user devices, although both are user devices. Without departing from the scope of the various embodiments of the disclosure, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element.
It should be noted that if one element is described as being "connected" to another element, the first element may be directly connected to the second element, or a third element may be "connected" between the first element and the second element. Conversely, when one element is "directly connected" to another element, it is to be understood that no third element exists between the first element and the second element.
The term "user" used in the various embodiments of the disclosure may indicate a person who uses an electronic device or a device that uses an electronic device (for example, an artificial-intelligence electronic device).
The terms used in the various embodiments of the disclosure are only for the purpose of describing particular embodiments and are not intended to limit the various embodiments of the disclosure. As used herein, singular forms are intended to include plural forms as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art of the various embodiments of the disclosure. Such terms (such as those defined in commonly used dictionaries) are to be interpreted as having the same meaning as their contextual meaning in the relevant technical field, and are not to be interpreted as having an idealized or overly formal meaning unless clearly defined in the various embodiments of the disclosure.
Embodiment 1
An embodiment of the invention discloses a method for joint detection of face and human body, as shown in Fig. 1, including:
step 101, obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian;
step 102, correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN;
step 103, performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian.
In a specific embodiment, obtaining the standard data in step 101 includes:
obtaining surveillance videos under different scenes;
identifying each pedestrian in the images of the surveillance videos;
labeling, for each identified pedestrian, the face position and the body position as distinct position boxes, to generate the position-box information;
integrating the recognized images and the position-box information therein to generate the standard data.
Specifically, data is collected: surveillance videos containing pedestrians under different scenes (such as the scene shown in the example of Fig. 2). To improve the cross-scene stability of the detection algorithm, data under as many scenes as possible should be collected. In each frame, the position boxes of each human body and its corresponding face are labeled (specifically, they can be labeled individually in the manner of the prior art, as shown by the different boxes in Fig. 2). However, to avoid repeated calibration, not every frame of each video segment should be calibrated (for example, one frame can be taken every 3 seconds for calibration).
In a specific embodiment, correcting the joint recognition model with the position-box information in the standard data in step 102 includes:
step A, randomly initializing the parameters of the joint recognition model, where the parameters include the parameters of the corresponding loss functions, and the loss functions are used to correct the joint recognition model;
step B, passing the position-box information in the standard data sequentially through the initialized joint recognition model for computation, to obtain the loss of each corresponding loss function;
step C, differentiating each loss function to obtain gradients, and back-propagating them through the chain rule, to obtain the updated parameters;
step D, repeating step B and step C until the loss no longer declines, to obtain the final parameters;
step E, completing the correction of the joint recognition model with the final parameters.
In a specific embodiment, the loss functions include a classification loss function, a position regression loss function, and a relative position constraint loss function;
where the classification loss function classifies each candidate box in the image as face, pedestrian, or background, so as to obtain face boxes and pedestrian boxes and discard the background candidate boxes;
the position regression loss function corrects the positions and sizes of the face boxes and pedestrian boxes, improving localization accuracy;
the relative position constraint loss function keeps the relative position of a face box and its pedestrian box normal.
The classification loss function mainly classifies the candidate boxes as face, pedestrian, or background, obtaining face boxes and pedestrian boxes and discarding the background candidate boxes. The position regression loss function corrects the positions and sizes of the face boxes and pedestrian boxes, making localization more accurate. For the relative position constraint loss function, the position of the face relative to the human body is gathered statistically from the labeled pictures, and an anchor point for the face location is determined within the human-body box; during training, the Euclidean offset between the predicted face box and the anchor-point position in the body box is computed, and this serves as the loss function constraining the relative position of the face box and the pedestrian box, so that face boxes and pedestrian boxes do not end up in abnormal relative positions.
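A minimal sketch of this relative-position constraint: fix an anchor point for the face inside the body box, and penalize the Euclidean offset between the predicted face-box center and that anchor. The anchor fractions rel_x and rel_y stand in for the statistics gathered from labeled pictures; their values here are assumptions.

```python
import math

# Sketch of the relative-position constraint loss. Boxes are (x, y, w, h).
# The anchor location (rel_x, rel_y, as fractions of the body box) would
# come from statistics over the labeled data; these values are illustrative.

def face_anchor(body_box, rel_x=0.5, rel_y=0.12):
    x, y, w, h = body_box
    return (x + rel_x * w, y + rel_y * h)

def relative_position_loss(face_box, body_box):
    fx, fy, fw, fh = face_box
    face_center = (fx + fw / 2, fy + fh / 2)
    ax, ay = face_anchor(body_box)
    # Euclidean offset between predicted face center and the body anchor.
    return math.hypot(face_center[0] - ax, face_center[1] - ay)

loss = relative_position_loss((45, 8, 10, 12), (20, 0, 60, 160))
print(round(loss, 2))  # 5.2
```

A face box drifting far from its pedestrian's head region would incur a large loss, which is exactly the abnormal configuration the constraint suppresses.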
In a specific embodiment, performing joint detection of face and human body on images in the video to be recognized with the corrected joint recognition model in step 103, so as to output real-time structured messages, includes:
decoding the video to be recognized, to obtain every frame of the video;
if the processing capability of the joint recognition model exceeds a preset value, performing joint detection of face and human body on every frame, to generate in real time structured messages that simultaneously contain the face information, body information, and auxiliary information of each pedestrian;
where the auxiliary information includes the point position of the camera corresponding to the video to be recognized and the timestamp of the video to be recognized.
Specifically, when the joint recognition model is trained, the parameters of the model are first randomly initialized, and the samples are fed into the model one by one for computation, obtaining the loss of each loss function. The goal of training is to minimize every loss; therefore the losses are differentiated to obtain gradients, backpropagation is carried out through the chain rule, and the model parameters are updated. This process is repeated over the samples until the loss on the validation set no longer declines, i.e., reaches a minimum.
Subsequently, when a live video stream is accessed, images are obtained by decoding, and detection is scheduled according to the algorithm performance of the joint recognition model: if the algorithm speed suffices, detection can be done frame by frame. With the technical solution, one pass over a frame simultaneously completes the detection of both face and human body; a specific detection result is shown in Fig. 3.
Thus, this solution achieves the following technical effects:
1. the detection of targets and the correction of detection results are completed at the same time, avoiding a large amount of repeated computation and improving efficiency;
2. multi-type targets are integrated, and one unified model (i.e., the Faster R-CNN framework, which completes the detection and correction of target boxes at the same time) completes detection for multiple target types, reducing computational redundancy and system-resource occupancy;
3. the association of structured information (face and body) is completed automatically during real-time detection, directly yielding complete structured information.
Embodiment 2
An embodiment of the invention also discloses an apparatus for joint detection of face and human body, as shown in Fig. 4, including:
an acquisition module 201, for obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian;
a correction module 202, for correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN;
a detection module 203, for performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian.
In a specific embodiment, the acquisition module 201 is used to:
obtain surveillance videos under different scenes;
identify each pedestrian in the images of the surveillance videos;
label, for each identified pedestrian, the face position and the body position as distinct position boxes, to generate the position-box information;
integrate the recognized images and the position-box information therein to generate the standard data.
In a specific embodiment, the correction module 202 is used to perform the following operations:
step A, randomly initializing the parameters of the joint recognition model, where the parameters include the parameters of the corresponding loss functions, and the loss functions are used to correct the joint recognition model;
step B, passing the position-box information in the standard data sequentially through the initialized joint recognition model for computation, to obtain the loss of each corresponding loss function;
step C, differentiating each loss function to obtain gradients, and back-propagating them through the chain rule, to obtain the updated parameters;
step D, repeating step B and step C until the loss no longer declines, to obtain the final parameters;
step E, completing the correction of the joint recognition model with the final parameters.
In a specific embodiment, the loss functions include a classification loss function, a position regression loss function, and a relative position constraint loss function;
where the classification loss function classifies each candidate box in the image as face, pedestrian, or background, so as to obtain face boxes and pedestrian boxes and discard the background candidate boxes;
the position regression loss function corrects the positions and sizes of the face boxes and pedestrian boxes, improving localization accuracy;
the relative position constraint loss function keeps the relative position of a face box and its pedestrian box normal.
In a specific embodiment, the detection module 203 is used to:
decode the video to be recognized, to obtain every frame of the video;
if the processing capability of the joint recognition model exceeds a preset value, perform joint detection of face and human body on every frame, to generate in real time structured messages that simultaneously contain the face information, body information, and auxiliary information of each pedestrian;
where the auxiliary information includes the point position of the camera corresponding to the video to be recognized and the timestamp of the video to be recognized.
Thus, the invention discloses a method and apparatus for joint detection of face and human body, where the method includes: obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian; correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN; and performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian. This achieves efficient simultaneous recognition, saves resources, improves practicality, and preserves the correspondence between face and body.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of preferred implementation scenes, and that the modules or flows in the drawings are not necessarily required to implement the present invention.
Those skilled in the art will appreciate that the modules in the devices of an implementation scene can be distributed among the devices of the implementation scene as described, or can be changed correspondingly and placed in one or more devices other than those of this implementation scene. The modules of the above implementation scenes can be merged into one module, or further split into multiple submodules.
The above sequence numbers of the present invention are for description only and do not represent the merits of the implementation scenes.
Disclosed above are only several specific implementation scenes of the present invention; however, the present invention is not limited thereto, and any changes conceivable by those skilled in the art shall fall within the protection scope of the present invention.
Claims (10)
1. A method for joint detection of face and human body, characterized by including:
obtaining standard data, where the standard data contains position-box information marking the face position and body position of each pedestrian;
correcting a joint recognition model with the position-box information in the standard data, where the joint recognition model is generated based on Faster R-CNN;
performing joint detection of face and human body on images in a video to be recognized with the corrected joint recognition model, so as to output real-time structured messages, where each structured message simultaneously contains the face information and body information of each pedestrian.
2. The method of claim 1, characterized in that the obtaining of the standard data includes:
obtaining surveillance videos under different scenes;
identifying each pedestrian in the images of the surveillance videos;
labeling, for each identified pedestrian, the face position and the body position as distinct position boxes, to generate the position-box information;
integrating the recognized images and the position-box information therein to generate the standard data.
3. The method of claim 1, characterized in that correcting the joint recognition model using the position-box information in the standard data comprises:
step A: randomly initializing the parameters of the joint recognition model, wherein the parameters include the parameters of the corresponding loss functions, and the loss functions are used to correct the joint recognition model;
step B: passing the position-box information in the standard data in sequence through the initialized joint recognition model, so as to compute the loss corresponding to each loss function;
step C: obtaining gradients by differentiating each loss function, and performing backpropagation via the chain rule to obtain the updated parameters;
step D: repeating step B and step C until the loss no longer decreases, so as to obtain the final parameters; and
step E: completing the correction of the joint recognition model with the final parameters.
4. The method of claim 3, characterized in that the loss functions include a classification loss function, a position-regression loss function, and a relative-position-constraint loss function;
wherein the classification loss function is used to classify the candidate boxes in an image as face, pedestrian, or background, so as to obtain the face boxes and pedestrian boxes and discard the background candidate boxes;
the position-regression loss function is used to correct the position and size of the face boxes and pedestrian boxes, improving localization accuracy; and
the relative-position-constraint loss function is used to ensure that the relative position between a face box and its pedestrian box is normal.
5. The method of claim 1, characterized in that performing joint face and human-body detection on the images of the video to be recognized using the corrected joint recognition model, so as to output real-time structured information, comprises:
decoding the video to be recognized to obtain each frame image of the video; and
if the processing capability of the joint recognition module exceeds a preset value, performing joint face and human-body detection on every frame image, so as to generate in real time structured information that simultaneously contains the face information, the human-body information, and the auxiliary information of each pedestrian; wherein the auxiliary information includes the site information of the camera corresponding to the video and the time information of the video.
6. A device for joint detection of a face and a human body, characterized by comprising:
an acquisition module, configured to obtain standard data, wherein the standard data contains position-box information that marks the face position and the human-body position of each pedestrian;
a correction module, configured to correct a joint recognition model using the position-box information in the standard data, wherein the joint recognition model is generated based on Faster RCNN; and
a detection module, configured to perform joint face and human-body detection on the images of a video to be recognized using the corrected joint recognition model, so as to output real-time structured information, wherein the structured information simultaneously contains the face information and the human-body information of each pedestrian.
7. The device of claim 6, characterized in that the acquisition module is configured to:
obtain surveillance videos of different scenes;
recognize each pedestrian in the images of the surveillance videos;
for each recognized pedestrian, mark the face position and the human-body position as separate position boxes, so as to generate position-box information; and
integrate the recognized images with the position-box information therein to generate the standard data.
8. The device of claim 6, characterized in that the correction module is configured to perform the following operations:
step A: randomly initializing the parameters of the joint recognition model, wherein the parameters include the parameters of the corresponding loss functions, and the loss functions are used to correct the joint recognition model;
step B: passing the position-box information in the standard data in sequence through the initialized joint recognition model, so as to compute the loss corresponding to each loss function;
step C: obtaining gradients by differentiating each loss function, and performing backpropagation via the chain rule to obtain the updated parameters;
step D: repeating step B and step C until the loss no longer decreases, so as to obtain the final parameters; and
step E: completing the correction of the joint recognition model with the final parameters.
9. The device of claim 8, characterized in that the loss functions include a classification loss function, a position-regression loss function, and a relative-position-constraint loss function;
wherein the classification loss function is used to classify the candidate boxes in an image as face, pedestrian, or background, so as to obtain the face boxes and pedestrian boxes and discard the background candidate boxes;
the position-regression loss function is used to correct the position and size of the face boxes and pedestrian boxes, improving localization accuracy; and
the relative-position-constraint loss function is used to ensure that the relative position between a face box and its pedestrian box is normal.
10. The device of claim 6, characterized in that the detection module is configured to:
decode the video to be recognized to obtain each frame image of the video; and
if the processing capability of the joint recognition module exceeds a preset value, perform joint face and human-body detection on every frame image, so as to generate in real time structured information that simultaneously contains the face information, the human-body information, and the auxiliary information of each pedestrian; wherein the auxiliary information includes the site information of the camera corresponding to the video and the time information of the video.
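The standard-data annotation described in claim 2 can be illustrated with a small sketch. Everything below is an assumption about a plausible record layout (field names such as `face_box` and `body_box` are invented for illustration); the patent does not specify a storage format.

```python
# Hedged sketch of one "standard data" record: every pedestrian in a
# surveillance image gets two position boxes, one for the face and one
# for the body, each stored as [x1, y1, x2, y2] pixel coordinates.
def make_record(image_id, pedestrians):
    """pedestrians: list of (face_box, body_box) pairs for one image."""
    return {
        "image_id": image_id,
        "annotations": [
            {"pedestrian_id": i, "face_box": face, "body_box": body}
            for i, (face, body) in enumerate(pedestrians)
        ],
    }

record = make_record("cam01_frame_000123",
                     [([120, 40, 160, 90], [100, 30, 200, 400])])
print(len(record["annotations"]))  # → 1
```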
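The training procedure of steps A–E in claims 3 and 8 is ordinary gradient descent with an early stop once the loss plateaus. The sketch below substitutes a toy quadratic loss for the real Faster RCNN losses, purely to show the control flow; it is not the patent's implementation.

```python
import numpy as np

# Toy illustration of steps A–E: random initialization, loss computation,
# gradient updates via the chain rule, and stopping once the total loss
# no longer decreases. A quadratic stands in for the real classification /
# box-regression / relative-position losses.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.0])    # optimum of the toy losses
params = rng.normal(size=3)            # step A: random initialization

def losses(p):
    return (p - target) ** 2           # one toy loss per parameter

lr, prev = 0.1, np.inf
while True:
    total = losses(params).sum()       # step B: compute the losses
    if total >= prev - 1e-9:           # step D: loss stopped decreasing
        break
    prev = total
    grad = 2 * (params - target)       # step C: derivative of each loss
    params -= lr * grad                # step C: gradient update
# step E: `params` now hold the final parameters
print(np.round(params, 3))             # close to [1, -2, 0]
```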
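Claims 4 and 9 do not give a formula for the relative-position-constraint loss. The sketch below is one plausible interpretation, invented for illustration: it penalizes a face box whose center strays from the upper-middle region of its pedestrian box, so a near-zero loss corresponds to a "normal" relative position.

```python
# Hedged sketch of a relative-position constraint between a face box and
# its pedestrian box; boxes are [x1, y1, x2, y2]. The 0.15 head-height
# factor is an assumption, not a value from the patent.
def relative_position_loss(face_box, body_box):
    fx = (face_box[0] + face_box[2]) / 2.0   # face-box centre
    fy = (face_box[1] + face_box[3]) / 2.0
    bx = (body_box[0] + body_box[2]) / 2.0   # body-box centre x
    w = body_box[2] - body_box[0]
    h = body_box[3] - body_box[1]
    # expected face centre: horizontally centred, ~15% down from the top
    ey = body_box[1] + 0.15 * h
    return ((fx - bx) / w) ** 2 + ((fy - ey) / h) ** 2

# a face box sitting where a head should be yields a near-zero penalty
print(relative_position_loss([140, 30, 160, 55], [100, 25, 200, 400]))
```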
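The per-frame structured message of claims 5 and 10 can be sketched as follows. The `detect` stub and all field names are assumptions standing in for the corrected joint recognition model and its output format; the camera-site and time fields mirror the auxiliary information named in claim 5.

```python
import time

def detect(frame):
    # stand-in for the corrected joint recognition model: for each
    # pedestrian it returns the face box and the body box together
    return [{"face_box": [120, 40, 160, 90], "body_box": [100, 30, 200, 400]}]

def build_message(frame, camera_site):
    return {
        "pedestrians": detect(frame),      # face + body info per pedestrian
        "auxiliary": {
            "camera_site": camera_site,    # site of the source camera
            "timestamp": time.time(),      # time info of the video
        },
    }

msg = build_message(frame=None, camera_site="gate-03")
print(sorted(msg["auxiliary"]))  # → ['camera_site', 'timestamp']
```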
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710067572.5A CN106845432B (en) | 2017-02-07 | 2017-02-07 | Method and apparatus for joint detection of a face and a human body |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106845432A (en) | 2017-06-13 |
CN106845432B CN106845432B (en) | 2019-09-17 |
Family
ID=59121520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710067572.5A Expired - Fee Related CN106845432B (en) | 2017-02-07 | 2017-02-07 | Method and apparatus for joint detection of a face and a human body |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106845432B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609684A (en) * | 2012-01-16 | 2012-07-25 | 宁波江丰生物信息技术有限公司 | Human body posture detection method and device |
CN102609684B (en) * | 2012-01-16 | 2013-12-18 | 宁波江丰生物信息技术有限公司 | Human body posture detection method and device |
CN104899575A (en) * | 2015-06-19 | 2015-09-09 | 南京大学 | Human body assembly dividing method based on face detection and key point positioning |
Non-Patent Citations (2)
Title |
---|
Shaoqing Ren et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
Xiaofei Li et al., "A Unified Framework for Concurrent Pedestrian and Cyclist Detection", IEEE Intelligent Transportation Systems Society * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108121951A (en) * | 2017-12-11 | 2018-06-05 | 北京小米移动软件有限公司 | Characteristic point positioning method and device |
CN108197605A (en) * | 2018-01-31 | 2018-06-22 | 电子科技大学 | Yak personal identification method based on deep learning |
CN108537165A (en) * | 2018-04-08 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | Method and apparatus for determining information |
CN108921008A (en) * | 2018-05-14 | 2018-11-30 | 深圳市商汤科技有限公司 | Portrait identification method, device and electronic equipment |
CN108921008B (en) * | 2018-05-14 | 2024-06-11 | 深圳市商汤科技有限公司 | Portrait identification method and device and electronic equipment |
CN110889314A (en) * | 2018-09-10 | 2020-03-17 | 北京市商汤科技开发有限公司 | Image processing method, device, electronic equipment, server and system |
CN109684956A (en) * | 2018-12-14 | 2019-04-26 | 深源恒际科技有限公司 | A kind of vehicle damage detection method and system based on deep neural network |
CN109740516A (en) * | 2018-12-29 | 2019-05-10 | 深圳市商汤科技有限公司 | A kind of user identification method, device, electronic equipment and storage medium |
CN109740516B (en) * | 2018-12-29 | 2021-05-14 | 深圳市商汤科技有限公司 | User identification method and device, electronic equipment and storage medium |
TWI702544B (en) * | 2019-03-11 | 2020-08-21 | 大陸商深圳市商湯科技有限公司 | Method, electronic device for image processing and computer readable storage medium thereof |
KR102446687B1 (en) | 2019-03-11 | 2022-09-23 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Image processing method and apparatus, electronic device and storage medium |
WO2020181728A1 (en) * | 2019-03-11 | 2020-09-17 | 深圳市商汤科技有限公司 | Image processing method and apparatus, electronic device, and storage medium |
KR20200110642A (en) * | 2019-03-11 | 2020-09-24 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Image processing method and device, electronic device and storage medium |
CN109948494B (en) * | 2019-03-11 | 2020-12-29 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN109948494A (en) * | 2019-03-11 | 2019-06-28 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
US11288531B2 (en) | 2019-03-11 | 2022-03-29 | Shenzhen Sensetime Technology Co., Ltd. | Image processing method and apparatus, electronic device, and storage medium |
CN110197113A (en) * | 2019-03-28 | 2019-09-03 | 杰创智能科技股份有限公司 | A kind of method for detecting human face of high-precision anchor point matching strategy |
CN110197113B (en) * | 2019-03-28 | 2021-06-04 | 杰创智能科技股份有限公司 | Face detection method of high-precision anchor point matching strategy |
CN110705469A (en) * | 2019-09-30 | 2020-01-17 | 重庆紫光华山智安科技有限公司 | Face matching method and device and server |
CN111027474A (en) * | 2019-12-09 | 2020-04-17 | Oppo广东移动通信有限公司 | Face area acquisition method and device, terminal equipment and storage medium |
CN111027474B (en) * | 2019-12-09 | 2024-03-15 | Oppo广东移动通信有限公司 | Face region acquisition method and device, terminal equipment and storage medium |
CN111063083A (en) * | 2019-12-16 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Access control method and device, computer readable storage medium and computer equipment |
CN112488057A (en) * | 2020-12-17 | 2021-03-12 | 北京航空航天大学 | Single-camera multi-target tracking method utilizing human head point positioning and joint point information |
CN113326773A (en) * | 2021-05-28 | 2021-08-31 | 北京百度网讯科技有限公司 | Recognition model training method, recognition method, device, equipment and storage medium |
CN114783043A (en) * | 2022-06-24 | 2022-07-22 | 杭州安果儿智能科技有限公司 | Child behavior track positioning method and system |
Also Published As
Publication number | Publication date |
---|---|
CN106845432B (en) | 2019-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106845432A | Method and apparatus for joint detection of a face and a human body | |
CN111414887A | Masked-face recognition method with secondary detection based on the YOLOv3 algorithm | |
CN102982336B | Recognition model generation method and system | |
CN107194396A | Early-warning method for recognizing specified illegal structures in a land-resources video surveillance system | |
CN110633610B | YOLO-based student state detection method | |
CN111582234B | Intelligent detection and counting method for camellia fruit in large-scale plantations based on UAVs and deep learning | |
CN106097346A | Self-learning video fire detection method | |
CN111126325A | Video-based intelligent personnel security recognition and statistics method | |
CN110084165A | Edge-computing-based intelligent recognition and early warning of abnormal events in open electric-power scenes | |
CN108154110A | Dense pedestrian flow statistics method based on deep-learning head detection | |
CN110569772A | Method for detecting the state of people in a swimming pool | |
CN105208325A | Land-resources monitoring and early-warning method based on fixed-point image capture and comparative analysis | |
CN111914636A | Method and device for detecting whether a pedestrian wears a safety helmet | |
CN114049325A | Construction and application of a lightweight face-mask-wearing detection model | |
CN106612457B | Video sequence alignment method and system | |
CN115951014A | CNN-LSTM-BP multi-modal air-pollutant prediction method combining meteorological features | |
CN111091110A | Artificial-intelligence-based reflective-vest wearing recognition method | |
CN109389016A | Method and system for people counting | |
IL257092A | Method and system for tracking objects between cameras | |
CN110543828A | Student attention analysis system based on wearable devices and multi-modal intelligent analysis | |
CN117726991B | High-altitude hanging-basket safety-belt detection method and terminal | |
CN117808374B | Intelligent acceptance management method and system for building engineering quality | |
CN112686152A | Crop pest and disease recognition method with multi-size inputs and multi-size targets | |
CN102867214B | People-counting management method for an area | |
CN112417974A | Public health monitoring method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20190917; Termination date: 20220207 |