CN105930822A - Human face snapshot method and system

Info

Publication number: CN105930822A
Application number: CN201610308462.9A
Authority: CN
Inventors: 左珍
Original assignee: BEIJING DEEPGLINT INFORMATION TECHNOLOGY Co Ltd
Current assignee: BEIJING DEEPGLINT INFORMATION TECHNOLOGY Co Ltd
Priority date: 2016-05-11
Filing date: 2016-05-11
Publication date: 2016-09-07

Abstract

The application provides a human face snapshot method and system. The method comprises the following steps: using a wide-angle camera to acquire wide-angle video data of a monitored scene; detecting, on the basis of the video data, the region where a human body is located; using a telephoto (long focal length) camera to acquire video data of that region; and performing face detection on the video data acquired by the telephoto camera to obtain a face image. The application combines the advantages of different cameras: the wide-angle camera locates human targets across a large monitored scene, and the telephoto camera then acquires clear, high-resolution face images at long distance. This overcomes the drawback that traditional face snapshot systems must rely on photographing the face at short range, and allows the system to be applied to more complex and changeable scenes.

Description

Face snapshot method and system

Technical field

The present application relates to the technical field of computer vision, and in particular to a face snapshot method and system.

Background art

With the development of the information age, the demand for non-cooperative face snapshot in surveillance scenes is growing ever stronger. A face snapshot system processes the data recorded by high-definition surveillance cameras: when a person enters the designated monitoring range, it captures and records the face for structured recording, recognition and analysis at the back end. It is mainly used to record people entering and leaving, to search for suspects or specific persons, to compile crowd statistics, and so on.

Existing face snapshot systems generally include face detection, tracking and structured storage modules. They use a face detection algorithm and a face tracking algorithm to detect and track the faces of people entering the detection area, and select the best face image for entry into the database. Because a face is a relatively small target, too low a resolution degrades the quality of the captured face image and makes further recognition difficult.

Therefore, existing face snapshot systems often rely on high-definition cameras fixed at checkpoints to shoot high-resolution face images at close range. Their working range is typically a few meters, so they cannot capture faces in long-distance, wide-coverage application scenarios, which greatly restricts the usable range of face snapshot systems.

The deficiency of the prior art is as follows:

Existing face snapshot systems have a small usable range and cannot capture faces in long-distance, wide-range scenes.

Summary of the invention

The embodiments of the present application propose a face snapshot method and system, to solve the technical problem that prior-art face snapshot systems have a small usable range and cannot capture faces in long-distance, wide-range scenes.

In a first aspect, an embodiment of the present application provides a face snapshot method, comprising the following steps:

acquiring wide-angle video data of a monitored scene using a wide-angle camera;

detecting the region where a human body is located according to the video data;

acquiring video data of the region where the human body is located using a telephoto camera;

performing face detection on the video data acquired by the telephoto camera to obtain a face image.

Preferably, the method further comprises:

comparing the face image with stored face images for similarity;

determining, according to the result of the similarity comparison, whether the face has already been stored.

Preferably, the method further comprises:

if it is determined that the face has not been stored, storing the face image.

Preferably, the method further comprises:

if it is determined that the face has been stored, comparing the image quality of the face image with that of the stored face image;

when the image quality of the face image is higher than that of the stored face image, updating the stored face image.

Preferably, comparing the face image with the stored face images for similarity specifically comprises: extracting feature information of the face from the face image, the feature information including the current time, the position of the face in the wide-angle camera view, and a feature representation extracted from the face image; and comparing the feature information of the face with the feature information of the stored face images for similarity;

determining, according to the result of the similarity comparison, whether the face has already been stored specifically comprises: if the time interval is within a preset time range, the relative position in the wide-angle camera view is within a preset distance, and the similarity of the feature representations of the two face images is higher than a preset threshold, determining that the face has been stored.

Preferably, detecting the region where the human body is located according to the video data specifically comprises:

preprocessing the video image acquired by the wide-angle camera;

inputting the video image into a pre-trained human body detection deep convolutional neural network to obtain a feature map, wherein the receptive field of each pixel on the feature map corresponds to an image patch in the video image, and the receptive fields of all pixels on the feature map together correspond to the video image;

scanning the feature map with a plurality of preset sliding windows of different scales and/or aspect ratios to obtain scores for the region where the human body is located;

determining a region whose score exceeds a preset first threshold and is the local maximum as the region where the human body is located.

Preferably, performing face detection on the video data acquired by the telephoto camera specifically comprises:

preprocessing the video image acquired by the telephoto camera;

inputting the video image into a pre-trained face detection deep convolutional neural network to obtain a feature map, wherein the receptive field of each pixel on the feature map corresponds to an image patch in the video image, and the receptive fields of all pixels on the feature map together correspond to the video image;

scanning the feature map with a plurality of preset sliding windows of different scales and/or aspect ratios to obtain scores for the region where the face is located;

determining a region whose score exceeds a preset second threshold and is the local maximum as the region where the face is located.

In a second aspect, an embodiment of the present application provides a face snapshot system, comprising a linked camera group, a first detection module and a second detection module, the linked camera group including a wide-angle camera and a telephoto camera, wherein:

the wide-angle camera is configured to acquire wide-angle video data of a monitored scene;

the first detection module is configured to detect the region where a human body is located according to the video data;

the telephoto camera is configured to acquire video data of the region where the human body is located;

the second detection module is configured to perform face detection on the video data acquired by the telephoto camera to obtain a face image.

Preferably, the system further comprises:

a first comparison module, configured to compare the face image with stored face images for similarity;

a determination module, configured to determine, according to the result of the similarity comparison, whether the face has already been stored.

Preferably, the system further comprises:

an adding module, configured to store the face image when it is determined that the face has not been stored.

Preferably, the system further comprises:

a second comparison module, configured to compare the image quality of the face image with that of the stored face image when it is determined that the face has been stored;

an update module, configured to update the stored face image when the image quality of the face image is higher than that of the stored face image.

Preferably, the first comparison module specifically comprises:

a feature extraction unit, configured to extract feature information of the face from the face image, the feature information including the current time, the position of the face in the wide-angle camera view, and a feature representation extracted from the face image;

a comparison unit, configured to compare the feature information of the face with the feature information of the stored face images for similarity;

the determination module is specifically configured to determine that the face has been stored if the time interval is within a preset time range, the relative position in the wide-angle camera view is within a preset distance, and the similarity of the feature representations of the two face images is higher than a preset threshold.

Preferably, the first detection module specifically comprises:

a first preprocessing unit, configured to preprocess the video image acquired by the wide-angle camera;

a first convolution unit, configured to input the video image into a pre-trained human body detection deep convolutional neural network to obtain a feature map, wherein the receptive field of each pixel on the feature map corresponds to an image patch in the video image, and the receptive fields of all pixels on the feature map together correspond to the video image;

a first output unit, configured to scan the feature map with a plurality of preset sliding windows of different scales and/or aspect ratios to obtain scores for the region where the human body is located;

a first determination unit, configured to determine a region whose score exceeds a preset first threshold and is the local maximum as the region where the human body is located.

Preferably, the second detection module specifically comprises:

a second preprocessing unit, configured to preprocess the video image acquired by the telephoto camera;

a second convolution unit, configured to input the video image into a pre-trained face detection deep convolutional neural network to obtain a feature map, wherein the receptive field of each pixel on the feature map corresponds to an image patch in the video image, and the receptive fields of all pixels on the feature map together correspond to the video image;

a second output unit, configured to scan the feature map with a plurality of preset sliding windows of different scales and/or aspect ratios to obtain scores for the region where the face is located;

a second determination unit, configured to determine a region whose score exceeds a preset second threshold and is the local maximum as the region where the face is located.

The beneficial effects are as follows:

The face snapshot method and system provided by the embodiments of the present application use a wide-angle camera to acquire wide-angle video data of a monitored scene, detect the region where a human body is located according to the video data, then use a telephoto camera to acquire video data of that region, and finally perform face detection on the video data acquired by the telephoto camera to obtain a face image. The embodiments combine the advantages of different cameras: the wide-angle camera locates human targets in a wide monitored scene, and the telephoto camera then acquires high-resolution faces at long distance. This overcomes the drawback that conventional face snapshot systems must rely on shooting faces at close range, and adapts to more complex and changeable application scenarios.

Brief description of the drawings

Specific embodiments of the present application are described below with reference to the accompanying drawings, in which:

Fig. 1 is a schematic flowchart of the face snapshot method in Embodiment 1 of the present application;

Fig. 2 is a schematic diagram of the face snapshot process in Embodiment 2 of the present application;

Fig. 3 is a schematic diagram of the human body or face detection process in Embodiment 3 of the present application;

Fig. 4 is a first structural diagram of the face snapshot system in Embodiment 4 of the present application;

Fig. 5 is a second structural diagram of the face snapshot system in Embodiment 4 of the present application;

Fig. 6 is a third structural diagram of the face snapshot system in Embodiment 4 of the present application;

Fig. 7 is a fourth structural diagram of the face snapshot system in Embodiment 4 of the present application;

Fig. 8 is a structural diagram of the first comparison module in Embodiment 4 of the present application;

Fig. 9 is a structural diagram of the first detection module in Embodiment 4 of the present application;

Fig. 10 is a structural diagram of the second detection module in Embodiment 4 of the present application.

Detailed description

To make the technical solutions and advantages of the present application clearer, exemplary embodiments of the application are described in more detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the application, not an exhaustive list of all embodiments. Where no conflict arises, the embodiments in this description and the features in the embodiments may be combined with each other.

During the course of the invention, the inventor noted the following:

Unlike traditional identification methods such as iris verification and fingerprint verification, face snapshot has the following advantages:

1) The working-distance restriction is greatly relaxed. A person does not need to touch a specific device to provide information such as fingerprints; processing is possible as long as the person is within camera coverage.

2) The camera can operate freely in non-intrusive, non-cooperative scenes, without interfering with people's normal activities and without requiring their participation.

3) The recorded biometric feature is intuitive and friendly: an ordinary person can distinguish identity by face, without needing a domain expert.

4) The waiting time is short and the operating efficiency is high. People do not need to pass through a gate one by one to have iris or fingerprint information collected; computer vision methods can locate and process them quickly.

However, existing face snapshot systems can typically only acquire face images at close range (for example, 3 meters). They cannot monitor or record people in long-distance (for example, about 50 meters), wide-coverage application scenarios such as railway station squares, large gatherings and streets. This distance limit greatly restricts the usable range of face snapshot systems: coverage is constrained by where cameras are installed, large amounts of activity in open spaces go unrecorded, and the system can only passively hope that a monitored person happens to pass close to an installed camera, which creates a safety risk.

To address the above deficiency, the present application proposes a face snapshot system that makes clever use of a piece of prior information: a face is part of a human body, so wherever there is a face there must be a person, although the converse does not necessarily hold. Human body detection can therefore serve as a preliminary screen for face detection. Because a human body occupies a much larger area of the picture than a face, human body detection works at much greater distances than face detection, which makes the screening efficient.

To achieve face snapshot at very long distances, the application first uses the wide-angle camera of the linked camera group to perform human body detection, giving a preliminary screening of the regions of interest that contain human targets. It then directs the telephoto camera at each such region to shoot long-distance video, achieving high-precision face snapshot.

The application aims to break through the existing distance limit and achieve face snapshot at very long distances (for example, about 50 meters).

To facilitate implementation of the application, the face snapshot method and system provided herein are described below with reference to specific embodiments.

Embodiment 1

Fig. 1 is a schematic flowchart of the face snapshot method in Embodiment 1 of the present application. As shown in the figure, the face snapshot method may include the following steps:

Step 101: acquire wide-angle video data of the monitored scene using a wide-angle camera;

Step 102: detect the region where a human body is located according to the video data;

Step 103: acquire video data of the region where the human body is located using a telephoto camera;

Step 104: perform face detection on the video data acquired by the telephoto camera to obtain a face image.

In a specific implementation, the wide-angle camera may be a camera whose lens (a wide-angle lens) has a very wide angle of view and can take in a larger scene within a limited distance. How wide a camera's angle is is usually measured by its minimum focal length: a wide-angle lens generally has a focal length between 24 mm and 35 mm, and the smaller the minimum focal length, the wider the angle, which suits shooting large scenes such as landscapes and buildings. A telephoto camera is a camera with a telephoto lens, whose focal length is typically between 80 mm and 300 mm and which can capture distant subjects clearly.

The linked camera group described in the embodiments of the present application may consist of two or more cameras. After mechanical and visual calibration, the relative position and orientation between any two cameras can be calculated accurately. With the cameras mounted on motor-controlled pan-tilt heads, a region can be selected in the view of one camera and the other cameras rotated to point at that region. The embodiments of the present application refer to this function as linkage.
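As an illustration only, the linkage step can be sketched as a conversion from a pixel selected in the wide-angle view to pan and tilt angles for the telephoto camera. The function name `pixel_to_pan_tilt`, the ideal pinhole model and the field-of-view parameters are assumptions made for this sketch; the application does not specify how the calibration result is used, and a real system would also account for lens distortion and the offset between the two cameras.

```python
import math


def pixel_to_pan_tilt(u, v, width, height, hfov_deg, vfov_deg):
    """Return approximate (pan, tilt) in degrees for pixel (u, v) of the wide-angle view.

    Assumption: an ideal pinhole wide-angle camera whose optical axis is the
    zero position of the pan-tilt head. Distortion and the baseline between
    the cameras are ignored.
    """
    # Focal lengths in pixels, derived from the horizontal and vertical fields of view.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    # Offsets of the selected pixel from the image centre.
    dx = u - width / 2
    dy = v - height / 2
    pan = math.degrees(math.atan2(dx, fx))
    tilt = -math.degrees(math.atan2(dy, fy))  # image y grows downward
    return pan, tilt
```

Under these assumptions, the centre pixel maps to zero pan and tilt, and a pixel on the right edge of the image maps to a pan of half the horizontal field of view.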

In a specific implementation, the wide-angle camera first shoots video of the wide monitored scene, and human body detection is performed on that video to obtain the region where a human body is located. The telephoto camera then shoots video of that region, and face detection is performed on the telephoto video to obtain a clear face image.
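Steps 101 to 104 can be sketched as a single pass over one wide-angle frame. This is a minimal sketch, not the application's implementation: the four callables (`grab_wide`, `detect_bodies`, `grab_tele`, `detect_faces`) and the `FaceSnapshot` record are hypothetical placeholders for the cameras and detectors.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class FaceSnapshot:
    body_box: Box  # region found in the wide-angle frame
    face_box: Box  # face found in the telephoto frame of that region


def snapshot_once(grab_wide: Callable, detect_bodies: Callable,
                  grab_tele: Callable, detect_faces: Callable) -> List[FaceSnapshot]:
    """Run steps 101 to 104 once and return every face that was found."""
    wide_frame = grab_wide()                        # step 101
    results = []
    for body_box in detect_bodies(wide_frame):      # step 102
        tele_frame = grab_tele(body_box)            # step 103: aim telephoto at the region
        for face_box in detect_faces(tele_frame):   # step 104
            results.append(FaceSnapshot(body_box, face_box))
    return results
```

Passing the cameras and detectors in as callables keeps the sketch independent of any particular camera driver or detection model.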

Human body detection and face detection may both use prior-art image recognition techniques, such as image segmentation, edge extraction and motion detection. For human body detection, for example, existing background modeling methods may be used to segment the foreground, or methods based on statistical learning may be used. For face detection, recognition algorithms based on facial feature points or on the whole face image may be used.

The face snapshot method provided by the embodiments of the present application uses a wide-angle camera to acquire video data of the monitored scene, detects the region where a human body is located according to the video data, then uses a telephoto camera to acquire video data of that region, and finally performs face detection on the video data acquired by the telephoto camera to obtain a face image. The embodiments combine the advantages of different cameras: the wide-angle camera locates human targets in a wide monitored scene, and the telephoto camera then acquires high-resolution faces at long distance. This overcomes the drawback that conventional face snapshot systems must rely on shooting faces at close range, and adapts to more complex and changeable application scenarios.

Further, to deduplicate faces, the embodiments of the present application may also be implemented as follows.

In implementation, the method may further include the following.

In a specific implementation, the acquired face image may be compared for similarity with the face images already stored. The stored face images may be face images held in a cache or a database, and may already be structured. Here, video structuring is the process of describing the targets and events in a video according to a standard and storing the descriptions in a database.

Through face comparison, the embodiments of the present application determine whether the face has appeared before (that is, whether it is already stored in the cache or database), thereby avoiding storing and recording the same face repeatedly.

In implementation, the method may further include the following.

In a specific implementation, if, after the face image is compared with the stored face images, none of the stored face images has a similarity to it higher than the preset similarity value, the face can be considered not to have appeared before in the picture monitored by the wide-angle camera, and the face image can therefore be stored in the cache or database.

In implementation, the method may further include the following.

In a specific implementation, if the similarity between the face image and one of the stored face images is high, it can be determined that the face has been stored. The embodiments of the present application may then further compare the image quality of the face image with that of the stored face image, and update the stored face image when the new face image is of higher quality.

Updating the face image may specifically consist of deleting the stored face image and storing the newly acquired one.

In this way, the embodiments of the present application ensure that each recorded face is preserved as the clearer, higher-quality image, so that later applications such as search and query can be more accurate.
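The store-or-update behaviour described above can be sketched as follows. The dictionary store, the `matched_key` argument (the key of the stored face that the similarity comparison matched, or `None` if there was no match), the numeric `quality` score and the key scheme are all assumptions for illustration; the application does not say how image quality is measured.

```python
def store_or_update(store, matched_key, image, quality):
    """Store a new face, or replace a stored one if the new image is better.

    store: dict mapping key -> (image, quality).
    Returns (action, key), where action is "stored", "updated" or "kept".
    """
    if matched_key is None:
        # The face has not been stored before: add it under a new key.
        key = len(store)  # hypothetical key scheme; assumes nothing is ever deleted
        store[key] = (image, quality)
        return "stored", key
    _, old_quality = store[matched_key]
    if quality > old_quality:
        # Replace the stored image with the higher-quality one.
        store[matched_key] = (image, quality)
        return "updated", matched_key
    return "kept", matched_key
```

Keeping the lower-quality image when the new one is no better means each face keeps exactly one record, which is the deduplication goal stated above.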

In implementation, comparing the face image with the stored face images for similarity may specifically be:

extracting feature information of the face from the face image, the feature information including the current time, the position of the face in the wide-angle camera view, and a feature representation extracted from the face image;

comparing the feature information of the face with the feature information of the stored face images for similarity.

Determining, according to the result of the similarity comparison, whether the face has already been stored may specifically be:

if the time interval is within a preset time range, the relative position in the wide-angle camera view is within a preset distance, and the similarity of the feature representations of the two face images is higher than a preset threshold, determining that the face has been stored.

The similarity criterion in the embodiments of the present application may be as follows: if the time interval (the interval between the frames in which the faces appear) is within a preset time threshold T, the relative distance (the distance between the face regions' coordinates in the wide-angle camera view) is within a preset distance threshold D, and the similarity of the feature representations of the two face images is higher than a preset similarity threshold S, the face is determined to have appeared before; otherwise, the face has not appeared before.
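The three-part criterion (time threshold T, distance threshold D, similarity threshold S) can be sketched as below. The `FaceRecord` type and the choice of cosine similarity are assumptions for this sketch; the application does not name a similarity measure.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FaceRecord:
    timestamp: float                # seconds
    position: Tuple[float, float]   # (x, y) in wide-angle image pixels
    feature: Sequence[float]        # feature representation of the face region


def cosine_similarity(a, b):
    """Cosine similarity of two vectors (an assumed measure, for illustration)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_duplicate(new, old, T, D, S):
    """True if the two records satisfy all three conditions of the criterion."""
    dt = abs(new.timestamp - old.timestamp)
    dist = math.hypot(new.position[0] - old.position[0],
                      new.position[1] - old.position[1])
    sim = cosine_similarity(new.feature, old.feature)
    return dt <= T and dist <= D and sim > S
```

All three conditions must hold at once: two faces with near-identical features are still treated as different records if they appear far apart in time or space.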

When performing the similarity comparison, the feature information of the face may be extracted first. The feature information may include the time of appearance (the current time), the position of the face (its relative coordinates in the video shot by the wide-angle camera), and a feature representation extracted from the face region. The feature representation may be a feature expressed by means such as the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG) or local binary patterns (LBP).
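Of the feature representations listed, LBP is simple enough to sketch in a few lines. The following is a minimal pure-Python version of the basic 3x3 LBP histogram, with histogram intersection as an assumed similarity measure; the application does not specify which LBP variant or comparison function is used, and a production system would normally rely on an optimized library.

```python
def lbp_histogram(gray):
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    gray: 2D list of intensity values (rows of equal length).
    Border pixels are skipped because they lack a full neighbourhood.
    """
    height, width = len(gray), len(gray[0])
    # The eight neighbours, clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            centre = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1  # avoid division by zero for images smaller than 3x3
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```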

With the above method, the embodiments of the present application can identify similar faces that appear within a short time and at a short distance as repeated faces. Deduplicating face records by combining time, space and face similarity makes the deduplication more accurate.

In the prior art, face snapshot systems all use Haar-like feature cascade classifiers trained with the AdaBoost algorithm, or variants thereof, for face detection. However, this approach handles complex scenes poorly.

To obtain a better snapshot result, the embodiments of the present application may be implemented as follows.

In implementation, detecting the region where the human body is located according to the video data may specifically be:

preprocessing the video image acquired by the wide-angle camera;

In a specific implementation, a large number of human body samples may be collected in advance and used to train a human body detection deep convolutional neural network. Feeding the preprocessed video image into this network yields a human body feature map. The human body detection deep convolutional neural network may be trained with existing training methods for deep convolutional neural networks, which are not described further here.

The embodiments of the present application take the regions of the feature map whose score exceeds a certain threshold and is the local maximum as the regions where a human body is located, which further improves the accuracy of human body region detection. In a specific implementation, multiple human body regions may be detected in one video image, depending on the number and positions of people in the monitored scene.
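The selection rule (score above a threshold and locally highest) can be sketched on a small 2D score map. The function name `local_peaks` and the use of an 8-neighbourhood to define "local" are assumptions for illustration; the application does not define the neighbourhood.

```python
def local_peaks(score_map, threshold):
    """Return [(row, col, score)] for cells above threshold that are local maxima.

    A cell counts as a local maximum when no cell in its 8-neighbourhood
    scores higher. Cells on a plateau of equal scores are all returned.
    """
    rows, cols = len(score_map), len(score_map[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = score_map[r][c]
            if score <= threshold:
                continue
            neighbours = [score_map[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks
```

Because each peak is judged only against its neighbours, two people standing apart in the scene produce two separate peaks, which matches the statement that multiple regions may be detected in one image.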

In implementation, performing face detection on the video data acquired by the telephoto camera may specifically be as follows.

In a specific implementation, preprocessing the video image may be: scaling it proportionally to adjust the image to a preset size (for example, M*N); converting the image to a preset color space (for example, grayscale, BGR or YUV); or subtracting the mean human body or face pixel value, and so on.
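The three preprocessing options can be sketched as small helper functions. The function names are hypothetical, and the grayscale weights are the common ITU-R BT.601 luma coefficients, chosen here as an assumption since the application does not state a conversion formula.

```python
def fit_size(width, height, max_w, max_h):
    """Target size that fits inside max_w x max_h while keeping the aspect ratio."""
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def bgr_to_gray(pixel):
    """Convert one (B, G, R) pixel to a gray value using BT.601 luma weights."""
    b, g, r = pixel
    return 0.114 * b + 0.587 * g + 0.299 * r


def subtract_mean(values, mean):
    """Subtract a precomputed mean (for example, of the training samples)."""
    return [v - mean for v in values]
```

For instance, a 1920 x 1080 frame fitted into a 640 x 640 box keeps its 16:9 shape and becomes 640 x 360.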

The face detection deep convolutional neural network described in the embodiments of the present application may be trained on a large number of face samples collected in advance. The specific training method may follow the prior art and is not described further here.

In the embodiments of the present application, the video image acquired by the wide-angle camera or the telephoto camera serves as the input image and is fed through multiple convolutional layers in sequence, producing intermediate-layer feature maps. The receptive field of each point on a feature map corresponds to an image patch in the original image, and the receptive fields of all points on the map together correspond to the whole input image. The convolutional layers may be the first 5 convolutional layers of the existing AlexNet, or may be developed by those skilled in the art; the application places no restriction on this.
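How large an image patch each feature-map point corresponds to follows from the kernel sizes and strides of the stacked layers. The sketch below computes this with the standard receptive-field recurrence. The layer list uses the commonly cited AlexNet configuration (five convolutional layers with two interleaved pooling layers); treating it as the configuration meant by the application is an assumption.

```python
def receptive_field(layers):
    """Return (receptive_field_size, total_stride) for a stack of layers.

    layers: list of (kernel_size, stride) pairs, in order from input to output.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance in input pixels between adjacent outputs
    return rf, jump


# Commonly cited AlexNet front end: conv1, pool1, conv2, pool2, conv3, conv4, conv5.
ALEXNET_CONV = [(11, 4), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1)]
```

With these figures, `receptive_field(ALEXNET_CONV)` gives a 163 x 163 pixel patch per point of the fifth convolutional layer, with adjacent points 16 pixels apart in the input image.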

The feature map output by the convolution can then be scanned with multiple sliding windows of different scales and aspect ratios, yielding a score for the image patch in the original video image that corresponds to each scale and aspect ratio. The higher the score, the more likely that image patch is a region where a human body or face is located.
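The multi-scale scan can be sketched on a single-channel feature map. Everything specific here is an assumption for illustration: the window sizes are given in feature-map cells, `mean_score` is a stand-in for the learned scoring function, and the mapping back to image pixels simply multiplies by the network stride, ignoring padding and the receptive-field margin.

```python
def scan_windows(feature_map, window_sizes, stride_px, score_fn):
    """Score every window position for every window size.

    feature_map: 2D list of values.
    window_sizes: list of (height, width) in feature-map cells.
    stride_px: input pixels per feature-map cell.
    Returns a list of (score, (x, y, w, h)) with boxes in image pixels.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    results = []
    for win_h, win_w in window_sizes:
        for r in range(rows - win_h + 1):
            for c in range(cols - win_w + 1):
                cells = [feature_map[rr][c:c + win_w] for rr in range(r, r + win_h)]
                box = (c * stride_px, r * stride_px,
                       win_w * stride_px, win_h * stride_px)
                results.append((score_fn(cells), box))
    return results


def mean_score(cells):
    """Stand-in scorer: the mean activation inside the window."""
    flat = [v for row in cells for v in row]
    return sum(flat) / len(flat)
```

Using several `(height, width)` pairs in `window_sizes` is what gives the scan its different scales and aspect ratios: a tall window suits a standing body, a roughly square one suits a face.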

In a specific implementation, the convolution kernels, sliding windows and so on may all be obtained through machine learning and training on a large number of samples (human bodies or faces) acquired in advance. The specific technique may follow existing classifier training methods and is not described further here.

In a specific implementation, multiple face regions may be detected in one video image, depending on the number and positions of faces in the monitored scene.

The embodiments of the present application may use a fully convolutional deep neural network to detect human bodies or faces. Such a network accepts input images of any size and obtains the detection results of all sliding windows in a single pass. It can also detect human bodies and faces at multiple scales and aspect ratios without additional computational complexity, so detection is faster and more accurate.

The embodiments of the present application use advanced human body and face detection algorithms based on deep learning, which enable accurate human body and face detection and adapt to more complex and changeable application scenarios.

Embodiment 2

To achieve face snapshot at very long distances, the application may use the wide-angle camera of the linked camera group to perform a preliminary screening of regions of interest, and then direct the telephoto camera at each region of interest to shoot high-definition long-distance video for high-precision face snapshot.

Fig. 2 shows face snap process schematic in the embodiment of the present application two, as it can be seen, described face Candid photograph process may comprise steps of:

Step 201: perform human body detection in the field of view of the wide-angle camera.

Acquire images from the wide-angle camera and use a deep neural network to perform human body detection, so that regions where the target (a face) may appear (that is, places where a human body appears) are preliminarily screened at wide angle.

Step 202: determine whether a region of interest has been found.

If one has been found, perform step 203.

If none has been found, return to step 201.

The region of interest may be a region where a human body appears, a densely populated region, a sensitive region in the field of view, and so on.

Step 203: point the telephoto lens at this region.

Use the linked camera unit to direct the telephoto camera at this region and shoot long-range high-definition images.

Step 204: perform face detection on the current field of view.

Use a deep neural network to perform face detection on the region shot by the telephoto camera.

Step 205: determine whether a face has been detected.

If a face is detected, perform step 206.

If no face is detected, return to step 201.

In a specific implementation, if multiple faces are detected, say N faces, the subsequent steps can be executed in a loop N times so that each face is processed in turn.

Step 206: extract features and search for similar faces.

Extract the corresponding face feature information, which comprises the current time, the coordinates of the face in the wide-angle camera, and the feature representation extracted from the face region.

Step 207: determine whether the face has appeared before.

If it has appeared, perform step 208.

If it has not appeared, perform step 209.

Compare the extracted face features for similarity with the structured faces already held in the cache/database to determine whether this is a person who has already appeared.

For example: suppose that this face and a stored face appear in frames no more than 3 s apart, that the relative distance between their coordinates in the wide-angle camera is 5 pixels, and that the similarity of the feature representations of the two faces is 90%. The embodiment of the present application can then determine that this face has already appeared; otherwise, it is a face that has not appeared.
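The repeated-face judgment described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `FaceRecord` and `is_same_face`, the use of cosine similarity, and the treatment of the example values (3 s, 5 pixels, 90%) as thresholds are hypothetical and are not specified by the application.

```python
import math
from dataclasses import dataclass


@dataclass
class FaceRecord:
    timestamp: float   # capture time in seconds
    xy: tuple          # face coordinates in the wide-angle image, in pixels
    feature: list      # feature representation extracted from the face region


def cosine_similarity(a, b):
    """Similarity of two feature vectors (assumed metric, not from the source)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def is_same_face(new, old, max_dt=3.0, max_dist=5.0, min_sim=0.9):
    """True when time, position, and appearance all agree (hypothetical thresholds)."""
    dt = abs(new.timestamp - old.timestamp)
    dist = math.dist(new.xy, old.xy)
    sim = cosine_similarity(new.feature, old.feature)
    return dt <= max_dt and dist <= max_dist and sim >= min_sim
```

Requiring all three conditions at once reflects the combination of time, space, and face similarity in the example; a face that matches in appearance but appears much later or far away in the wide-angle image is treated as a new record.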

Step 208: determine whether the image quality is better than before.

If this face image is of higher quality than the previous one, perform step 210.

If it is not of higher quality than the previous one, discard this face image and process the next face, returning to step 205.

Step 209: store this face image.

Step 210: update this face image.

In the cache or database, update this person's face image (if a record already exists) or add it (if there is no record yet).
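Steps 207 to 210 can be sketched as a single store-or-update routine. This is a minimal sketch, assuming the stored faces are held in a Python list; `store_or_update`, `find_match`, and `quality` are hypothetical names, and the application does not specify how image quality is measured.

```python
def store_or_update(gallery, face, find_match, quality):
    """Add a new face, or keep the better image of a face that has appeared.

    gallery    -- list of stored face records (stands in for the cache/database)
    find_match -- callable(gallery, face) -> index of the matching record, or None
    quality    -- callable(face) -> number, higher is better (assumed metric)
    """
    idx = find_match(gallery, face)                 # step 207: appeared before?
    if idx is None:
        gallery.append(face)                        # step 209: store the new face
        return "stored"
    if quality(face) > quality(gallery[idx]):       # step 208: better quality?
        gallery[idx] = face                         # step 210: update the record
        return "updated"
    return "discarded"                              # keep the existing image
```

Passing the matching rule and the quality measure in as callables keeps the sketch independent of any particular similarity or sharpness metric.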

The embodiments of the present application use hierarchical detection, combining large-scale human body detection under the wide-angle camera with accurate face detection under the narrow-angle camera, to achieve efficient high-definition face snapshots at very long range. This is suitable for many scenes such as squares and markets.

Embodiment three,

The following describes the deep neural network structure used for human body/face detection in the embodiments of the present application.

Fig. 3 is a schematic diagram of the human body or face detection process in Embodiment three of the present application. As shown in the figure, the human body or face detection process may comprise the following steps:

Step 301: preprocess the human body or face image.

Preprocess the image (the input image for human body detection may be shot by the wide-angle camera, and the input image for face detection may be shot by the telephoto camera): scale it proportionally to a reasonable size (assumed to be 300*500), convert it to a uniform color space (such as BGR), subtract the average human body/face pixel values, and so on.
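The size and mean-subtraction arithmetic of this preprocessing step can be sketched as follows. This is only an illustration under stated assumptions: `fit_size` and `subtract_mean` are hypothetical helper names, 300*500 is read here as width by height, and the mean values used in the usage note are made up. A real pipeline would normally resize and convert the image with an image-processing library.

```python
def fit_size(width, height, max_w=300, max_h=500):
    """Scaled (width, height) that fits in max_w x max_h with the aspect ratio kept."""
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def subtract_mean(pixels, mean_bgr):
    """Subtract a per-channel mean from a list of (B, G, R) pixel tuples."""
    return [tuple(c - m for c, m in zip(px, mean_bgr)) for px in pixels]
```

For instance, `fit_size(600, 1000)` gives `(300, 500)`, and `subtract_mean([(110, 120, 130)], (104, 117, 123))` gives `[(6, 3, 7)]`.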

Step 302: apply multiple convolution computations to the image to obtain a feature image.

The input image is passed through multiple convolutional layers in turn to obtain the intermediate-layer feature maps. The receptive field of each point on a feature map corresponds to an image patch in the original image, and the receptive fields of all points on the map together correspond to the whole input image.
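How large a patch of the original image one feature-map point corresponds to can be worked out with the standard receptive-field recurrence for stacked convolutions. The sketch below is an illustration only; the function name and the layer configurations in the usage note are hypothetical, since the application does not disclose the network's layers.

```python
def receptive_field(layers):
    """Receptive field of one output point after a stack of convolutions.

    layers -- list of (kernel_size, stride) pairs, input side first
    Returns (receptive_field_size, cumulative_stride), both in input pixels.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) steps
        jump *= stride              # spacing of adjacent outputs in input pixels
    return rf, jump
```

Two 3x3 convolutions with stride 1 give `(5, 1)`, while two 3x3 convolutions with stride 2 give `(7, 4)`: each point then covers a 7-pixel-wide patch, and neighboring points are 4 pixels apart in the original image.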

Step 303: scan the feature image with sliding windows of multiple different scales and aspect ratios.

The feature maps output in step 302 are scanned with sliding windows of multiple different scales and aspect ratios, which produces two outputs:

One output is a score for the image patch in the original image that corresponds to each scale and aspect ratio. The higher the score, the more likely the patch is a region where a human body/face is located.

The other output is the offset between the patch and the accurate human body/face region, obtained by regression on the boundary of the target human body/face patch (a sliding window on the feature map may not frame the complete human body/face region exactly, so a regression operation is needed).
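The multi-scale, multi-aspect-ratio windows and the regressed offsets can be sketched as follows. This is a sketch under stated assumptions: `make_windows` and `apply_offsets` are hypothetical names, a scale is taken to be the side of the equal-area square, and the offsets are assumed to be simple per-edge shifts. The application does not specify how the offsets are parameterized.

```python
import math


def make_windows(cx, cy, scales, ratios):
    """Windows (x1, y1, x2, y2) centred on (cx, cy), one per scale/ratio pair.

    A ratio is width / height; every window of a given scale has area scale**2.
    """
    windows = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            windows.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return windows


def apply_offsets(window, offsets):
    """Shift each edge of a window by its regressed offset (assumed per-edge form)."""
    return tuple(c + d for c, d in zip(window, offsets))
```

For example, `make_windows(50, 50, [32], [1.0])` yields the single window `(34.0, 34.0, 66.0, 66.0)`, and applying the offsets `(-2, 1, 3, 0)` to it yields `(32.0, 35.0, 69.0, 66.0)`.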

Step 304: determine the human body/face regions according to the scanning result.

The human body/face regions obtained above are sorted in descending order of score, and redundant overlapping boxes in adjacent areas are deleted (the same person may be enclosed by several boxes, and only the highest-scoring one needs to be kept). This gives the final output: the target regions (bounding boxes) of the human body/face detection.
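The sort-and-delete procedure of step 304 resembles greedy non-maximum suppression, which can be sketched as follows. This is an illustrative sketch rather than the application's own implementation: the overlap measure (intersection over union) and the 0.5 threshold are assumptions, since the text only says that redundant boxes in adjacent areas are deleted.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

With boxes `(0, 0, 10, 10)`, `(1, 1, 11, 11)`, and `(50, 50, 60, 60)` scored 0.8, 0.9, and 0.7, the first two overlap heavily, so only the higher-scoring second box and the distant third box are kept.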

Compared with existing systems based on conventional machine learning methods, the embodiments of the present application can use a deep neural network to perform human body/face detection, so that the algorithm can adapt to complex and changeable scenes.

Embodiment four,

Based on the same inventive concept, the embodiments of the present application also provide a face snapshot system. Because the principle by which this system solves the problem is similar to that of the face snapshot method, its implementation can refer to that of the method, and repeated details are omitted. The system is described below.

Fig. 4 is the first structural schematic diagram of the face snapshot system in Embodiment four of the present application. As shown in the figure, the face snapshot system may include a linked camera unit, a first detection module 402, and a second detection module 404, the linked camera unit including a wide-angle camera 401 and a telephoto camera 403, wherein:

The wide-angle camera 401 is configured to acquire wide-angle video data in the monitored scene;

The first detection module 402 is configured to detect the region where a human body is located according to the video data;

The telephoto camera 403 is configured to acquire video data of the region where the human body is located;

The second detection module 404 is configured to perform face detection on the video data acquired by the telephoto camera to obtain a face image.

In a specific implementation, the wide-angle camera may be a camera with a wide-angle lens and the telephoto camera may be a camera with a telephoto lens; these cameras may be digital cameras. The linked camera unit in the embodiments of the present application may include one wide-angle camera and one telephoto camera, or multiple wide-angle cameras and multiple telephoto cameras. After multiple human bodies are detected in the wide-angle camera, multiple telephoto cameras may each be aimed at one of the human bodies to shoot it, or a single telephoto camera may be aimed at each human body in turn for a period of time, polling among them, so as to obtain video data of multiple people in the same scene.
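The polling option, in which one telephoto camera visits each detected human body in turn, can be sketched as a round-robin schedule. This is a minimal sketch with hypothetical names (`polling_schedule`, `dwell_seconds`, `slots`); the application does not specify the dwell time or the visiting order.

```python
from itertools import cycle, islice


def polling_schedule(targets, dwell_seconds, slots):
    """Return (start_time, target) pairs that visit the targets round-robin.

    targets       -- detected human body regions, in any fixed order
    dwell_seconds -- how long the telephoto camera stays on one target (assumed)
    slots         -- number of visits to schedule
    """
    visits = islice(cycle(targets), slots)
    return [(i * dwell_seconds, target) for i, target in enumerate(visits)]
```

For two targets `"A"` and `"B"` with a dwell time of 2.0 s, three slots give `[(0.0, "A"), (2.0, "B"), (4.0, "A")]`.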

Both the first detection module and the second detection module can be implemented with existing image recognition technology, or those skilled in the art can develop and design them accordingly. As long as they can detect human bodies/faces, the present application places no restriction on this.

The embodiments of the present application combine the advantages of different cameras: the wide-angle camera can be used to acquire human body targets in a large monitored scene, and the telephoto camera can then be used to acquire long-range, high-resolution faces. This overcomes the shortcoming that conventional face snapshot systems must rely on shooting faces at close range.

Fig. 5 is the second structural schematic diagram of the face snapshot system in Embodiment four of the present application. As shown in the figure, the face snapshot system may further include:

A first comparison module 405, configured to compare the face image for similarity with the stored face images;

A determining module 406, configured to determine, according to the result of the similarity comparison, whether the face has already been stored.

In a specific implementation, the first comparison module may be configured to compare the face image with the face images already stored in a cache or database, the face images stored in the cache or database being structured images.

Fig. 6 is the third structural schematic diagram of the face snapshot system in Embodiment four of the present application. As shown in the figure, the face snapshot system may further include:

An adding module 407, configured to store the face image when it is determined that the face has not been stored.

In a specific implementation, the adding module may specifically be configured to store the face image in the cache or database when it is determined that the face has not been stored.

Fig. 7 is the fourth structural schematic diagram of the face snapshot system in Embodiment four of the present application. As shown in the figure, the face snapshot system may further include:

A second comparison module 408, configured to compare the image quality of the face image with that of the stored face image when it is determined that the face has been stored;

An updating module 409, configured to update the stored face image when the face image is of higher quality than the stored face image.

In a specific implementation, the updating module may specifically be configured to delete the stored face image and store the newly acquired face image when the face image is of higher quality than the stored face image.

Fig. 8 is a structural schematic diagram of the first comparison module in Embodiment four of the present application. As shown in the figure, the first comparison module 405 may specifically include:

A feature extraction unit 4051, configured to extract the feature information of the face from the face image, the feature information including the current time, the position of the face in the wide-angle camera, and the feature representation extracted from the face image;

A comparison unit 4052, configured to compare the feature information of the face for similarity with the feature information of the stored face images;

The determining module may specifically be configured to determine that the face has been stored if the time interval is within a preset time range, the relative position in the wide-angle camera is within a preset distance, and the similarity of the feature representations of the two face images is higher than a preset threshold.

The embodiments of the present application treat similar faces that appear within a short time and at a close distance as repeated faces. Deduplicating face records by combining time, space, and face similarity makes the deduplication more accurate.

Fig. 9 is a structural schematic diagram of the first detection module in Embodiment four of the present application. As shown in the figure, the first detection module 402 may specifically include:

A first preprocessing unit 4021, configured to preprocess the video image acquired by the wide-angle camera;

A first convolution unit 4022, configured to input the video image into a pre-trained deep convolutional neural network for human body detection to obtain a feature map, where the receptive field of a pixel on the feature map corresponds to an image patch in the video image, and the receptive fields of all pixels on the feature map correspond to the video image;

A first output unit 4023, configured to scan the feature map with preset sliding windows of multiple different scales and/or aspect ratios to obtain scores for the regions where a human body is located;

A first determining unit 4024, configured to determine that a region whose score exceeds a preset first threshold and is the local maximum is the region where the human body is located.

Fig. 10 is a structural schematic diagram of the second detection module in Embodiment four of the present application. As shown in the figure, the second detection module 404 may specifically include:

A second preprocessing unit 4041, configured to preprocess the video image acquired by the telephoto camera;

A second convolution unit 4042, configured to input the video image into a pre-trained deep convolutional neural network for face detection to obtain a feature map, where the receptive field of a pixel on the feature map corresponds to an image patch in the video image, and the receptive fields of all pixels on the feature map correspond to the video image;

A second output unit 4043, configured to scan the feature map with preset sliding windows of multiple different scales and/or aspect ratios to obtain scores for the regions where a face is located;

A second determining unit 4044, configured to determine that a region whose score exceeds a preset second threshold and is the local maximum is the region where the face is located.

In the face snapshot system provided by the embodiments of the present application, the wide-angle camera acquires video data in the monitored scene, the first detection module detects the region where a human body is located according to the video data, the telephoto camera then acquires video data of that region, and finally the second detection module performs face detection on the video data acquired by the telephoto camera to obtain a face image. The embodiments of the present application combine the advantages of different cameras: the wide-angle camera can be used to acquire human body targets in a large monitored scene, and the telephoto camera can then be used to acquire long-range, high-resolution faces. This overcomes the shortcoming that conventional face snapshot systems must rely on shooting faces at close range, and the system can adapt to more complex and changeable application scenarios.

For convenience of description, the parts of the apparatus described above are divided by function into various modules or units and described separately. Of course, when implementing the present application, the functions of the modules or units may be implemented in one or more pieces of software or hardware.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.

Claims

1. A face snapshot method, characterized by comprising the following steps:

detecting the region where a human body is located according to the video data;

2. The method of claim 1, characterized by further comprising:

3. The method of claim 2, characterized by further comprising:

4. The method of claim 2, characterized by further comprising:

5. The method of claim 2, characterized in that comparing the face image for similarity with the stored face images specifically comprises: extracting the feature information of the face from the face image, the feature information including the current time, the position of the face in the wide-angle camera, and the feature representation extracted from the face image; and comparing the feature information of the face for similarity with the feature information of the stored face images;

6. The method of claim 1, characterized in that detecting the region where a human body is located according to the video data specifically comprises:

preprocessing the video image acquired by the wide-angle camera;

7. The method of claim 1, characterized in that performing face detection on the video data acquired by the telephoto camera specifically comprises:

8. A face snapshot system, characterized by comprising a linked camera unit, a first detection module, and a second detection module, the linked camera unit including a wide-angle camera and a telephoto camera, wherein:

9. The system of claim 8, characterized by further comprising:

10. The system of claim 9, characterized by further comprising:

11. The system of claim 9, characterized by further comprising:

12. The system of claim 9, characterized in that the first comparison module specifically includes:

13. The system of claim 8, characterized in that the first detection module specifically includes:

14. The system of claim 8, characterized in that the second detection module specifically includes: