CN108875504A - Neural-network-based image detection method and image detection device - Google Patents

Neural-network-based image detection method and image detection device

Info

Publication number
CN108875504A
Authority
CN
China
Prior art keywords
region
people
human
candidate frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711107369.2A
Other languages
Chinese (zh)
Other versions
CN108875504B (en)
Inventor
林孟潇
张祥雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd and Beijing Maigewei Technology Co Ltd
Priority to CN201711107369.2A priority Critical patent/CN108875504B/en
Publication of CN108875504A publication Critical patent/CN108875504A/en
Application granted granted Critical
Publication of CN108875504B publication Critical patent/CN108875504B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a neural-network-based image detection method and an image detection device. The method includes: performing feature extraction on an image to obtain image features; detecting, based on the image features, a head region of a human body in the image; and determining, based on the detection result for the head region, a human body region in the image corresponding to the head region, the human body region including the head region and a body region. Compared with conventional detection devices, embodiments of the present disclosure improve detection speed and detection efficiency.

Description

Neural-network-based image detection method and image detection device
Technical field
Embodiments of the present disclosure relate to a neural-network-based image detection method and to an image detection device corresponding to the method.
Background art
Pedestrian detection is the technique of determining whether pedestrians are present in an image or video sequence and, if so, locating them precisely. Because a pedestrian exhibits properties of both a rigid and a deformable object, and its appearance is easily affected by clothing, scale, occlusion, posture, viewing angle and the like, pedestrian detection is extremely challenging.
When a crowd is dense, people overlap one another. Because certain algorithms must be used to suppress possible overlaps, a large number of missed detections may occur, or several people may be treated as a single person, which is particularly harmful for tasks such as people-flow statistics. On the other hand, when there are few pedestrians in a picture, existing methods perform unnecessary computation over large portions of the picture, wasting system resources and reducing computational efficiency.
Summary of the invention
Embodiments of the present invention aim to provide a neural-network-based image detection method and an image detection device to solve the above technical problems.
According to at least one embodiment of the disclosure, a neural-network-based image detection method is provided. The method includes: performing feature extraction on the image to obtain image features; detecting, based on the image features, a head region of a human body in the image; and determining, based on the detection result for the head region, a human body region in the image corresponding to the head region, the human body region including the head region and a body region.
For example, the step of detecting, based on the image features, the head region of a human body in the image includes: inputting the image into a first neural network, the first neural network being used to extract head regions from images; and outputting, from the first neural network, at least one head region candidate box in the image.
For example, the step of detecting, based on the image features, the head region of a human body in the image further includes: outputting, from the first neural network, a score for each of the at least one head region candidate box in the image, the score representing the likelihood that the region is a head region; comparing the score with a preset head score threshold; and determining a candidate region whose score is greater than the preset head score threshold to be a head region.
For example, the step of determining, based on the detection result for the head region, the human body region in the image corresponding to the head region includes: obtaining a head-to-body relative position parameter obtained through machine learning; and determining, according to the relative position parameter, at least one human body region candidate box in the image corresponding to the head region.
For example, the step of determining the human body region corresponding to the head region further includes: obtaining a preset estimation parameter, the preset estimation parameter representing the number of human body region candidate boxes in the image corresponding to the head region; and determining, based on the preset estimation parameter, that number of human body region candidate boxes corresponding to the head region in the image.
For example, the step of determining the human body region corresponding to the head region further includes: determining a score value of each human body region candidate box, the score value representing the likelihood that the extracted human body region candidate box is a human body region; and selecting, based on the score values, at least one of the candidate boxes as the human body region.
For example, the step of determining the score value of each human body region candidate box includes: inputting the images of the candidate boxes into a trained second neural network, the second neural network being used to determine the score value of each human body region candidate box image; and outputting, from the second neural network, the score value corresponding to each candidate box.
For example, the step of determining the human body region corresponding to the head region further includes: correcting the human body region candidate box to obtain a corrected human body region.
For example, the step of correcting the human body region candidate box includes: inputting the human body region candidate box image into a third neural network, the third neural network being used to correct the human body region candidate box; and outputting, from the third neural network, a correction result for each human body region candidate box.
For example, the step of outputting the correction result for each human body region candidate box from the third neural network includes: determining, by the third neural network based on the human body region candidate box, an original region of the candidate box in the image; determining whether the original region is complete; and, when the original region is incomplete, correcting the human body region candidate box corresponding to the original region.
For example, the step of correcting the human body region candidate box corresponding to the original region when the original region is incomplete includes: obtaining a plurality of standard body boxes output by the trained third neural network; matching the human body region candidate box against the plurality of standard body boxes; taking the standard body box with the highest matching rate as the correction box for the corresponding candidate box; and correcting the corresponding candidate box based on the correction box.
For example, the corrected parameters include at least one of: region center point position, region width, and region height.
For example, the method further includes: performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image.
For example, the step of performing non-maximum suppression post-processing on the head regions in the image and the corresponding human body regions to obtain the human bodies detected in the image includes: when any one of the head regions corresponds to multiple human body regions, determining the score value of each of the multiple human body regions corresponding to that head region; and determining the human body region with the highest score value to be the human body region corresponding to that head region.
For example, there are multiple head regions, the multiple head regions including a first head region and a second head region, and the step of performing non-maximum suppression post-processing on the head regions and the corresponding human body regions further includes: when the first head region and the second head region overlap, determining the ratio of the overlapping region to the union of the first head region and the second head region; when the ratio is greater than a first threshold, obtaining a first human body region corresponding to the first head region and a second human body region corresponding to the second head region; and comparing the score values of the first human body region and the second human body region, and, when the score value of the first human body region is greater than that of the second human body region, determining the first human body region and the first head region corresponding to it to be the finally detected human body region and head region.
According to at least one embodiment of the disclosure, a neural-network-based image detection device is provided, including a memory and a processor, the memory storing program instructions which, when executed by the processor, perform: performing feature extraction on the image to obtain image features; detecting, based on the image features, a head region of a human body in the image; and determining, based on the detection result for the head region, a human body region in the image corresponding to the head region, the human body region including the head region and a body region.
For example, detecting, based on the image features, the head region of a human body in the image includes: inputting the image into a first neural network, the first neural network being used to extract head regions from images; and outputting, from the first neural network, at least one head region candidate box in the image.
For example, detecting the head region of a human body in the image further includes: outputting, from the first neural network, a score for each of the at least one head region candidate box in the image, the score representing the likelihood that the region is a head region; comparing the score with a preset head score threshold; and determining a candidate region whose score is greater than the preset head score threshold to be a head region.
For example, determining, based on the detection result for the head region, the human body region in the image corresponding to the head region includes: obtaining a head-to-body relative position parameter obtained through machine learning; and determining, according to the relative position parameter, at least one human body region candidate box in the image corresponding to the head region.
For example, determining the human body region corresponding to the head region further includes: obtaining a preset estimation parameter, the preset estimation parameter representing the number of human body region candidate boxes in the image corresponding to the head region; and determining, based on the preset estimation parameter, that number of human body region candidate boxes corresponding to the head region in the image.
For example, determining the human body region corresponding to the head region further includes: determining a score value of each human body region candidate box, the score value representing the likelihood that the extracted human body region candidate box is a human body region; and selecting, based on the score values, at least one of the candidate boxes as the human body region.
For example, determining the score value of each human body region candidate box includes: inputting the images of the candidate boxes into a trained second neural network, the second neural network being used to determine the score value of each human body region candidate box image; and outputting, from the second neural network, the score value corresponding to each candidate box.
For example, determining the human body region corresponding to the head region further includes: correcting the human body region candidate box to obtain a corrected human body region.
For example, correcting the human body region candidate box includes: inputting the human body region candidate box image into a third neural network, the third neural network being used to correct the human body region candidate box; and outputting, from the third neural network, a correction result for each human body region candidate box.
For example, outputting the correction result for each human body region candidate box from the third neural network includes: determining, by the third neural network based on the human body region candidate box, an original region of the candidate box in the image; determining whether the original region is complete; and, when the original region is incomplete, correcting the human body region candidate box corresponding to the original region.
For example, correcting the human body region candidate box corresponding to the original region when the original region is incomplete includes: obtaining a plurality of standard body boxes output by the trained third neural network; matching the human body region candidate box against the plurality of standard body boxes; taking the standard body box with the highest matching rate as the correction box for the corresponding candidate box; and correcting the corresponding candidate box based on the correction box.
For example, the corrected parameters include at least one of: region center point position, region width, and region height.
For example, the device further performs: non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image.
For example, performing non-maximum suppression post-processing on the head regions in the image and the corresponding human body regions to obtain the human bodies detected in the image includes: when any one of the head regions corresponds to multiple human body regions, determining the score value of each of the multiple human body regions corresponding to that head region; and determining the human body region with the highest score value to be the human body region corresponding to that head region.
For example, there are multiple head regions, including a first head region and a second head region, and performing the non-maximum suppression post-processing further includes: when the first head region and the second head region overlap, determining the ratio of the overlapping region to the union of the first head region and the second head region; when the ratio is greater than a first threshold, obtaining a first human body region corresponding to the first head region and a second human body region corresponding to the second head region; and comparing the score values of the first human body region and the second human body region, and, when the score value of the first human body region is greater than that of the second human body region, determining the first human body region and the first head region corresponding to it to be the finally detected human body region and head region.
According to at least one embodiment of the disclosure, a neural-network-based image detection device is further provided. The device includes: an acquiring unit configured to perform feature extraction on the image to obtain image features; a detection unit configured to detect, based on the image features, a head region of a human body in the image; and a determination unit configured to determine, based on the detection result for the head region, a human body region in the image corresponding to the head region, the human body region including the head region and a body region.
According to at least one embodiment of the disclosure, a computer-executable non-volatile storage medium is further provided. The medium stores program instructions which, when loaded and executed by a processor of the computer, perform the steps of the method described in the above embodiments.
In embodiments of the present disclosure, head regions are first detected in the image to be detected, and pedestrians are then detected near the positions of the head regions based on the detected head regions. Compared with conventional detection devices, this improves detection speed and detection efficiency.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below illustrate only exemplary embodiments of the present invention.
Fig. 1 shows a flowchart of an image detection method according to an embodiment of the present invention;
Fig. 2 shows a method of correcting a human body region candidate box according to an embodiment of the present invention;
Fig. 3 shows a flowchart of a human body detection method according to an embodiment of the present invention;
Fig. 4 shows an architecture diagram of an image detection device according to an embodiment of the present invention;
Fig. 5 shows an image detection device according to an embodiment of the present disclosure;
Fig. 6 shows an example of convolution kernels according to an embodiment of the present invention.
Detailed description of embodiments
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that in the specification and drawings, substantially identical steps and elements are denoted by the same reference numerals, and repeated explanations of these steps and elements are omitted.
Fig. 1 shows a neural-network-based image detection method 100 according to an embodiment of the present invention. Referring to Fig. 1, the image detection method 100 may include the following steps.
In step S101, feature extraction is performed on the image to obtain image features. According to an example of the present invention, a trained neural network may be used to perform feature extraction on the image; alternatively, methods such as SIFT (scale-invariant feature transform) or HOG (histogram of oriented gradients) may be used.
A convolutional neural network (CNN) is a locally connected network. Compared with a fully connected network, its two most distinctive features are local connectivity and weight sharing. For a given pixel p in an image, pixels closer to p generally have a greater influence on it (local connectivity). In addition, according to the statistical properties of natural images, the weights learned for one region can be reused for another region (weight sharing). Here, weight sharing means sharing of convolution kernels: convolving a given image with one convolution kernel extracts one kind of image feature, and different convolution kernels extract different image features. For example, the output of a convolutional layer can be computed according to the following formula:

feature = σ(W ∘ imgMat + b)    (1)

where "σ" denotes the activation function, "imgMat" denotes the grayscale image matrix, "W" denotes the convolution kernel, "∘" denotes the convolution operation, and "b" denotes the bias.
According to an example of the present invention, feature extraction may be performed on the image by a CNN. Fig. 6 shows an example of convolution kernels according to an embodiment of the present invention. The convolution kernels may be the first convolution kernels shown in Fig. 6, where Gx corresponds to the horizontal direction and Gy corresponds to the vertical direction.
The image is first convolved with the first convolution kernel Gx, for example using formula (1). Here the convolution kernel size may be 3x3 and the image size may be 512x512. If no other processing is applied to the image and the convolution is performed directly, the size of the convolved image may be (512-3+1)x(512-3+1). The bias b is added to each element of the resulting matrix, and each element of the resulting matrix is then fed into an activation function (for example, a sigmoid or ReLU function), which yields one image feature extraction result.
Furthermore, feature extraction may also be performed with the first convolution kernel Gy, yielding another image feature extraction result. In this example, two convolution kernels are used, and each convolution kernel extracts a different image feature. Those skilled in the art will appreciate that more than ten or even tens of convolution kernels may also be used to extract image features.
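For illustration only, the following Python sketch mimics the single-layer convolution of formula (1) with two hand-written 3x3 kernels. The Sobel-like kernel values, the bias value, and the sigmoid activation are assumptions made for this example and are not mandated by the disclosure.

    import numpy as np
    from scipy.signal import convolve2d

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def conv_feature(img_mat, kernel, bias):
        # "valid" convolution: a 512x512 image and a 3x3 kernel give a
        # (512-3+1)x(512-3+1) feature map, matching the size mentioned above.
        conv = convolve2d(img_mat, kernel, mode="valid")
        return sigmoid(conv + bias)

    # Example kernels standing in for the Gx / Gy kernels of Fig. 6 (assumed values).
    Gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    Gy = Gx.T

    gray = np.random.rand(512, 512).astype(np.float32)  # placeholder grayscale image
    feat_x = conv_feature(gray, Gx, bias=0.1)
    feat_y = conv_feature(gray, Gy, bias=0.1)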
According to another example of the present invention, the scale-invariant feature transform (SIFT) algorithm may also be used to perform feature extraction on the image. The SIFT algorithm is invariant to translation, rotation and scale change, and is robust to noise, viewpoint change and illumination change.
When selecting an image for SIFT feature extraction, mainly in the interest of computational efficiency, the selected image should not be too small; otherwise too few feature points will be detected, which affects matching accuracy.
The SIFT feature extraction method may include the following steps.
(1) Scale-space extrema detection:
The input image is convolved with Gaussian functions of different kernels to obtain the corresponding Gaussian images, where the two-dimensional Gaussian function is defined as follows:

G(x, y, σ) = 1/(2πσ²) · exp(-(x² + y²)/(2σ²))

where σ is the scale parameter of the Gaussian function, and x and y are the row and column coordinates of the image, respectively.
Gaussian images whose scales differ by a factor k are subtracted from each other to form the difference-of-Gaussian (DoG) scale space of the image, as expressed by the following formula:
D(x, y, σ) = (G(x, y, kσ) - G(x, y, σ)) * I(x, y) = L(x, y, kσ) - L(x, y, σ)
Three adjacent scales of the DoG scale space are taken, and each pixel of the middle layer is compared one by one with its neighboring pixels in the same layer and in the layers above and below. If the point is a maximum or a minimum, it is a candidate feature point at this scale.
(2) Feature point localization:
Because DoG values are sensitive to noise and edges, a Taylor expansion is further used at each local extremum to accurately determine the position and scale of the candidate feature point, while feature points of low contrast are removed.
(3) Determining the principal orientation of a feature point:
The principal orientation of a feature point is determined mainly for feature point matching: once the principal orientation is found, the image can be rotated to the principal orientation during matching, which guarantees rotational invariance. The gradient magnitude and orientation at pixel (x, y) are respectively:

m(x, y) = sqrt((L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))²)
θ(x, y) = arctan((L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)))

where m(x, y) denotes the gradient magnitude and θ(x, y) denotes the gradient orientation.
Sampling is performed in a neighborhood window centered on the feature point, and the gradient orientations of the neighboring pixels are accumulated in a gradient orientation histogram; the orientation corresponding to the highest peak of the histogram is taken as the principal orientation. At this point, feature point detection is complete, and each feature point carries three pieces of information: position, scale and orientation.
(4) Generating the SIFT feature descriptor:
The SIFT algorithm generates the feature descriptor from a sampled region. To ensure rotational invariance, the coordinate axes are first rotated to the orientation of the feature point. A 16×16 window centered on the feature point is taken, and an 8-direction gradient orientation histogram is computed on each 4×4 image sub-block, the accumulated value of each gradient direction forming one seed point. A feature point is thus described by 16 seed points, each carrying 8 direction-vector values, so each feature point yields 16 × 8 = 128 values, i.e., a 128-dimensional SIFT feature descriptor.
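As an illustration, SIFT keypoints and 128-dimensional descriptors as described above can be obtained with OpenCV's built-in implementation; the use of OpenCV here is an assumption about tooling and is not part of the claimed method.

    import cv2

    def extract_sift(image_path):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # SIFT_create is available in recent OpenCV builds (older ones need opencv-contrib).
        sift = cv2.SIFT_create()
        # Each keypoint carries position, scale and orientation; descriptors is an N x 128 array.
        keypoints, descriptors = sift.detectAndCompute(gray, None)
        return keypoints, descriptors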
According to an example of the present invention, HOG (histogram of oriented gradients) features may also be extracted from the image. The HOG feature extraction method may include the following steps.
1) Normalizing the image
The purpose of normalization is to improve the robustness of the image feature descriptor to illumination and environmental changes, to reduce local shadows, local over-exposure and texture distortion, and to suppress interference noise as much as possible. The normalization operation first converts the image into a grayscale image and then applies gamma correction.
2) Segmenting the image
Because the histogram of oriented gradients is a local feature descriptor that describes local texture information, extracting features directly from a large image does not give good results. The image therefore first needs to be divided into smaller cells; for example, in a program the image may first be divided into cells of 20*20 pixels, every 2*2 cells then form a block, and all blocks together cover the image.
3) Computing the histogram of oriented gradients of each cell
After the image is divided into small cells, the next step is to compute the histogram of oriented gradients of each cell. The gradient images in the X direction and the Y direction can be computed for each cell, and the gradient orientation and gradient magnitude of each pixel in the cell are then computed. From these, a histogram of oriented gradients is generated, with the gradient orientation on the horizontal axis X and the gradient magnitude on the vertical axis Y.
4) Normalizing the feature vectors
To overcome uneven illumination and contrast differences between foreground and background, the feature vector computed for each local region needs to be normalized, for example using a normalization function.
5) Generating the HOG feature vector
First, the HOG feature vectors of the cells are composed into the HOG feature vector of a block, for example a block formed by 2*2 cells. The HOG feature vectors of all blocks are then composed into the HOG feature vector of the full image. The concatenation is performed end to end, so that the small feature vectors form one larger feature vector. For example, if an image is divided into m*n blocks and the feature vector of each block has 9 dimensions (one dimension per gradient orientation), the final feature vector of the image has m*n*9 dimensions.
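As a sketch only, the steps above roughly correspond to scikit-image's hog function; the 20x20 cell and 2x2 block sizes below merely mirror the example figures given in this description and are not required values.

    import cv2
    from skimage.feature import hog

    def extract_hog(image_path):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # 9 orientation bins, 20x20-pixel cells, 2x2 cells per block, L2-Hys block
        # normalization; the result is one flattened HOG feature vector for the image.
        features = hog(gray, orientations=9, pixels_per_cell=(20, 20),
                       cells_per_block=(2, 2), block_norm="L2-Hys")
        return features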
Those skilled in the art will understand that the above feature extraction methods are merely examples of the present invention; there are many feature extraction methods, and other image feature extraction methods may also be used.
In step S102, the head region of a human body in the image is detected based on the image features.
According to an example of the present invention, a first neural network trained to detect head regions of human bodies in images may be used; the trained first neural network can extract head regions from an image. The image to be detected is input into the first neural network, and at least one head region candidate box in the image is then output from the first neural network. A head region candidate box is a region of the image that may be a head region; in subsequent steps the candidate boxes can be screened to determine the head regions.
According to an example of the present invention, in order to judge how likely an output head region candidate box is to be a head region, a score for each of the at least one head region candidate box in the image may also be output from the first neural network while the candidate boxes are output; the score represents the likelihood that the region is a head region. The score is then compared with a preset head score threshold. The preset head score threshold may be obtained through machine learning when the first neural network is trained, or may be set according to the actual situation. If the score is greater than the preset head score threshold, the head region candidate box is determined to be a head region; if the score is less than the preset head score threshold, the candidate box is determined not to be a head region. In this way, regions that are clearly not head regions can be filtered out.
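A minimal sketch of the score-threshold filtering described above is given below. The box format and the 0.5 threshold are placeholders assumed for illustration, since the disclosure does not fix a specific network interface or threshold value.

    import numpy as np

    def filter_head_candidates(candidate_boxes, scores, score_threshold=0.5):
        """Keep only candidate boxes whose head score exceeds the preset threshold.

        candidate_boxes: (N, 4) array of [x0, y0, x1, y1] boxes from the first network.
        scores: (N,) array; each score is the likelihood that the box is a head region.
        """
        keep = scores > score_threshold
        return candidate_boxes[keep], scores[keep]

    # Example usage with dummy outputs standing in for the first neural network:
    boxes = np.array([[10, 10, 40, 45], [200, 80, 230, 115], [5, 5, 12, 9]], dtype=np.float32)
    scores = np.array([0.92, 0.81, 0.12], dtype=np.float32)
    head_boxes, head_scores = filter_head_candidates(boxes, scores)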
In step S103, the human body region in the image corresponding to the head region is determined based on the detection result for the head region; the human body region includes the head region and a body region.
According to an example of the present invention, on the basis of the head region candidate boxes obtained in step S102, in order to further detect the human body region corresponding to a head region, a head-to-body relative position parameter obtained through machine learning may be acquired. The relative position parameter represents the proportional relationship between the head and the body. Then, at least one human body region candidate box corresponding to the head region is determined in the image according to the relative position parameter. A human body region candidate box is likewise a region box that may be a human body region. In another example, the relative position parameter may also be set manually.
Because multiple human body regions may be detected for one head region during human body region detection, in order to obtain an appropriate number of human body region candidate boxes, according to an example of the present invention a preset estimation parameter may first be obtained; the preset estimation parameter represents the number of human body region candidate boxes in the image corresponding to the head region. Then, based on the preset estimation parameter, that number of human body region candidate boxes corresponding to the head region are determined in the image.
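The following sketch illustrates generating body candidate boxes from a head box using a relative position parameter. The specific ratios (a body roughly three head-widths wide and seven head-heights tall) and the three candidate offsets are assumptions made for the example; in the disclosure these parameters are learned or preset.

    import numpy as np

    def body_candidates_from_head(head_box, width_ratio=3.0, height_ratio=7.0, num_candidates=3):
        """Propose body candidate boxes around a detected head box.

        head_box: [x0, y0, x1, y1] with y increasing downwards.
        width_ratio / height_ratio: relative position parameters (assumed values here).
        num_candidates: preset estimation parameter, i.e., how many candidates to generate.
        """
        x0, y0, x1, y1 = head_box
        head_w, head_h = x1 - x0, y1 - y0
        head_cx = (x0 + x1) / 2.0
        body_w, body_h = width_ratio * head_w, height_ratio * head_h
        candidates = []
        # Shift the body box slightly left of, centered on, and right of the head.
        for dx in np.linspace(-0.5 * head_w, 0.5 * head_w, num_candidates):
            cx = head_cx + dx
            candidates.append([cx - body_w / 2.0, y0, cx + body_w / 2.0, y0 + body_h])
        return np.array(candidates, dtype=np.float32)

    body_boxes = body_candidates_from_head([100, 50, 130, 90])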
After multiple human body region candidate boxes are obtained, in order to filter out the candidate boxes that are clearly not human body regions, according to an example of the present invention a score value of each human body region candidate box may also be determined; the score value represents the likelihood that the extracted human body region candidate box is a human body region. Based on the score values, at least one of the detected human body region candidate boxes can then be selected as the human body region.
For example, a trained second neural network may be used, the second neural network being used to determine the score value of each human body region candidate box image. The human body region candidate boxes output above are input into the trained second neural network, and the score value corresponding to each candidate box is output from the second neural network. The score values indicate how likely each candidate box is to be a human body region, so that effective screening can be performed.
In addition, in order to improve the accuracy of pedestrian detection, according to an example of the present invention, after the second neural network outputs the human body region candidate boxes, the candidate boxes may be further corrected to obtain corrected human body regions.
According to an example of the present invention, the human body region candidate boxes are corrected using a trained third neural network. For example, a human body region candidate box is input into the third neural network, and the correction result for each candidate box is then output from the third neural network.
Fig. 2 shows a method 200 of correcting a human body region candidate box according to an embodiment of the present invention. Referring to Fig. 2, the correction method may include the following steps.
In step S201, the third neural network determines, based on the human body region candidate box, the original region of the candidate box in the image. For example, when the third neural network is trained, the image on which feature extraction is performed in step S101, i.e., the image to be detected, may be used as the original image. The original region is simply the region of the image to be detected in which the human body region candidate box lies. As an example, the original region can be cropped out of the original image according to the position of the candidate box in the original image.
In step S202, it is determined whether the original region is complete. For example, whether the human body or head in the original region is complete can be judged according to body proportions and the normal proportion and size ranges of body parts.
In step S203, when the original region is incomplete, the human body region candidate box corresponding to the original region is corrected. As an example, when the correction is performed, a plurality of standard body boxes output by the trained third neural network are first obtained. The standard body boxes are obtained from the labeled data of samples of multiple normal human bodies; there may therefore be several standard body boxes, for example different standard body boxes for human bodies of different ages and genders. The human body region candidate box is then matched against the plurality of standard body boxes, and the standard body box with the highest matching rate is taken as the correction box for the candidate box. In other words, the standard body box with the highest matching rate is the one closest to the candidate box, and the candidate box is corrected with reference to it. The parameters to be corrected may include, for example, at least one of: the region center point position, the region width, and the region height of the human body region candidate box.
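Below is a purely illustrative sketch of matching a candidate box against a set of standard body boxes and taking the best match as the correction box. The disclosure does not specify the matching metric, so intersection-over-union is assumed here as the "matching rate".

    import numpy as np

    def iou(box_a, box_b):
        ax0, ay0, ax1, ay1 = box_a
        bx0, by0, bx1, by1 = box_b
        ix0, iy0 = max(ax0, bx0), max(ay0, by0)
        ix1, iy1 = min(ax1, bx1), min(ay1, by1)
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
        return inter / union if union > 0 else 0.0

    def best_standard_box(candidate_box, standard_boxes):
        """Return the standard body box with the highest matching rate to the candidate.

        In practice the standard boxes would be placed at the candidate's location
        before comparison; that alignment step is omitted in this sketch.
        """
        rates = [iou(candidate_box, s) for s in standard_boxes]
        return standard_boxes[int(np.argmax(rates))]

    standard_boxes = np.array([[0, 0, 60, 160], [0, 0, 50, 120], [0, 0, 45, 100]], dtype=np.float32)
    correction_box = best_standard_box(np.array([5, 3, 52, 130], dtype=np.float32), standard_boxes)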
As an example, the parameters may be corrected in the following way. Assume the standard body box is a rectangle, where x is the abscissa of the center point of the standard body box, y is the ordinate of the center point, w is the width of the standard body box, and h is its height. Let (x0, y0) be the lower-left corner coordinates of the standard body box and (x1, y1) be its upper-right corner coordinates. Then:

x = (x0 + x1)/2,  y = (y0 + y1)/2,  w = x1 - x0,  h = y1 - y0.

Assume the human body region candidate box is also a rectangle, where xa is the abscissa of its center point, ya is the ordinate of its center point, wa is its width, and ha is its height.
Assume the corrected human body region candidate box is also a rectangle, where x′ is the abscissa of its center point, y′ is the ordinate of its center point, w′ is its width, and h′ is its height.
The regression targets for the correction offsets tx, ty, tw, th can then be obtained according to the following formulas:

tx = (x - xa)/wa,  ty = (y - ya)/ha,
tw = log(w/wa),  th = log(h/ha),
t′x = (x′ - xa)/wa,  t′y = (y′ - ya)/ha,
t′w = log(w′/wa),  t′h = log(h′/ha).

When the third neural network is trained, t′x, t′y, t′w and t′h are predicted from the image features and the parameters of the human body region candidate box and the standard body box, so that Σ(ti - t′i)² is as small as possible, where i ∈ {x, y, w, h}. When Σ(ti - t′i)² converges, training of the third neural network is complete. The t′x, t′y, t′w, t′h output by the trained network are then used to compute the center (x′, y′), the width w′ and the height h′ of the corrected human body region candidate box:

x′ = t′x·wa + xa,  y′ = t′y·ha + ya,  w′ = wa·exp(t′w),  h′ = ha·exp(t′h).

Let the lower-left corner coordinates of the corrected human body region candidate box be (x′0, y′0) and its upper-right corner coordinates be (x′1, y′1). The coordinate values are then computed with the following formulas:

x′0 = x′ - w′/2,  x′1 = x′ + w′/2,
y′0 = y′ - h′/2,  y′1 = y′ + h′/2.
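A short sketch of decoding the predicted offsets into a corrected box, following the formulas above, is shown below; the sample numeric values are purely illustrative.

    import numpy as np

    def decode_correction(candidate_box, offsets):
        """Apply predicted offsets (t'x, t'y, t'w, t'h) to a candidate box.

        candidate_box: [x0, y0, x1, y1] of the human body region candidate box.
        offsets: [tx, ty, tw, th] predicted by the third neural network.
        Returns the corrected box [x'0, y'0, x'1, y'1].
        """
        x0, y0, x1, y1 = candidate_box
        xa, ya = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        wa, ha = x1 - x0, y1 - y0
        tx, ty, tw, th = offsets
        # x' = t'x * wa + xa, y' = t'y * ha + ya, w' = wa * exp(t'w), h' = ha * exp(t'h)
        xc, yc = tx * wa + xa, ty * ha + ya
        w, h = wa * np.exp(tw), ha * np.exp(th)
        return np.array([xc - w / 2.0, yc - h / 2.0, xc + w / 2.0, yc + h / 2.0])

    corrected = decode_correction([90, 40, 190, 300], [0.05, -0.02, 0.10, 0.08])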
By correcting the output human body region candidate boxes, the embodiment of the present invention can effectively improve the recognition of the candidate boxes and improve detection accuracy.
Fig. 3 shows a flowchart 300 of a human body detection method according to an embodiment of the present invention. Referring to Fig. 3, the human body detection method may include the following steps.
In step S301, feature extraction is performed on the image to obtain image features.
In step S302, the head region of a human body in the image is detected based on the image features.
In step S303, the human body region in the image corresponding to the head region is determined based on the detection result for the head region; the human body region includes the head region and a body region.
In step S304, non-maximum suppression post-processing is performed on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image.
Steps S301-S303 are respectively identical to steps S101-S103 in the foregoing embodiment; details are not repeated here, and reference may be made to the foregoing embodiment.
In step S304, as an example, when multiple head regions are detected in step S302 and it is determined in step S303 that a head region corresponds to multiple human body regions, the score value of each of the multiple human body regions corresponding to that head region may further be determined; for example, as described above, the score value of each human body region is determined using the trained second neural network. The human body region with the highest score value is then determined to be the human body region corresponding to that head region.
During pedestrian detection, two overlapping head regions may be detected together. In order to reduce the false alarm rate, the following processing may be performed. Suppose at least two head regions are detected, including a first head region and a second head region. When the first head region and the second head region overlap, the ratio of the overlapping region to the union of the first head region and the second head region is determined. When the ratio is greater than a first threshold, a first human body region corresponding to the first head region and a second human body region corresponding to the second head region are obtained. The score values of the first human body region and the second human body region are then compared; when the score value of the first human body region is greater than that of the second human body region, the first human body region and the first head region corresponding to it are determined to be the finally detected human body region and head region. That is, when overlapping head boxes exist, only the human body region with the higher score and its corresponding head region are kept.
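The following sketch illustrates this suppression step for a pair of overlapping head boxes; the 0.5 overlap threshold is an assumed value standing in for the first threshold, which the disclosure leaves configurable.

    def box_iou(a, b):
        # Ratio of the overlap of two boxes to their union, boxes given as [x0, y0, x1, y1].
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def suppress_overlapping_heads(head_a, head_b, body_a, body_b, score_a, score_b,
                                   overlap_threshold=0.5):
        """Keep only the higher-scoring (head, body) pair when two head boxes overlap too much."""
        if box_iou(head_a, head_b) > overlap_threshold:
            if score_a >= score_b:
                return [(head_a, body_a)]
            return [(head_b, body_b)]
        # Overlap is small enough: both detections are kept.
        return [(head_a, body_a), (head_b, body_b)]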
The image detection method of the embodiment of the present invention first detects the head regions in the image to be detected and then, based on the head regions, detects pedestrians near the positions of the head regions. This improves the accuracy of pedestrian detection and at the same time reduces the area to be searched, so that, compared with conventional detection methods, the detection speed is improved to varying degrees. The method can also provide the correspondence between a person and the position of the person's head, offering more information for various subsequent needs.
Fig. 4 shows an architecture diagram of an image detection device 400 according to an embodiment of the present invention. The image detection device 400 corresponds to the image detection method of the foregoing embodiments; for brevity of the specification, it is only briefly described here, and reference may be made to the description of the foregoing embodiments.
Referring to Fig. 4, the image detection device 400 includes a memory 401 and a processor 402. The memory 401 stores program instructions, and the processor 402, when executing the program instructions, performs: performing feature extraction on an image to obtain image features; detecting, based on the image features, a head region of a human body in the image; and determining, based on the detection result for the head region, a human body region in the image corresponding to the head region, the human body region including the head region and a body region.
For example, detecting, based on the image features, the head region of a human body in the image includes: inputting the image into a first neural network, the first neural network being used to extract head regions from images; and outputting, from the first neural network, at least one head region candidate box in the image.
For example, detecting the head region of a human body in the image further includes: outputting, from the first neural network, a score for each of the at least one head region candidate box in the image, the score representing the likelihood that the region is a head region; comparing the score with a preset head score threshold; and determining a candidate region whose score is greater than the preset head score threshold to be a head region.
For example, determining, based on the detection result for the head region, the human body region in the image corresponding to the head region includes: obtaining a head-to-body relative position parameter obtained through machine learning; and determining, according to the relative position parameter, at least one human body region candidate box in the image corresponding to the head region.
For example, determining the human body region corresponding to the head region further includes: obtaining a preset estimation parameter, the preset estimation parameter representing the number of human body region candidate boxes in the image corresponding to the head region; and determining, based on the preset estimation parameter, that number of human body region candidate boxes corresponding to the head region in the image.
For example, determining the human body region corresponding to the head region further includes: determining a score value of each human body region candidate box, the score value representing the likelihood that the extracted human body region candidate box is a human body region; and selecting, based on the score values, at least one of the candidate boxes as the human body region.
For example, determining the score value of each human body region candidate box includes: inputting the images of the candidate boxes into a trained second neural network, the second neural network being used to determine the score value of each human body region candidate box image; and outputting, from the second neural network, the score value corresponding to each candidate box.
For example, determining the human body region corresponding to the head region further includes: correcting the human body region candidate box to obtain a corrected human body region.
For example, correcting the human body region candidate box includes: inputting the human body region candidate box image into a third neural network, the third neural network being used to correct the human body region candidate box; and outputting, from the third neural network, a correction result for each human body region candidate box.
For example, outputting the correction result for each human body region candidate box from the third neural network includes: determining, by the third neural network based on the human body region candidate box, an original region of the candidate box in the image; determining whether the original region is complete; and, when the original region is incomplete, correcting the human body region candidate box corresponding to the original region.
For example, correcting the human body region candidate box corresponding to the original region when the original region is incomplete includes: obtaining a plurality of standard body boxes output by the trained third neural network; matching the human body region candidate box against the plurality of standard body boxes; taking the standard body box with the highest matching rate as the correction box for the corresponding candidate box; and correcting the corresponding candidate box based on the correction box.
For example, the corrected parameters include at least one of: region center point position, region width, and region height.
For example, the device further performs: non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image.
For example, performing the non-maximum suppression post-processing includes: when any one of the head regions corresponds to multiple human body regions, determining the score value of each of the multiple human body regions corresponding to that head region; and determining the human body region with the highest score value to be the human body region corresponding to that head region.
For example, there are multiple head regions, including a first head region and a second head region, and performing the non-maximum suppression post-processing further includes: when the first head region and the second head region overlap, determining the ratio of the overlapping region to the union of the first head region and the second head region; when the ratio is greater than a first threshold, obtaining a first human body region corresponding to the first head region and a second human body region corresponding to the second head region; and comparing the score values of the first human body region and the second human body region, and, when the score value of the first human body region is greater than that of the second human body region, determining the first human body region and the first head region corresponding to it to be the finally detected human body region and head region.
The image detection device of the embodiment of the present invention first detects the head regions in the image to be detected and then, based on the head regions, detects pedestrians near the positions of the head regions. Compared with conventional detection devices, the detection speed is improved to varying degrees. The device can also provide the correspondence between a person and the position of the person's head, offering more information for various subsequent needs.
In addition, according to at least one embodiment of the disclosure, a computer-executable non-volatile storage medium is further provided. The medium corresponds to the memory 401 of the image detection device 400 in the foregoing embodiment; for brevity of the specification, it is only briefly described here, and reference may be made to the description of the foregoing embodiment. The non-volatile storage medium stores program instructions which are loaded by a processor of the computer and execute the steps of the method described in the above embodiments.
In addition, according to at least one embodiment of the disclosure, a neural-network-based image detection device is further provided. The device corresponds to the method of the foregoing embodiments; for brevity of the specification, it is only briefly described here. Fig. 5 shows an image detection device 500 according to an embodiment of the present disclosure. Referring to Fig. 5, the image detection device 500 includes an acquiring unit 501, a detection unit 502 and a determination unit 503. For example, the acquiring unit 501 is configured to perform feature extraction on an image to obtain image features; the detection unit 502 is configured to detect, based on the image features, a head region of a human body in the image; and the determination unit 503 is configured to determine, based on the detection result for the head region, a human body region in the image corresponding to the head region, the human body region including the head region and a body region.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two, and that software modules may be stored in any form of computer storage medium. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
It should be understood by those skilled in the art that various modifications, combinations, partial combinations and substitutions may be made to the present invention depending on design requirements and other factors, as long as they fall within the scope of the appended claims and their equivalents.

Claims (32)

1. An image detection method based on a neural network, the method comprising:
performing feature extraction on the image to obtain image features;
detecting head regions of human bodies in the image based on the image features;
determining, based on the detection result of the head regions, human body regions corresponding to the head regions in the image, wherein a human body region includes the head region and a body region.
2. The method according to claim 1, wherein the step of detecting head regions of human bodies in the image based on the image features comprises:
inputting the image into a first neural network, the first neural network being used to extract head regions in an image;
outputting, from the first neural network, at least one head region candidate frame in the image.
3. The method according to claim 2, wherein the step of detecting head regions of human bodies in the image based on the image features further comprises:
outputting, from the first neural network, a score of each of the at least one head region candidate frame of the image, the score representing the possibility that the region is a head region;
comparing the score with a preset head score threshold;
determining a region whose score is greater than the preset head score threshold as a head region.
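The thresholding step of claim 3 can be shown with a one-function sketch. This Python snippet is only illustrative; the 0.7 threshold and the (box, score) pair format are assumptions, not values taken from the disclosure.

def filter_head_candidates(candidates, head_score_threshold=0.7):
    # Keep only the candidate frames whose score exceeds the preset head score threshold.
    # `candidates` is a list of (box, score) pairs output by the first neural network.
    return [(box, score) for box, score in candidates if score > head_score_threshold]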
4. The method according to claim 1, wherein the step of determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image comprises:
obtaining relative position parameters of the head and the body obtained through machine learning;
determining, according to the relative position parameters, at least one human body region candidate frame corresponding to the head region in the image.
5. The method according to claim 4, wherein the step of determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image further comprises:
obtaining a preset estimation parameter, the preset estimation parameter representing the number of human body region candidate frames corresponding to the head region in the image;
determining, in the image and based on the preset estimation parameter, said number of human body region candidate frames corresponding to the head region.
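Claims 4 and 5 describe generating a preset number of body candidate frames around each head region from learned head-to-body relative position parameters. The sketch below is only an assumed illustration of that idea; the offset encoding (dx, dy, width ratio, height ratio in multiples of the head size) and the example values are not taken from the disclosure.

def body_candidates_from_head(head_box, relative_offsets):
    # Generate one body region candidate frame per learned relative-position parameter set.
    # `head_box` is (x1, y1, x2, y2); each offset is (dx, dy, w_ratio, h_ratio).
    hx1, hy1, hx2, hy2 = head_box
    head_w, head_h = hx2 - hx1, hy2 - hy1
    cx = (hx1 + hx2) / 2.0
    candidates = []
    for dx, dy, w_ratio, h_ratio in relative_offsets:
        body_w, body_h = head_w * w_ratio, head_h * h_ratio
        body_cx, body_top = cx + dx * head_w, hy1 + dy * head_h
        candidates.append((body_cx - body_w / 2, body_top,
                           body_cx + body_w / 2, body_top + body_h))
    return candidates

# e.g. a preset estimation parameter of 3 candidate frames per head region (values illustrative)
offsets = [(0.0, 0.0, 3.0, 7.0), (-0.3, 0.0, 3.0, 6.5), (0.3, 0.0, 3.0, 6.5)]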
6. The method according to claim 5, wherein the step of determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image further comprises:
determining a score value of each human body region candidate frame, the score value representing the possibility that the extracted human body region candidate frame is a human body region;
selecting, based on the score values, at least one of said number of human body region candidate frames as the human body region.
7. The method according to claim 6, wherein the step of determining the score value of each human body region candidate frame comprises:
inputting the images of said number of human body region candidate frames into a trained second neural network, the second neural network being used to determine the score value of each human body region candidate frame image;
outputting the score value corresponding to each human body region candidate frame from the second neural network.
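Claims 6 and 7 amount to scoring every candidate crop with a second network and keeping the best one. The following Python sketch assumes the image is an array indexed as image[y1:y2, x1:x2] and that `second_network` maps a crop to a score; both are assumptions made for illustration.

def select_body_region(image, candidate_boxes, second_network):
    # Score each human body region candidate frame with the second neural network
    # and return the (box, score) pair with the highest score value.
    scored = []
    for (x1, y1, x2, y2) in candidate_boxes:
        crop = image[int(y1):int(y2), int(x1):int(x2)]   # candidate frame image
        scored.append(((x1, y1, x2, y2), float(second_network(crop))))
    return max(scored, key=lambda item: item[1])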
8. The method according to claim 4, wherein the step of determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image further comprises:
correcting the human body region candidate frame to obtain a corrected human body region.
9. The method according to claim 8, wherein the step of correcting the human body region candidate frame comprises:
inputting the human body region candidate frame image into a third neural network, the third neural network being used to correct the human body region candidate frame;
outputting the correction result of each human body region candidate frame from the third neural network.
10. The method according to claim 9, wherein the step of outputting the correction result of each human body region candidate frame from the third neural network comprises:
determining, by the third neural network and based on the human body region candidate frame, the original area of the human body region candidate frame in the image;
determining whether the original area is complete;
correcting the human body region candidate frame corresponding to the original area when the original area is incomplete.
11. The method according to claim 10, wherein the step of correcting the human body region candidate frame corresponding to the original area when the original area is incomplete comprises:
obtaining a plurality of standard body frames output by the trained third neural network;
matching the human body region candidate frame with the plurality of standard body frames;
taking the standard body frame with the highest matching rate as the correction frame of the corresponding human body region candidate frame;
correcting the corresponding human body region candidate frame based on the correction frame.
12. The method according to claim 11, wherein the corrected parameters include at least one of:
a region center point position, a region width, and a region height.
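Claims 11 and 12 describe picking the best-matching standard body frame and using it to adjust the candidate frame's center point, width and height. The sketch below is only an assumed illustration; `matching_rate` stands for any box-similarity function (for example the overlap/union ratio shown earlier) and is not a function named in the disclosure.

def correct_candidate(candidate, standard_frames, matching_rate):
    # Correct an incomplete candidate frame with the best-matching standard body frame.
    best = max(standard_frames, key=lambda frame: matching_rate(candidate, frame))
    cx = (best[0] + best[2]) / 2.0                 # corrected region center point (x)
    cy = (best[1] + best[3]) / 2.0                 # corrected region center point (y)
    w, h = best[2] - best[0], best[3] - best[1]    # corrected region width / height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)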
13. The method according to claim 1, wherein the method further comprises:
performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image.
14. The method according to claim 13, wherein the step of performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image, comprises:
when there are multiple human body regions corresponding to any one of the head regions, determining the score value of each of the multiple human body regions corresponding to the head region;
determining the human body region with the highest score value as the human body region corresponding to the head region.
15. The method according to claim 14, wherein there are multiple head regions, the multiple head regions including a first head region and a second head region,
and the step of performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image, further comprises:
when the first head region and the second head region overlap, determining the ratio of the overlapping region to the union region of the first head region and the second head region;
when the ratio is greater than a first score threshold, obtaining a first human body region corresponding to the first head region and a second human body region corresponding to the second head region;
comparing the score values of the first human body region and the second human body region, and when the score value of the first human body region is greater than the score value of the second human body region, determining the first human body region and the first head region corresponding to the first human body region as the finally detected human body region and head region.
16. An image detection device based on a neural network, comprising a memory and a processor, the memory storing program instructions, wherein the processor, when executing the program instructions, performs:
performing feature extraction on the image to obtain image features;
detecting head regions of human bodies in the image based on the image features;
determining, based on the detection result of the head regions, human body regions corresponding to the head regions in the image, wherein a human body region includes the head region and a body region.
17. The device according to claim 16, wherein detecting head regions of human bodies in the image based on the image features comprises:
inputting the image into a first neural network, the first neural network being used to extract head regions in an image;
outputting, from the first neural network, at least one head region candidate frame in the image.
18. The device according to claim 17, wherein detecting head regions of human bodies in the image based on the image features further comprises:
outputting, from the first neural network, a score of each of the at least one head region candidate frame of the image, the score representing the possibility that the region is a head region;
comparing the score with a preset head score threshold;
determining a region whose score is greater than the preset head score threshold as a head region.
19. The device according to claim 16, wherein determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image comprises:
obtaining relative position parameters of the head and the body obtained through machine learning;
determining, according to the relative position parameters, at least one human body region candidate frame corresponding to the head region in the image.
20. The device according to claim 19, wherein determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image further comprises:
obtaining a preset estimation parameter, the preset estimation parameter representing the number of human body region candidate frames corresponding to the head region in the image;
determining, in the image and based on the preset estimation parameter, said number of human body region candidate frames corresponding to the head region.
21. The device according to claim 20, wherein determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image further comprises:
determining a score value of each human body region candidate frame, the score value representing the possibility that the extracted human body region candidate frame is a human body region;
selecting, based on the score values, at least one of said number of human body region candidate frames as the human body region.
22. The device according to claim 21, wherein determining the score value of each human body region candidate frame comprises:
inputting the images of said number of human body region candidate frames into a trained second neural network, the second neural network being used to determine the score value of each human body region candidate frame image;
outputting the score value corresponding to each human body region candidate frame from the second neural network.
23. The device according to claim 19, wherein determining, based on the detection result of the head regions, the human body regions corresponding to the head regions in the image further comprises:
correcting the human body region candidate frame to obtain a corrected human body region.
24. The device according to claim 23, wherein correcting the human body region candidate frame comprises:
inputting the human body region candidate frame image into a third neural network, the third neural network being used to correct the human body region candidate frame;
outputting the correction result of each human body region candidate frame from the third neural network.
25. The device according to claim 24, wherein outputting the correction result of each human body region candidate frame from the third neural network comprises:
determining, by the third neural network and based on the human body region candidate frame, the original area of the human body region candidate frame in the image;
determining whether the original area is complete;
correcting the human body region candidate frame corresponding to the original area when the original area is incomplete.
26. The device according to claim 25, wherein correcting the human body region candidate frame corresponding to the original area when the original area is incomplete comprises:
obtaining a plurality of standard body frames output by the trained third neural network;
matching the human body region candidate frame with the plurality of standard body frames;
taking the standard body frame with the highest matching rate as the correction frame of the corresponding human body region candidate frame;
correcting the corresponding human body region candidate frame based on the correction frame.
27. The device according to claim 26, wherein the corrected parameters include at least one of:
a region center point position, a region width, and a region height.
28. The device according to claim 16, wherein the processor further performs:
performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image.
29. The device according to claim 28, wherein performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image, comprises:
when there are multiple human body regions corresponding to any one of the head regions, determining the score value of each of the multiple human body regions corresponding to the head region;
determining the human body region with the highest score value as the human body region corresponding to the head region.
30. The device according to claim 29, wherein there are multiple head regions, the multiple head regions including a first head region and a second head region,
and performing non-maximum suppression post-processing on the head regions in the image and the human body regions corresponding to the head regions, to obtain the human bodies detected in the image, further comprises:
when the first head region and the second head region overlap, determining the ratio of the overlapping region to the union region of the first head region and the second head region;
when the ratio is greater than a first score threshold, obtaining a first human body region corresponding to the first head region and a second human body region corresponding to the second head region;
comparing the score values of the first human body region and the second human body region, and when the score value of the first human body region is greater than the score value of the second human body region, determining the first human body region and the first head region corresponding to the first human body region as the finally detected human body region and head region.
31. An image detection device based on a neural network, the device comprising:
an acquiring unit configured to perform feature extraction on the image to obtain image features;
a detection unit configured to detect head regions of human bodies in the image based on the image features;
a determination unit configured to determine, based on the detection result of the head regions, human body regions corresponding to the head regions in the image, wherein a human body region includes the head region and a body region.
32. A computer-executable non-volatile storage medium storing program instructions, wherein the program instructions are loaded by a processor of the computer to execute the steps of the method according to any one of claims 1-15.
CN201711107369.2A 2017-11-10 2017-11-10 Image detection method and image detection device based on neural network Active CN108875504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711107369.2A CN108875504B (en) 2017-11-10 2017-11-10 Image detection method and image detection device based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711107369.2A CN108875504B (en) 2017-11-10 2017-11-10 Image detection method and image detection device based on neural network

Publications (2)

Publication Number Publication Date
CN108875504A true CN108875504A (en) 2018-11-23
CN108875504B CN108875504B (en) 2021-07-23

Family

ID=64325768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711107369.2A Active CN108875504B (en) 2017-11-10 2017-11-10 Image detection method and image detection device based on neural network

Country Status (1)

Country Link
CN (1) CN108875504B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2361726C2 (en) * 2007-02-28 2009-07-20 Общество С Ограниченной Ответственностью "Алгоритм-Робо" System of controlling anthropomorphous robot and control method
CN102799888A (en) * 2011-05-27 2012-11-28 株式会社理光 Eye detection method and eye detection equipment
CN106485230A (en) * 2016-10-18 2017-03-08 中国科学院重庆绿色智能技术研究院 Based on the training of the Face datection model of neutral net, method for detecting human face and system
CN106650699A (en) * 2016-12-30 2017-05-10 中国科学院深圳先进技术研究院 CNN-based face detection method and device
CN106960195A (en) * 2017-03-27 2017-07-18 深圳市丰巨泰科电子有限公司 A kind of people counting method and device based on deep learning
CN107128492A (en) * 2017-05-05 2017-09-05 成都通甲优博科技有限责任公司 A kind of unmanned plane tracking, device and unmanned plane detected based on the number of people

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522910A (en) * 2018-12-25 2019-03-26 浙江商汤科技开发有限公司 Critical point detection method and device, electronic equipment and storage medium
CN109800859A (en) * 2018-12-25 2019-05-24 深圳云天励飞技术有限公司 A kind of normalized optimization method and device of neural network batch
CN109800859B (en) * 2018-12-25 2021-01-12 深圳云天励飞技术有限公司 Neural network batch normalization optimization method and device
CN110378227A (en) * 2019-06-17 2019-10-25 北京达佳互联信息技术有限公司 Correct method, apparatus, equipment and the storage medium of sample labeled data
CN110378227B (en) * 2019-06-17 2021-04-13 北京达佳互联信息技术有限公司 Method, device and equipment for correcting sample labeling data and storage medium
CN110298302A (en) * 2019-06-25 2019-10-01 腾讯科技(深圳)有限公司 A kind of human body target detection method and relevant device
CN110298302B (en) * 2019-06-25 2023-09-08 腾讯科技(深圳)有限公司 Human body target detection method and related equipment
CN112529786A (en) * 2019-09-19 2021-03-19 佳能株式会社 Image processing apparatus and method, and non-transitory computer-readable storage medium

Also Published As

Publication number Publication date
CN108875504B (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN108875504A (en) Image detecting method and image detection device neural network based
CN109376631B (en) Loop detection method and device based on neural network
CN108875511B (en) Image generation method, device, system and computer storage medium
WO2018086607A1 (en) Target tracking method, electronic device, and storage medium
Jiao et al. Local stereo matching with improved matching cost and disparity refinement
US9305240B2 (en) Motion aligned distance calculations for image comparisons
CN109583449A (en) Character identifying method and Related product
CN108875534B (en) Face recognition method, device, system and computer storage medium
CN106920245B (en) Boundary detection method and device
CN107424160A (en) The system and method that image center line is searched by vision system
Yu et al. Efficient patch-wise non-uniform deblurring for a single image
US11893789B2 (en) Deep neural network pose estimation system
US8442327B2 (en) Application of classifiers to sub-sampled integral images for detecting faces in images
CN108038826B (en) Method and device for correcting perspective deformed shelf image
CN110633711B (en) Computer device and method for training feature point detector and feature point detection method
CN109840524A (en) Kind identification method, device, equipment and the storage medium of text
CN111353325A (en) Key point detection model training method and device
Azaza et al. Context proposals for saliency detection
CN104268550A (en) Feature extraction method and device
CN117037244A (en) Face security detection method, device, computer equipment and storage medium
Saxena et al. Video inpainting detection and localization using inconsistencies in optical flow
CN108875501B (en) Human body attribute identification method, device, system and storage medium
Qureshi et al. An information based framework for performance evaluation of image enhancement methods
Su et al. Change detection in synthetic aperture radar images based on non-local means with ratio similarity measurement
CN116612272A (en) Intelligent digital detection system for image processing and detection method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant