CN110310301A - Method and device for detecting a target image - Google Patents
- Publication number
- CN110310301A (application number CN201810258574.7A)
- Authority
- CN
- China
- Prior art keywords
- candidate frame
- target
- configuration information
- image
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
Abstract
The present application discloses a method and device for detecting a target image, and belongs to the communications field. The method includes: obtaining a foreground moving image corresponding to a picture to be detected, and obtaining a first feature picture produced by performing convolution operations on the picture to be detected; detecting the target image in the first feature picture to obtain a first candidate frame configuration information set, the set including the configuration information of each candidate frame in at least one candidate frame; filtering, from the first candidate frame configuration information set according to the foreground moving image, the configuration information of any candidate frame whose included target image is only a partial target image, to obtain a second candidate frame configuration information set; and adding, according to the second candidate frame configuration information set, detection frames to the picture to be detected, each detection frame including at least one target image in the picture to be detected. The present application can improve detection accuracy.
Description
Technical field
The present application relates to the communications field, and in particular to a method and device for detecting a target image.
Background technique
With the construction of safe cities, a large number of surveillance cameras are currently deployed, and these cameras are used to capture surveillance video. For the video captured by each camera, the target image in every frame needs to be detected; the target image may be a human body image and/or a vehicle image that is in motion in the surveillance video. After the detection operation is performed on a frame, the human body image or vehicle image in motion is framed in the picture with a rectangular frame, so that a person or a vehicle can subsequently be tracked.
At present, the target image in a picture can be detected as follows: the picture is input to a convolutional neural network (Convolutional Neural Network, CNN), and after multiple convolution operations in the CNN, a first feature picture and a second feature picture are obtained, where the number of convolution operations that produced the first feature picture is smaller than the number that produced the second feature picture. The first feature picture is input to a region proposal network (Region Proposal Network, RPN), and the RPN outputs the location information and a confidence score of each candidate frame in at least one candidate frame in the first feature picture. Each candidate frame encloses one target image in the first feature picture; the location information of a candidate frame includes the positions of a pair of diagonal corner points of the frame, and the confidence score of a candidate frame indicates the probability that the state of the enclosed target image is a motion state. According to the location information of each of the N candidate frames with the largest confidence scores and the second feature picture, a detection frame corresponding to each candidate frame, together with the type of the target image included in the detection frame, is added to the picture.
In the process of implementing the present application, the inventor found that the prior art has at least the following problem: among the N candidate frames with the largest confidence scores, some candidate frames enclose a target image that is not complete; for example, a candidate frame may enclose only part of a human body image or part of a vehicle image. The detection frame added to the picture according to such a candidate frame then also contains an incomplete target image, which reduces detection accuracy.
Summary of the invention
In order to improve detection accuracy, the embodiments of the present application provide a method and device for detecting a target image. The technical solution is as follows:
In a first aspect, the present application provides a method for detecting a target image. The method obtains a foreground moving image corresponding to a picture to be detected, and obtains a first feature picture produced by performing convolution operations on the picture to be detected, where the foreground moving image includes the target image that is in motion in the picture to be detected and the background image other than the target image; detects the target image in the first feature picture to obtain a first candidate frame configuration information set, where the set includes the configuration information of each candidate frame in at least one candidate frame, each candidate frame in the first feature picture includes at least one target image, and the target image in the first feature picture is the same as the target image detected in the picture to be detected; filters, from the first candidate frame configuration information set according to the foreground moving image, the configuration information of any candidate frame whose included target image is only a partial target image, to obtain a second candidate frame configuration information set; and adds, according to the second candidate frame configuration information set, detection frames to the picture to be detected, each detection frame including at least one target image in the picture to be detected. Since the configuration information of candidate frames that include an incomplete target object is filtered out of the first candidate frame configuration information set to obtain the second candidate frame configuration information set, adding detection frames to the picture to be detected according to the second candidate frame configuration information set can improve detection accuracy.
In a possible implementation of the first aspect, mixture-of-Gaussians background modeling is performed on the picture to be detected to obtain the corresponding foreground moving image. The foreground moving image can then be used to filter, from the first candidate frame configuration information set, the configuration information of candidate frames that include only a partial target object.
In a possible implementation of the first aspect, an integral image corresponding to the foreground moving image is computed from the foreground moving image, and the configuration information of candidate frames whose included target image is only a partial target image is filtered from the first candidate frame configuration information set according to the integral image. Because the integral image corresponding to the foreground moving image is obtained, the configuration information in the first candidate frame configuration information set can be filtered according to the integral image; the integral image increases the filtering speed and thereby improves detection efficiency.
In a possible implementation of the first aspect, according to the configuration information of a target candidate frame, the integral-image region corresponding to the target candidate frame is obtained in the integral image, where the configuration information of the target candidate frame is the configuration information of any one candidate frame in the first candidate frame configuration information set; the ratio between the target-image area inside the target candidate frame and the area of the target candidate frame is computed from the integral-image region; and when the ratio is smaller than a preset ratio threshold, the configuration information of the target candidate frame is filtered from the first candidate frame configuration information set. Because the integral-image region corresponding to the target candidate frame is obtained, the amount of computation needed to calculate the ratio from the integral-image region is reduced, which improves the calculation speed.
In a possible implementation of the first aspect, the integral values of the pixels located at the four vertex positions of the integral-image region are obtained; the target-image area inside the target candidate frame is calculated from these integral values; the area of the target candidate frame is calculated from the configuration information of the target candidate frame; and the ratio between the target-image area and the area of the target candidate frame is calculated. Since the target-image area is obtained from the integral values of the pixels at the four vertices, the required amount of computation is small, so the target-image area can be calculated quickly, improving calculation efficiency.
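As an illustration of the integral-image filtering described above, the following is a minimal NumPy sketch: the function names, the representation of a candidate frame as inclusive pixel coordinates (x1, y1, x2, y2), and the ratio threshold of 0.5 are assumptions for illustration, not part of the application.

```python
import numpy as np

def integral_image(mask):
    # Cumulative sums along both axes turn the binary foreground moving
    # image into an integral image (summed-area table).
    return mask.cumsum(axis=0).cumsum(axis=1)

def box_foreground_ratio(ii, x1, y1, x2, y2):
    """Foreground-pixel area inside the inclusive box [x1,x2] x [y1,y2],
    read from just the four corner values of the integral image, divided
    by the area of the box."""
    total = ii[y2, x2]
    if x1 > 0:
        total -= ii[y2, x1 - 1]
    if y1 > 0:
        total -= ii[y1 - 1, x2]
    if x1 > 0 and y1 > 0:
        total += ii[y1 - 1, x1 - 1]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    return total / area

def filter_boxes(boxes, mask, ratio_threshold=0.5):
    # Keep only the candidate frames whose foreground coverage reaches
    # the preset ratio threshold (an assumed value here).
    ii = integral_image(mask)
    return [b for b in boxes
            if box_foreground_ratio(ii, *b) >= ratio_threshold]
```

Whatever the frame size, each ratio needs only the four corner reads plus one division, which is the constant-cost property the application relies on.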
In a possible implementation of the first aspect, according to the configuration information of a target candidate frame, the image region corresponding to the target candidate frame is obtained in the foreground moving image, where the configuration information of the target candidate frame is the configuration information of any one candidate frame in the first candidate frame configuration information set; the ratio between the target-image area inside the target candidate frame and the area of the target candidate frame is computed from the image region; and when the ratio is smaller than a preset ratio threshold, the configuration information of the target candidate frame is filtered from the first candidate frame configuration information set. Determining whether to filter out the target candidate frame directly from the image region simplifies the implementation logic of the solution.
In a possible implementation of the first aspect, the number of pixels in the image region that belong to the target image and the total number of pixels in the image region are counted, and the ratio between the two is calculated, yielding the ratio between the target-image area inside the target candidate frame and the area of the target candidate frame.
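The pixel-counting variant above can be sketched directly on the foreground moving image; this is a minimal NumPy illustration, and the function name and the inclusive-coordinate box convention are assumptions.

```python
import numpy as np

def coverage_ratio(mask, x1, y1, x2, y2):
    # Crop the candidate frame's image region from the foreground moving
    # image, count the pixels belonging to the target image (value 1),
    # and divide by the region's total pixel count.
    region = mask[y1:y2 + 1, x1:x2 + 1]
    return int(region.sum()) / region.size
```

Compared with the integral-image variant, this recounts every pixel of the region per frame, but it needs no precomputation and the logic is simpler.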
In a possible implementation of the first aspect, a second feature picture obtained by performing convolution operations on the picture to be detected is obtained, where the number of convolution operations performed to obtain the first feature picture is smaller than the number performed to obtain the second feature picture; and detection frames, together with the types of the target images in the detection frames, are added to the picture to be detected according to the second feature picture and the second candidate frame configuration information set. Because the configuration information of a large number of candidate frames has been filtered out of the second candidate frame configuration information set, adding detection frames to the picture to be detected according to that set reduces the amount of computation and thereby improves detection efficiency.
In a second aspect, the present application provides a device for detecting a target image, configured to perform the method of the first aspect or any possible implementation of the first aspect. Specifically, the device includes modules for performing that method.
In a third aspect, the present application provides a device for detecting a target image, the device including at least one processor and at least one memory. The at least one memory stores one or more programs configured to be executed by the at least one processor, and the one or more programs include instructions for performing the method of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides a device for detecting a target image, the device including a transceiver, a processor, and a memory, which may be connected by a bus system. The memory is configured to store programs, instructions, or code, and the processor is configured to execute the programs, instructions, or code in the memory to complete the method of the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product. The computer program product includes a computer program stored in a computer-readable storage medium, and the computer program is loaded by a processor to implement the method of the first aspect or any possible implementation of the first aspect.
In a sixth aspect, the present application provides a non-volatile computer-readable storage medium for storing a computer program. The computer program is loaded by a processor to execute the instructions of the method of the first aspect or any possible implementation of the first aspect.
In a seventh aspect, an embodiment of the present application provides a chip. The chip includes a programmable logic circuit and/or program instructions, and when the chip runs, it implements the method of the first aspect or any possible implementation of the first aspect.
Brief description of the drawings
Fig. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application;
Fig. 2-1 is a flowchart of a method for detecting a target image provided by an embodiment of the present application;
Fig. 2-2 is a module diagram of an RPN device provided by an embodiment of the present application;
Fig. 2-3 is a schematic diagram of an RPN device adding sliding windows, provided by an embodiment of the present application;
Fig. 2-4 is a flowchart of a method for filtering configuration information provided by an embodiment of the present application;
Fig. 2-5 is a schematic diagram of an integral-image region provided by an embodiment of the present application;
Fig. 2-6 is a flowchart of another method for filtering configuration information provided by an embodiment of the present application;
Fig. 2-7 is a module diagram of a Fast Rcnn device provided by an embodiment of the present application;
Fig. 2-8 is a software system module diagram for detecting a target image provided by an embodiment of the present application;
Fig. 3-1 is a schematic structural diagram of a device for detecting a target image provided by an embodiment of the present application;
Fig. 3-2 is a schematic structural diagram of another device for detecting a target image provided by an embodiment of the present application;
Fig. 3-3 is a schematic structural diagram of another device for detecting a target image provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of yet another device for detecting a target image provided by an embodiment of the present application.
Detailed description of embodiments
The embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, an embodiment of the present application provides a network architecture, including a camera device and a server. A network connection is established between the camera device and the server, and the connection may be wireless or wired.
The camera device may be installed in places such as shopping malls and roads, and is used to capture pictures and send the captured pictures to the server. Optionally, the network architecture can be applied to scenarios such as video surveillance: in a video surveillance scenario, the camera device captures pictures frame by frame and may send the captured pictures to the server.
The pictures captured by the camera device include a foreground moving image that is in motion and a background image that remains stationary. The foreground moving image in motion may be a human body image and/or a vehicle image in motion, and the stationary background image may be a building image, a tree image, and/or a stationary vehicle image.
When the camera device captures pictures frame by frame, the target image in each picture can be detected; the target image may be one or more of the foreground moving images in motion in the picture. When the target image in a picture is detected, the target image is also framed in the picture with a detection frame, so that a target can subsequently be tracked. For example, when the target is a person or a vehicle, the person or the vehicle can be tracked across the pictures to which detection frames have been added.
Optionally, the process of detecting the target image in a picture may be performed by the camera device; that is, after capturing a frame, the camera device may detect the target image in that frame. To improve the detection efficiency of the camera device, it can be configured with ample computing resources, which may include at least one of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), and memory capacity.
Optionally, the detection process may instead be performed by the server rather than the camera device: after receiving a frame sent by the camera device, the server may detect the target image in that frame; alternatively, the server reads a frame from memory, which may be a picture captured by the camera device, and detects the target image in it. When receiving a picture sent by the camera device, the server may first store the picture in memory. The camera device may be a device such as a surveillance camera or a mobile phone with a camera.
Referring to Fig. 2-1, an embodiment of the present application provides a method for detecting a target image. The method can be applied to the network architecture provided by the embodiment shown in Fig. 1, and the execution body of the method may be the camera device or the server in that architecture. The method includes:
Step 201: Obtain the foreground moving image corresponding to a picture to be detected, and obtain a first feature picture produced by performing convolution operations on the picture to be detected. The foreground moving image includes the target image that is in motion in the picture to be detected and the background image other than the target image.
The picture to be detected may be any picture in the video captured by the camera device. When the execution body of this embodiment is the camera device, the camera device may take a captured frame as the picture to be detected. When the execution body is the server, the server may take a frame received from the camera device as the picture to be detected, or may read a frame from memory as the picture to be detected; when the server receives a picture sent by the camera device, it may store the picture in memory.
The foreground moving image corresponding to the picture to be detected can be obtained by performing mixture-of-Gaussians background modeling on the picture to be detected. In the embodiment of the present application, a Gaussian mixture model device and a fast region-based convolutional neural network (Fast Region-based Convolution Neural Network, Fast Rcnn) device are preset, and the Fast Rcnn device includes a CNN. In this step, the picture to be detected may be input separately to the Gaussian mixture model device and to the CNN in the Fast Rcnn device; the Gaussian mixture model device then performs mixture-of-Gaussians background modeling on the picture to be detected to obtain the corresponding foreground moving image, and the CNN performs convolution operations on the picture to be detected to obtain the corresponding first feature picture.
The foreground moving image corresponding to the picture to be detected is a black-and-white picture in which the value of each pixel is 1 or 0, and its size is equal to the size of the picture to be detected. Each pixel of the picture to be detected has a corresponding pixel in the foreground moving image. If a pixel in the picture to be detected belongs to the target image that is in motion, the value of the corresponding pixel in the foreground moving image is 1; if a pixel belongs to the background image that remains stationary, the value of the corresponding pixel is 0. In this embodiment, the target image may be a human body image and/or a vehicle image in the picture to be detected.
Optionally, the mixture-of-Gaussians background modeling of the picture to be detected can be divided into the following operations 2011 to 2014:
2011: Create a blank foreground moving image whose size is equal to the size of the picture to be detected.
2012: Read the value of a pixel of the picture to be detected, the value including the R-channel, G-channel, and B-channel components, and then calculate, by the following formula (1), the probability that the pixel belongs to the target image that is in motion:

P(x_j) = Σ_{i=1}^{K} ŵ_{i,t} · η(x_j; μ̂_{i,t}, Σ̂_{i,t})    (1)

In formula (1), P(x_j) is the probability for the j-th pixel of the picture to be detected, that is, the probability that the j-th pixel belongs to the target image that is in motion; x_j is the value of the j-th pixel, x_j = [x_jR, x_jG, x_jB], where x_jR, x_jG, and x_jB are the R-channel, G-channel, and B-channel components; t is the time corresponding to the picture to be detected (in an implementation, the frame number of the picture can be used as the time t); ŵ_{i,t} denotes the estimated weight coefficient of the i-th Gaussian distribution in the Gaussian mixture model device at time t; μ̂_{i,t} and Σ̂_{i,t} respectively denote the mean vector and covariance matrix of the i-th Gaussian distribution at time t (it is assumed here that the red, green, and blue components of a pixel are mutually independent); and η denotes the Gaussian probability density function.
K is a preset value. Before the probability of the pixel is calculated with formula (1), the preset Gaussian mixture model device has, from the probabilities already calculated for the j-th pixel in the pictures at times 0, 1, …, t-1, obtained the estimated weight coefficients of the K Gaussian distributions at time t, the K mean vectors, and the K covariance matrices, which are respectively ŵ_{1,t}, ŵ_{2,t}, …, ŵ_{K,t}; μ̂_{1,t}, μ̂_{2,t}, …, μ̂_{K,t}; and Σ̂_{1,t}, Σ̂_{2,t}, …, Σ̂_{K,t}.
2013: If the calculated probability is greater than a preset probability threshold, determine that the pixel belongs to the target image that is in motion in the picture to be detected, and, according to the position of the pixel in the picture to be detected, fill the corresponding pixel of the created foreground moving image with the value 1.
2014: If the calculated probability is less than or equal to the preset probability threshold, determine that the pixel belongs to the background image that remains stationary in the picture to be detected, and, according to the position of the pixel in the picture to be detected, fill the corresponding pixel of the created foreground moving image with the value 0.
Each pixel of the picture to be detected is filled into the corresponding pixel of the created foreground moving image in the above manner, yielding the foreground moving image corresponding to the picture to be detected.
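Operations 2011 to 2014 can be sketched as follows. This is a minimal NumPy illustration: the mixture parameters and the probability threshold are assumed values, and in practice the weights, means, and variances would come from the Gaussian mixture model device's online estimates at time t.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    # Diagonal-covariance Gaussian density eta: the R, G, B components
    # are treated as mutually independent, as formula (1) assumes.
    return float(np.prod(np.exp(-0.5 * (x - mean) ** 2 / var)
                         / np.sqrt(2.0 * np.pi * var)))

def foreground_moving_image(picture, weights, means, variances,
                            threshold=1e-6):
    """Create a blank foreground moving image of the same size as the
    picture (operation 2011), then fill each pixel with 1 if P(x_j)
    from formula (1) exceeds the threshold, and 0 otherwise
    (operations 2012 to 2014)."""
    h, w, _ = picture.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for r in range(h):
        for c in range(w):
            x = picture[r, c].astype(float)
            # Formula (1): P(x_j) = sum_i w_{i,t} * eta(x_j; mu_{i,t}, Sigma_{i,t})
            p = sum(wi * gaussian_pdf(x, mi, vi)
                    for wi, mi, vi in zip(weights, means, variances))
            mask[r, c] = 1 if p > threshold else 0
    return mask
```

The per-pixel loop is written for clarity; a practical implementation would vectorize the density evaluation over the whole picture.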
The CNN includes multiple convolutional layers. The first convolutional layer performs convolution processing on the picture to be detected that is input to the CNN; the input of every other convolutional layer is the output of the adjacent previous convolutional layer, on which it performs convolution processing. The result output by each convolutional layer of the CNN is a feature picture corresponding to the picture to be detected, and for each convolutional layer, the level of abstraction of the feature picture it outputs is greater than that of the feature picture output by the adjacent previous convolutional layer.
In this step, the process by which the CNN performs convolution processing on the picture to be detected may be as follows: the picture to be detected is input to the CNN; the first convolutional layer performs convolution processing on it to obtain a corresponding feature picture, which is input to the second convolutional layer; the second convolutional layer performs convolution processing on this feature picture, again obtaining a feature picture corresponding to the picture to be detected, whose level of abstraction is greater than that of the feature picture output by the first convolutional layer, and inputs it to the third convolutional layer; and so on, until the last convolutional layer of the CNN outputs a feature picture corresponding to the picture to be detected.
In this step, the feature picture of the picture to be detected output by a first target convolutional layer is obtained as the first feature picture, where the first target convolutional layer is a convolutional layer of the CNN other than the first and the last convolutional layer. Optionally, a convolutional layer located at the middle of the CNN can be chosen as the first target convolutional layer, and the feature picture it outputs is taken as the first feature picture.
Optionally, in this step, a second feature picture obtained by performing convolution processing on the picture to be detected can also be obtained, where the number of convolution operations that produced the first feature picture is smaller than the number that produced the second feature picture. Optionally, a convolutional layer located toward the rear of the CNN can be chosen as a second target convolutional layer, and the feature picture it outputs is taken as the second feature picture; the layer index of the second target convolutional layer is greater than that of the first target convolutional layer. A convolutional layer toward the rear of the CNN means that one of the last N convolutional layers of the CNN may be selected as the second target convolutional layer, where N is a preset value, for example 5, 4, 3, 2, or 1. Optionally, the last convolutional layer of the CNN can be chosen as the second target convolutional layer, that is, the feature picture output by the last convolutional layer is taken as the second feature picture.
Step 202: detect the target images in the first feature picture to obtain a first candidate frame configuration information set, where the first candidate frame configuration information set includes the configuration information of each candidate frame in at least one candidate frame.
Each candidate frame in the first feature picture contains at least one target image, and the target images in the first feature picture are the same as the target images in the picture to be detected.
The configuration information of a candidate frame includes at least the position information and the confidence score of the candidate frame. The position information of the candidate frame may include the positions of a pair of diagonal corner points of the candidate frame; the pair of diagonal corner points may be the two corner points on either diagonal of the candidate frame, and the position of a corner point may be its position in the first feature picture. Alternatively, the position information of the candidate frame may include the position of one vertex of the candidate frame together with the size of the candidate frame; the vertex may be any vertex of the candidate frame, its position is its position in the first feature picture, and the size of the candidate frame may include the width and the height of the candidate frame.
Optionally, a candidate frame may be a rectangular frame, and the confidence score of a candidate frame may indicate the probability that the state of the target object in the candidate frame is a motion state.
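The two equivalent position representations described above (a pair of diagonal corner points, or one vertex plus a width and height) can be sketched as follows. The function names and the (row, column) convention are illustrative assumptions, not part of the patent.

```python
# Sketch of the two candidate-frame position representations.
# All names are illustrative; the patent does not prescribe an API.

def corners_to_vertex_size(r1, c1, r2, c2):
    """Convert a pair of diagonal corner points to (top-left vertex, width, height)."""
    top, left = min(r1, r2), min(c1, c2)
    height = abs(r2 - r1)
    width = abs(c2 - c1)
    return (top, left), width, height

def vertex_size_to_corners(top, left, width, height):
    """Convert (top-left vertex, width, height) back to diagonal corner points."""
    return top, left, top + height, left + width

# A candidate frame given by diagonal corner (row, col) pairs:
vertex, width, height = corners_to_vertex_size(10, 20, 50, 80)
assert vertex == (10, 20) and width == 60 and height == 40
assert vertex_size_to_corners(10, 20, 60, 40) == (10, 20, 50, 80)
```

Either form fixes the same rectangle, which is why the later filtering operations accept both.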
Optionally, in this embodiment of the application, an RPN device is preset. In this step, the first feature picture may be input to the RPN device, which processes the first feature picture to obtain the configuration information of each candidate frame in at least one candidate frame; the configuration information of the candidate frames forms the first candidate frame configuration information set.
It should be understood that the first feature picture may contain at least one target image, and a target image may be a human body image and/or a vehicle image, etc. Upon receiving the input first feature picture, the RPN device uses its Proposals layer to add, in the first feature picture, candidate frames that frame the target images, obtains the position information of each candidate frame in the first feature picture, and estimates a confidence score indicating the probability that the state of the target image is a motion state, thereby obtaining the configuration information of the candidate frame.
Referring to the module diagram of the RPN device shown in Fig. 2-2, when the first feature picture is input to the RPN device, the RPN device adds a sliding window to the first feature picture. By moving the position of the sliding window and enlarging or shrinking its size, multiple different sliding windows are obtained; the feature vector of each sliding window is encoded by a convolutional layer, and a fully connected layer outputs, from the feature vector of each sliding window, the position information of at least one candidate frame and the confidence score of the at least one candidate frame.
Referring to Fig. 2-3, after the sliding window is added to the first feature picture, multiple candidate frames are obtained and the confidence score of each candidate frame is output by moving the sliding window and enlarging or shrinking it.
In this step, some of the obtained candidate frames contain target images that are incomplete target images, and such candidate frames also contain background images of relatively large area.
Step 203: filter, from the first candidate frame configuration information set and according to the foreground moving image, the configuration information of candidate frames whose contained target image is an incomplete target image, to obtain a second candidate frame configuration information set.
There are many filtering methods for implementing this step. For example, the first candidate frame configuration information set may be filtered according to the integral image of the foreground moving image; as another example, the first candidate frame configuration information set may be filtered directly according to the foreground moving image. Other filtering methods are not enumerated one by one.
Referring to Fig. 2-4, the above process of filtering the first candidate frame configuration information set according to the integral image of the foreground moving image can be completed by the following operations 2031 to 2034:
2031: compute, from the foreground moving image, the integral image corresponding to the foreground moving image.
First, a blank integral image of the same size as the foreground moving image is created. For any pixel in the foreground moving image, assumed to be the pixel in row M and column N, the integral value of that pixel can be calculated by the following formula (2), and, according to the position of the pixel in the foreground moving image, the integral value is filled into the created integral image, i.e., at the position in row M and column N of the created integral image:
Integral(M, N) = Σ(i=1..M) Σ(j=1..N) image(i, j) ……(2);
In the above formula (2), Integral(M, N) is the integral value of the pixel in row M and column N, and image(i, j) is the pixel value of the pixel in row i and column j of the foreground moving image.
For every other pixel of the foreground moving image, its integral value is filled into the created integral image in the manner described above, yielding the integral image corresponding to the foreground moving image.
Since, in the foreground moving image, the pixel value of a pixel of the moving foreground is 1 and the pixel value of a pixel of the stationary background image is 0, the integral value of the pixel in row M and column N equals the area of the moving foreground within an image region of the foreground moving image; this image region contains both the pixel in the first row and first column of the foreground moving image and the pixel in row M and column N, and its size is M × N.
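Operation 2031 can be sketched as follows; the NumPy-based layout and function name are assumptions for illustration, not part of the patent.

```python
import numpy as np

def integral_image(mask):
    """Compute the integral image of a binary foreground mask.

    Matching formula (2): entry (M, N) is the sum of mask[i, j] over all
    i <= M and j <= N, i.e. the foreground area of the rectangle from the
    top-left pixel down to (M, N)."""
    return np.cumsum(np.cumsum(mask, axis=0), axis=1)

# A 3x4 mask: pixels in motion have value 1, stationary background is 0.
mask = np.array([[0, 1, 1, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 0]])
integ = integral_image(mask)
# The bottom-right entry equals the total foreground area of the whole mask.
assert integ[-1, -1] == mask.sum() == 5
# Entry (1, 1) covers the top-left 2x2 region: 0 + 1 + 1 + 1 = 3.
assert integ[1, 1] == 3
```

Filling every entry via two running sums costs one pass over the image, which is what makes the later per-frame area queries cheap.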
Next, the configuration information of candidate frames whose contained target image is an incomplete target image can be filtered from the first candidate frame configuration information set according to the integral image; a detailed implementation may include the following operations 2032 to 2034.
2032: obtain, according to the configuration information of a target candidate frame, the integral image region corresponding to the target candidate frame in the integral image, where the configuration information of the target candidate frame is the configuration information of any candidate frame in the first candidate frame configuration information set.
Optionally, the integral image region corresponding to the target candidate frame may be obtained in the integral image according to the position information of the target candidate frame.
When the position information of the target candidate frame includes the positions of a pair of diagonal corner points of the target candidate frame, the integral image region corresponding to the target candidate frame is obtained in the integral image according to the position of each of the pair of diagonal corner points.
When the position information of the target candidate frame includes the position of one vertex of the target candidate frame and the size of the target candidate frame, the integral image region corresponding to the target candidate frame is obtained in the integral image according to the position of the vertex and the size.
For example, referring to Fig. 2-5, assume the position information of the target candidate frame includes the positions of a pair of diagonal corner points of the target candidate frame, where one corner point is at row i1, column j1 and the other corner point is at row i2, column j2. According to the positions of these two corner points, the integral image region corresponding to the target candidate frame is obtained in the integral image shown in Fig. 2-5.
2033: calculate, according to the integral image region, the ratio between the target image area located within the target candidate frame and the area of the target candidate frame.
Optionally, to implement this step, the integral values of the pixels located at the four vertex positions of the integral image region may be obtained, and the target image area located within the target candidate frame is calculated from the obtained integral values.
Referring to Fig. 2-5, the pixel at the top-left vertex of the integral image region is the pixel in row i1, column j1; the pixel at the bottom-right vertex is the pixel in row i2, column j2; the pixel at the bottom-left vertex is the pixel in row i2, column j1; and the pixel at the top-right vertex is the pixel in row i1, column j2. From the integral values of these four pixels, the target image area Area located within the target candidate frame is calculated by the following formula (3):
Area = Integral(i2, j2) − Integral(i1, j2) − Integral(i2, j1) + Integral(i1, j1) ……(3);
In the above formula (3), Integral(i2, j2) is the integral value of the pixel in row i2, column j2; Integral(i1, j2) is the integral value of the pixel in row i1, column j2; Integral(i2, j1) is the integral value of the pixel in row i2, column j1; and Integral(i1, j1) is the integral value of the pixel in row i1, column j1.
The area of the target candidate frame is calculated according to the configuration information of the target candidate frame, and the ratio between the target image area and the area of the target candidate frame is then calculated.
Optionally, the area of the target candidate frame may be calculated according to its position information.
When the position information of the target candidate frame includes the positions of a pair of diagonal corner points of the target candidate frame, the area of the target candidate frame is calculated from the position of each of the pair of diagonal corner points.
When the position information of the target candidate frame includes the position of one vertex of the target candidate frame and the size of the target candidate frame, the area of the target candidate frame is calculated from the size.
In this step, the target image area can be calculated from the integral values of the pixels at only four vertex positions, so the required amount of computation is small. This reduces the computation required by the filtering operation, increases the filtering rate, and thereby improves the efficiency of detecting target images.
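Operations 2032–2033 can be sketched on top of the integral image from operation 2031. One assumption to note: with a 0-indexed inclusive integral image, formula (3) as written sums the region that excludes row i1 and column j1 (the patent uses a 1-indexed convention), and the sketch simply adopts the formula as-is.

```python
import numpy as np

def integral_image(mask):
    """Integral image per formula (2): running sums along both axes."""
    return np.cumsum(np.cumsum(mask, axis=0), axis=1)

def area_in_frame(integ, i1, j1, i2, j2):
    """Foreground area inside a candidate frame via formula (3):
    four integral-value reads instead of summing every pixel."""
    return integ[i2, j2] - integ[i1, j2] - integ[i2, j1] + integ[i1, j1]

# Binary foreground mask: 1 = moving foreground, 0 = stationary background.
mask = np.array([[0, 0, 0, 0, 0],
                 [0, 1, 1, 0, 0],
                 [0, 1, 1, 1, 0],
                 [0, 0, 1, 1, 0],
                 [0, 0, 0, 0, 0]])
integ = integral_image(mask)

# Candidate frame with diagonal corners (i1, j1) = (0, 0), (i2, j2) = (3, 3).
area = area_in_frame(integ, 0, 0, 3, 3)
# Direct check: the same region summed pixel by pixel.
assert area == mask[1:4, 1:4].sum() == 7

# Ratio used in operation 2033: foreground area over frame area.
frame_area = (3 - 0) * (3 - 0)
ratio = area / frame_area
assert ratio == 7 / 9
```

The point of the formula is visible here: the frame's foreground area costs four array reads regardless of frame size, which is what the step's efficiency claim rests on.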
2034: when the ratio is less than a preset ratio threshold, filter the configuration information of the target candidate frame out of the first candidate frame configuration information set.
When the ratio is greater than or equal to the preset ratio threshold, the configuration information of the target candidate frame is retained in the first candidate frame configuration information set.
Referring to Fig. 2-6, the above process of filtering the first candidate frame configuration information set directly according to the foreground moving image can be completed by the following operations 2131 to 2134:
2131: obtain, according to the configuration information of a target candidate frame, the image region corresponding to the target candidate frame in the foreground moving image, where the configuration information of the target candidate frame is the configuration information of any candidate frame in the first candidate frame configuration information set.
Optionally, the image region corresponding to the target candidate frame may be obtained in the foreground moving image according to the position information of the target candidate frame.
When the position information of the target candidate frame includes the positions of a pair of diagonal corner points of the target candidate frame, the image region corresponding to the target candidate frame is obtained in the foreground moving image according to the position of each of the pair of diagonal corner points.
When the position information of the target candidate frame includes the position of one vertex of the target candidate frame and the size of the target candidate frame, the image region corresponding to the target candidate frame is obtained in the foreground moving image according to the position of the vertex and the size.
Next, the ratio between the target image area located within the target candidate frame and the area of the target candidate frame can be calculated according to the image region; the implementation may include the following operations 2132 to 2134.
2132: count the number of pixels in the image region that belong to target images, and the total number of pixels in the image region.
The pixels of the foreground moving image fall into two classes: one class belongs to the target images that are in motion, and each such pixel has the pixel value 1; the other class belongs to the stationary background image, and each such pixel has the pixel value 0.
Optionally, the number of pixels in the image region whose pixel value is 1 may be counted, yielding the number of pixels in the image region that belong to target images.
2133: calculate the ratio between the counted pixel number and the total pixel number, obtaining the ratio between the target image area located within the target candidate frame and the area of the target candidate frame.
2134: when the ratio is less than the preset ratio threshold, filter the configuration information of the target candidate frame out of the first candidate frame configuration information set.
When the ratio is greater than or equal to the preset ratio threshold, the configuration information of the target candidate frame is retained in the first candidate frame configuration information set.
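Operations 2131–2134 can be sketched directly on the binary foreground mask. Names are illustrative, and the 0.5 threshold is an assumed example value, not one taken from the patent.

```python
import numpy as np

def keep_frame(mask, i1, j1, i2, j2, threshold):
    """Operations 2132-2134: count foreground pixels inside the frame's
    image region, divide by the region's total pixel count, and retain the
    frame only if the ratio reaches the threshold."""
    region = mask[i1:i2 + 1, j1:j2 + 1]          # 2131: crop the region
    foreground = int((region == 1).sum())        # 2132: pixels with value 1
    ratio = foreground / region.size             # 2133: foreground ratio
    return ratio >= threshold                    # 2134: filter or retain

mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

# A frame tightly around the moving object is mostly foreground: retained.
assert keep_frame(mask, 0, 0, 1, 1, threshold=0.5) is True
# A frame covering the whole mask is only 4/16 foreground: filtered out.
assert keep_frame(mask, 0, 0, 3, 3, threshold=0.5) is False
```

Compared with the integral-image variant, this one re-counts pixels for every frame; it trades the one-time integral-image pass for simpler per-frame logic.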
Step 204: add, according to the second candidate frame configuration information set, detection frames in the picture to be detected, where the detection frames contain at least one target image of the picture to be detected.
In this embodiment, the configuration information of each candidate frame in the second candidate frame configuration information set may be sorted according to the confidence score of each candidate frame, yielding a first configuration information sequence.
Optionally, the configuration information of a preset number of candidate frames with the highest confidence scores may be selected from the first configuration information sequence, and a detection frame is added for each selected candidate frame in the picture to be detected according to the candidate frame's position information; the detection frame of a candidate frame is equal in size to the candidate frame.
Optionally, a non-maximum suppression operation may also be performed on the first configuration information sequence to obtain a second configuration information sequence; the number of candidate frame configuration information entries in the second configuration information sequence is less than or equal to the number in the first configuration information sequence. The configuration information of a preset number of candidate frames with the highest confidence scores may then be selected from the second configuration information sequence, and a detection frame is added for each selected candidate frame in the picture to be detected according to the candidate frame's position information; the detection frame of a candidate frame is equal in size to the candidate frame.
The so-called non-maximum suppression operation identifies, in the first configuration information sequence, any two candidate frames whose overlapping area exceeds a preset threshold, and either filters out the configuration information of one of the two candidate frames, or merges the two candidate frames into one candidate frame and obtains the configuration information of the merged candidate frame.
Since a large number of candidate frame configuration information entries were filtered out of the first candidate frame configuration information set in step 203, the non-maximum suppression operation on the configuration information of the candidate frames in the first configuration information sequence has fewer candidate frame configuration information entries to process. This improves the efficiency of the operation and thereby further improves the efficiency of detecting target images.
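The variant of non-maximum suppression that discards one of two heavily overlapping frames can be sketched as follows. The overlap measure used here is intersection-over-union, a common choice; the patent only requires that the overlapping area exceed a preset threshold, so this is one assumed instantiation.

```python
def iou(a, b):
    """Intersection-over-union of two frames given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(frames, scores, threshold):
    """Keep the highest-scoring frame, drop any frame overlapping it
    beyond the threshold, and repeat on the remainder."""
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(frames[i], frames[k]) <= threshold for k in kept):
            kept.append(i)
    return kept

frames = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# The second frame overlaps the first heavily and is suppressed.
assert nms(frames, scores, threshold=0.5) == [0, 2]
```

Because step 203 has already removed many candidate frames, the pairwise overlap checks inside `nms` run over a much smaller set, which is the efficiency gain described above.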
Optionally, when a detection frame is added to the picture to be detected, the type of the target image in the detection frame may also be added. In implementation, the detection frame and the type of the target image in the detection frame may be added to the picture to be detected according to the second feature picture and the configuration information of each selected candidate frame.
In implementation, the second feature picture and the configuration information of each selected candidate frame may be input to the region-of-interest (Region of Interest, RoI) pooling layer of a Fast Rcnn device; the RoI pooling layer of the Fast Rcnn device outputs the target image type in each selected candidate frame, and the detection frame and the type of the target image in the detection frame are added to the picture to be detected according to the position information of each candidate frame.
The above processing may be performed on every frame of picture, thereby adding, in every frame of picture, the detection frames and the types of the target images in the detection frames.
Referring to the module diagram of the Fast Rcnn device shown in Fig. 2-7, the Fast Rcnn device includes a shared convolutional layer, a dedicated convolutional layer, a RoI pooling layer, and a fully connected layer. The picture to be detected is processed by the shared convolutional layer and the dedicated convolutional layer; the resulting second feature picture, together with the second candidate frame configuration information set, is input to the RoI pooling layer, and after processing by the RoI pooling layer and the fully connected layer, the detection frames and the type of the target image in each detection frame are output.
Referring to Fig. 2-8, from the above process it can be seen that this embodiment of the application is applied to the following software system, which may be executed by a device to perform the above method; the device may be the camera device or the server in the embodiment shown in Fig. 1. The software system may include a filter device, a Gaussian mixture model device, an RPN device, and a Fast Rcnn device.
The picture to be detected is separately input to the CNN in the Fast Rcnn device and to the Gaussian mixture model device. The CNN in the Fast Rcnn device inputs the first feature picture to the RPN device; the Gaussian mixture model device outputs the foreground moving image to the filter device, and the RPN device inputs the first candidate frame configuration information set to the filter device. The filter device obtains the second candidate frame configuration information set through the operations of step 203 above and inputs it to the Fast Rcnn device, and the Fast Rcnn device adds the detection frames and the types of the target images in the detection frames to the picture to be detected.
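The data flow between the four devices can be sketched end to end. Every function here is a stand-in placeholder for the corresponding device, not an implementation of it; only the filter device carries real logic (step 203), and all names and the 0.5 threshold are assumptions.

```python
# End-to-end data flow of the software system described above.

def gaussian_mixture_model(picture):
    """Stand-in: returns a binary foreground moving image."""
    return [[1 if px > 0 else 0 for px in row] for row in picture]

def cnn_first_feature(picture):
    """Stand-in: first feature picture from an intermediate CNN layer."""
    return picture

def rpn(feature_picture):
    """Stand-in: candidate frames as (i1, j1, i2, j2, confidence)."""
    return [(0, 0, 1, 1, 0.9), (0, 0, 3, 3, 0.8)]

def filter_device(candidates, foreground, threshold=0.5):
    """Step 203: keep frames whose foreground ratio reaches the threshold."""
    kept = []
    for (i1, j1, i2, j2, conf) in candidates:
        region = [row[j1:j2 + 1] for row in foreground[i1:i2 + 1]]
        total = sum(len(r) for r in region)
        fg = sum(sum(r) for r in region)
        if fg / total >= threshold:
            kept.append((i1, j1, i2, j2, conf))
    return kept

picture = [[5, 5, 0, 0],
           [5, 5, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
foreground = gaussian_mixture_model(picture)
candidates = rpn(cnn_first_feature(picture))
second_set = filter_device(candidates, foreground)
# Only the tight frame around the moving object survives the filter.
assert second_set == [(0, 0, 1, 1, 0.9)]
```

The surviving second set would then be handed to the Fast Rcnn device's RoI pooling layer to attach detection frames and target image types.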
In this embodiment of the application, the foreground moving image of the picture to be detected is obtained, and according to the foreground moving image, the configuration information of candidate frames containing incomplete target images is filtered out of the first candidate frame configuration information set to obtain the second candidate frame configuration information set; detection frames are then added to the picture to be detected according to the second candidate frame configuration information set, which improves the detection precision. Because a large number of candidate frame configuration information entries have been filtered out, when the non-maximum suppression operation is performed on the second candidate frame configuration information set, fewer candidate frame configuration information entries need to be processed; this reduces the amount of computation, increases the processing speed, and thereby improves the detection efficiency.
Referring to Fig. 3-1, an embodiment of the application provides a device 300 for detecting target images. The device 300 can be used to implement the embodiment shown in Fig. 2-1, and can also implement the functions of the server or the camera device in the embodiment shown in Fig. 1. It includes:
an acquiring unit 301, configured to obtain the foreground moving image corresponding to a picture to be detected and to obtain a first feature picture obtained by performing convolution operations on the picture to be detected, where the foreground moving image includes the target images that are in motion in the picture to be detected and the background image other than the target images;
a detection unit 302, configured to detect the target images in the first feature picture and obtain a first candidate frame configuration information set, where the first candidate frame configuration information set includes the configuration information of each candidate frame in at least one candidate frame, each candidate frame in the first feature picture contains at least one target image, and the target images in the first feature picture are the same as the target images in the picture to be detected;
a filter unit 303, configured to filter, from the first candidate frame configuration information set and according to the foreground moving image, the configuration information of candidate frames whose contained target image is an incomplete target image, to obtain a second candidate frame configuration information set; and
an adding unit 304, configured to add, according to the second candidate frame configuration information set, detection frames in the picture to be detected, where the detection frames contain at least one target image of the picture to be detected.
Optionally, referring to Fig. 3-2, the device 300 further includes at least one of a transceiver unit 305 and a storage unit 306.
The picture to be detected may be a picture received by the transceiver unit 305, or a picture stored in the storage unit 306.
Optionally, referring to Fig. 3-3, when the device 300 is used to implement the functions of the camera device, the device 300 may further include a camera unit 307, which may be a camera or the like; the picture to be detected may be a picture captured by the camera unit 307. The device 300 may also include the transceiver unit 305 and/or the storage unit 306; the transceiver unit 305 may be used to send the pictures captured by the camera unit 307, and the storage unit 306 may be used to store the pictures captured by the camera unit 307.
Optionally, when the device 300 is used to implement the functions of the server, the device 300 may include the transceiver unit 305 and/or the storage unit 306.
Optionally, the acquiring unit 301 is configured to obtain the foreground moving image corresponding to the picture to be detected by performing Gaussian mixture background modeling on the picture to be detected.
Optionally, the filter unit 303 is configured to:
compute, from the foreground moving image, the integral image corresponding to the foreground moving image; and
filter, from the first candidate frame configuration information set and according to the integral image, the configuration information of candidate frames whose contained target image is an incomplete target image.
Optionally, the filter unit 303 is configured to:
obtain, according to the configuration information of a target candidate frame, the integral image region corresponding to the target candidate frame in the integral image, where the configuration information of the target candidate frame is the configuration information of any candidate frame in the first candidate frame configuration information set;
calculate, according to the integral image region, the ratio between the target image area located within the target candidate frame and the area of the target candidate frame; and
when the ratio is less than a preset ratio threshold, filter the configuration information of the target candidate frame from the first candidate frame configuration information set.
Optionally, the filter unit 303 is configured to:
obtain the integral values of the pixels located at the four vertex positions of the integral image region;
calculate, from the obtained integral values, the target image area located within the target candidate frame;
calculate the area of the target candidate frame according to the configuration information of the target candidate frame; and
calculate the ratio between the target image area and the area of the target candidate frame.
Optionally, the filter unit 303 is configured to:
obtain, according to the configuration information of a target candidate frame, the image region corresponding to the target candidate frame in the foreground moving image, where the configuration information of the target candidate frame is the configuration information of any candidate frame in the first candidate frame configuration information set;
calculate, according to the image region, the ratio between the target image area located within the target candidate frame and the area of the target candidate frame; and
when the ratio is less than a preset ratio threshold, filter the configuration information of the target candidate frame from the first candidate frame configuration information set.
Optionally, the filter unit 303 is configured to:
count the number of pixels in the image region that belong to target images and the total number of pixels in the image region; and
calculate the ratio between the counted pixel number and the total pixel number, obtaining the ratio between the target image area located within the target candidate frame and the area of the target candidate frame.
Optionally, the adding unit 304 is configured to:
obtain a second feature picture obtained by performing convolution operations on the picture to be detected, where the number of convolution operations performed to obtain the first feature picture is less than the number of convolution operations performed to obtain the second feature picture; and
add, according to the second feature picture and the second candidate frame configuration information set, the detection frames and the types of the target images in the detection frames to the picture to be detected.
In this embodiment of the application, because the foreground moving image of the picture to be detected is obtained, the configuration information of candidate frames containing incomplete target images can be filtered out of the first candidate frame configuration information set according to the foreground moving image to obtain the second candidate frame configuration information set, and detection frames are added to the picture to be detected according to the second candidate frame configuration information set, which can improve the detection precision.
Referring to Fig. 4, Fig. 4 is a schematic diagram of a device 400 for detecting target images provided by an embodiment of the application. The device 400 includes at least one processor 401, a bus system 402, a memory 403, and at least one transceiver 404.
The device 400 is a device of hardware structure and can be used to implement the functional modules of the device described in Fig. 3-1. For example, those skilled in the art will appreciate that the acquiring unit 301, the detection unit 302, the filter unit 303, and/or the adding unit 304 of the device 300 shown in Fig. 3-1 can be implemented by the at least one processor 401 calling code in the memory 403, and the transceiver unit 305 of the device 300 shown in Fig. 3-1 can be implemented by the at least one transceiver 404.
Optionally, the device 400 can also be used to implement the functions of the camera device in the embodiment described in Fig. 1, or to implement the functions of the server in the embodiment shown in Fig. 1.
When the device 400 is used for the functions of the camera device, the device 400 may further include a camera 407; the camera unit 307 of the device 300 shown in Fig. 3-1 can be implemented by the camera 407.
Optionally, the above processor 401 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of this application.
The above bus system 402 may include a path for transmitting information between the above components.
The above transceiver 404 is used to communicate with other devices or communication networks, such as Ethernet, a radio access network (radio access network, RAN), or a wireless local area network (wireless local area networks, WLAN).
The above memory 403 may be a read-only memory (read-only memory, ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (random access memory, RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (compact disc read-only memory, CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or the memory may be integrated with the processor.
The memory 403 is used to store the application program code for executing the solution of this application, and execution is controlled by the processor 401. The processor 401 is used to execute the application program code stored in the memory 403, so as to realize the functions in the method of this patent.
In a specific implementation, as an embodiment, the processor 401 may include one or more CPUs, such as CPU0 and CPU1 in Fig. 4.
In a specific implementation, as an embodiment, the device 400 may include multiple processors, such as the processor 401 and the processor 408 in Fig. 4. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
In a specific implementation, as an embodiment, when the device 400 is used to implement the functions of the server, the device 400 may further include an output device 405 and an input device 406. The output device 405 communicates with the processor 401 and can display information in multiple ways; for example, the output device 405 may be a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a cathode ray tube (cathode ray tube, CRT) display device, or a projector (projector), etc. The input device 406 communicates with the processor 401 and can receive user input in multiple ways; for example, the input device 406 may be a mouse, a keyboard, a touch screen device, or a sensing device, etc.
The serial numbers of the above embodiments of the application are for description only and do not represent the superiority or inferiority of the embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely alternative embodiments of the application and is not intended to limit the application; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the application shall be included within the scope of protection of the application.
Claims (18)
1. A method for detecting a target image, wherein the method comprises:
obtaining a foreground motion image corresponding to a picture to be detected, and obtaining a first feature picture obtained by performing a convolution operation on the picture to be detected, wherein the foreground motion image comprises a target image that is in a moving state in the picture to be detected and a background image other than the target image;
detecting the target image in the first feature picture to obtain a first candidate frame configuration information set, wherein the first candidate frame configuration information set comprises configuration information of each of at least one candidate frame in the first feature picture, each candidate frame comprises at least one target image, and the target image in the first feature picture is the same as the target image in the picture to be detected;
filtering, from the first candidate frame configuration information set according to the foreground motion image, the configuration information of candidate frames whose included target image is an incomplete target image, to obtain a second candidate frame configuration information set; and
adding a detection frame in the picture to be detected according to the second candidate frame configuration information set, wherein the detection frame comprises at least one target image in the picture to be detected.
2. The method according to claim 1, wherein the obtaining a foreground motion image corresponding to a picture to be detected comprises:
obtaining the foreground motion image corresponding to the picture to be detected by performing Gaussian mixture background modeling on the picture to be detected.
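As a rough illustration of the per-pixel background modeling named in claim 2, the following is a deliberately simplified, single-Gaussian sketch of the mixture-of-Gaussians idea; a real implementation would keep several Gaussians per pixel (for example, OpenCV's `createBackgroundSubtractorMOG2`). The class name, parameters, and API below are illustrative assumptions, not taken from the patent.

```python
class SingleGaussianBackground:
    """Simplified sketch: one running Gaussian (mean/variance) per pixel.

    A pixel whose value falls far from its background mean is flagged as
    foreground (a moving-target pixel); the model then adapts slowly.
    """

    def __init__(self, width, height, alpha=0.05, k=2.5):
        self.alpha = alpha  # learning rate of the running estimates
        self.k = k          # threshold in standard deviations
        self.mean = [[0.0] * width for _ in range(height)]
        self.var = [[225.0] * width for _ in range(height)]  # initial variance

    def apply(self, frame):
        """Update the model with a grayscale frame (list of rows of ints)
        and return a binary foreground mask of the same shape."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, v in enumerate(row):
                m, var = self.mean[y][x], self.var[y][x]
                d = v - m
                # foreground if the squared deviation exceeds k^2 * variance
                mask_row.append(1 if d * d > (self.k ** 2) * var else 0)
                # slowly adapt the background statistics toward the new frame
                self.mean[y][x] = m + self.alpha * d
                self.var[y][x] = max(1.0, var + self.alpha * (d * d - var))
            mask.append(mask_row)
        return mask
```

After the model has seen enough frames of a static scene, a suddenly bright region is reported as foreground while unchanged pixels are not; the resulting mask plays the role of the foreground motion image in claim 1.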
3. The method according to claim 1 or 2, wherein the filtering, from the first candidate frame configuration information set according to the foreground motion image, the configuration information of candidate frames whose included target image is an incomplete target image comprises:
calculating, according to the foreground motion image, an integral image corresponding to the foreground motion image; and
filtering, from the first candidate frame configuration information set according to the integral image, the configuration information of candidate frames whose included target image is an incomplete target image.
4. The method according to claim 3, wherein the filtering, from the first candidate frame configuration information set according to the integral image, the configuration information of candidate frames whose included target image is an incomplete target image comprises:
obtaining, according to configuration information of a target candidate frame, an integral image region corresponding to the target candidate frame in the integral image, wherein the configuration information of the target candidate frame is the configuration information of any one candidate frame in the first candidate frame configuration information set;
calculating, according to the integral image region, a ratio between the target image area located within the target candidate frame and the area of the target candidate frame; and
filtering the configuration information of the target candidate frame from the first candidate frame configuration information set when the ratio is less than a preset ratio threshold.
5. The method according to claim 4, wherein the calculating, according to the integral image region, a ratio between the target image area located within the target candidate frame and the area of the target candidate frame comprises:
obtaining the integral values of the pixels located at the four vertex positions of the integral image region;
calculating, according to the obtained integral value of each pixel, the target image area located within the target candidate frame;
calculating the area of the target candidate frame according to the configuration information of the target candidate frame; and
calculating the ratio between the target image area and the area of the target candidate frame.
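The four-vertex computation in claims 3 to 5 is the classic summed-area-table trick, and can be sketched as follows. The mask is a binary foreground motion image (1 = moving-target pixel); the foreground-pixel count inside any candidate frame is recovered in O(1) from the integral values at its four corners, and frames whose foreground ratio falls below a threshold are dropped as covering an incomplete target. Function names, the half-open box convention, and the 0.5 default threshold are illustrative assumptions, not values from the patent.

```python
def integral_image(mask):
    """Summed-area table with a one-pixel zero border, so
    ii[y][x] = sum of mask[0..y-1][0..x-1]."""
    h, w = len(mask), len(mask[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = mask[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def region_sum(ii, x0, y0, x1, y1):
    """Foreground-pixel count in the box [x0, x1) x [y0, y1), computed from
    the integral values at the region's four vertices."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def keep_frame(ii, frame, ratio_threshold=0.5):
    """Keep the candidate frame only if the ratio of target-image area to
    frame area reaches the threshold (otherwise it is filtered out)."""
    x0, y0, x1, y1 = frame
    area = (x1 - x0) * (y1 - y0)
    return region_sum(ii, x0, y0, x1, y1) / area >= ratio_threshold
```

The appeal of the integral image here is that the first candidate frame set may contain many overlapping frames, yet each ratio test costs only four lookups regardless of frame size.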
6. The method according to claim 1 or 2, wherein the filtering, from the first candidate frame configuration information set according to the foreground motion image, the configuration information of candidate frames whose included target image is an incomplete target image comprises:
obtaining, according to configuration information of a target candidate frame, an image region corresponding to the target candidate frame in the foreground motion image, wherein the configuration information of the target candidate frame is the configuration information of any one candidate frame in the first candidate frame configuration information set;
calculating, according to the image region, a ratio between the target image area located within the target candidate frame and the area of the target candidate frame; and
filtering the configuration information of the target candidate frame from the first candidate frame configuration information set when the ratio is less than a preset ratio threshold.
7. The method according to claim 6, wherein the calculating, according to the image region, a ratio between the target image area located within the target candidate frame and the area of the target candidate frame comprises:
counting, in the image region, the number of pixels that belong to the target image and the total number of pixels in the image region; and
calculating the ratio between the number of pixels and the total number of pixels, to obtain the ratio between the target image area located within the target candidate frame and the area of the target candidate frame.
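The direct-counting alternative of claims 6 and 7 can be sketched without an integral image: count the foreground pixels inside each candidate frame's region of the foreground motion image and compare the ratio against the threshold. Names, the box convention, and the 0.5 default are illustrative assumptions only.

```python
def foreground_ratio(fg_mask, frame):
    """fg_mask: binary foreground motion image (list of rows, 1 = target pixel).
    frame: (x0, y0, x1, y1) candidate frame with a half-open convention.
    Returns (pixels belonging to the target image) / (total pixels in region)."""
    x0, y0, x1, y1 = frame
    total = (x1 - x0) * (y1 - y0)
    hits = sum(fg_mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return hits / total

def filter_candidates(fg_mask, frames, ratio_threshold=0.5):
    """Second candidate frame set: keep only frames whose foreground ratio
    reaches the threshold; the rest are treated as incomplete targets."""
    return [f for f in frames if foreground_ratio(fg_mask, f) >= ratio_threshold]
```

This costs O(frame area) per candidate rather than O(1), which is why the integral-image variant of claims 3 to 5 pays off when there are many or large candidate frames.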
8. The method according to any one of claims 1 to 7, wherein the adding a detection frame in the picture to be detected according to the second candidate frame configuration information set comprises:
obtaining a second feature picture obtained by performing a convolution operation on the picture to be detected, wherein the number of convolution operations performed on the first feature picture is less than the number of convolution operations performed on the second feature picture; and
adding, according to the second feature picture and the second candidate frame configuration information set, the detection frame in the picture to be detected and the type of the target image in the detection frame.
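Claim 8 distinguishes the two feature pictures only by how many convolution operations each has passed through. The toy below makes that concrete: the same input is convolved a different number of times, so the "second feature picture" has undergone more convolutions than the first. In a real system each operation would be a learned CNN layer; here a 3x3 box filter stands in for one convolution operation, purely as an assumed illustration.

```python
def conv_box3(img):
    """One stand-in 'convolution operation': 3x3 box average with
    border handling (the kernel is clipped at image edges)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = n = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx]
                        n += 1
            out[y][x] = acc / n
    return out

def feature_picture(img, num_convs):
    """Apply num_convs convolution operations; per claim 8, the first
    feature picture uses fewer operations than the second."""
    for _ in range(num_convs):
        img = conv_box3(img)
    return img
```

For example, `feature_picture(img, 2)` and `feature_picture(img, 4)` would play the roles of the first and second feature pictures respectively: the deeper one aggregates context from a wider neighborhood of the picture to be detected.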
9. An apparatus for detecting a target image, wherein the apparatus comprises:
an acquiring unit, configured to obtain a foreground motion image corresponding to a picture to be detected and to obtain a first feature picture obtained by performing a convolution operation on the picture to be detected, wherein the foreground motion image comprises a target image that is in a moving state in the picture to be detected and a background image other than the target image;
a detection unit, configured to detect the target image in the first feature picture to obtain a first candidate frame configuration information set, wherein the first candidate frame configuration information set comprises configuration information of each of at least one candidate frame in the first feature picture, each candidate frame comprises at least one target image, and the target image in the first feature picture is the same as the target image in the picture to be detected;
a filtering unit, configured to filter, from the first candidate frame configuration information set according to the foreground motion image, the configuration information of candidate frames whose included target image is an incomplete target image, to obtain a second candidate frame configuration information set; and
an adding unit, configured to add a detection frame in the picture to be detected according to the second candidate frame configuration information set, wherein the detection frame comprises at least one target image in the picture to be detected.
10. The apparatus according to claim 9, wherein the acquiring unit is configured to obtain the foreground motion image corresponding to the picture to be detected by performing Gaussian mixture background modeling on the picture to be detected.
11. The apparatus according to claim 9 or 10, wherein the filtering unit is configured to:
calculate, according to the foreground motion image, an integral image corresponding to the foreground motion image; and
filter, from the first candidate frame configuration information set according to the integral image, the configuration information of candidate frames whose included target image is an incomplete target image.
12. The apparatus according to claim 11, wherein the filtering unit is configured to:
obtain, according to configuration information of a target candidate frame, an integral image region corresponding to the target candidate frame in the integral image, wherein the configuration information of the target candidate frame is the configuration information of any one candidate frame in the first candidate frame configuration information set;
calculate, according to the integral image region, a ratio between the target image area located within the target candidate frame and the area of the target candidate frame; and
filter the configuration information of the target candidate frame from the first candidate frame configuration information set when the ratio is less than a preset ratio threshold.
13. The apparatus according to claim 12, wherein the filtering unit is configured to:
obtain the integral values of the pixels located at the four vertex positions of the integral image region;
calculate, according to the obtained integral value of each pixel, the target image area located within the target candidate frame;
calculate the area of the target candidate frame according to the configuration information of the target candidate frame; and
calculate the ratio between the target image area and the area of the target candidate frame.
14. The apparatus according to claim 9 or 10, wherein the filtering unit is configured to:
obtain, according to configuration information of a target candidate frame, an image region corresponding to the target candidate frame in the foreground motion image, wherein the configuration information of the target candidate frame is the configuration information of any one candidate frame in the first candidate frame configuration information set;
calculate, according to the image region, a ratio between the target image area located within the target candidate frame and the area of the target candidate frame; and
filter the configuration information of the target candidate frame from the first candidate frame configuration information set when the ratio is less than a preset ratio threshold.
15. The apparatus according to claim 14, wherein the filtering unit is configured to:
count, in the image region, the number of pixels that belong to the target image and the total number of pixels in the image region; and
calculate the ratio between the number of pixels and the total number of pixels, to obtain the ratio between the target image area located within the target candidate frame and the area of the target candidate frame.
16. The apparatus according to any one of claims 9 to 15, wherein the adding unit is configured to:
obtain a second feature picture obtained by performing a convolution operation on the picture to be detected, wherein the number of convolution operations performed on the first feature picture is less than the number of convolution operations performed on the second feature picture; and
add, according to the second feature picture and the second candidate frame configuration information set, the detection frame in the picture to be detected and the type of the target image in the detection frame.
17. An apparatus for detecting a target image, wherein the apparatus comprises:
at least one processor; and
at least one memory,
wherein the at least one memory stores one or more programs configured to be executed by the at least one processor, and the one or more programs comprise instructions for performing the method according to any one of claims 1 to 8.
18. A non-volatile computer-readable storage medium for storing a computer program, wherein the computer program is loaded by a processor to execute the instructions of the method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810258574.7A CN110310301B (en) | 2018-03-27 | 2018-03-27 | Method and device for detecting target object |
PCT/CN2019/074761 WO2019184604A1 (en) | 2018-03-27 | 2019-02-11 | Method and device for detecting target image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110310301A true CN110310301A (en) | 2019-10-08 |
CN110310301B CN110310301B (en) | 2021-07-16 |
Family
ID=68062170
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810258574.7A Active CN110310301B (en) | 2018-03-27 | 2018-03-27 | Method and device for detecting target object |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110310301B (en) |
WO (1) | WO2019184604A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836985A (en) * | 2020-06-24 | 2021-12-24 | 富士通株式会社 | Image processing apparatus, image processing method, and computer-readable storage medium |
CN114511694B (en) * | 2022-01-28 | 2023-05-12 | 北京百度网讯科技有限公司 | Image recognition method, device, electronic equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106557778A (en) * | 2016-06-17 | 2017-04-05 | 北京市商汤科技开发有限公司 | Generic object detection method and device, data processing equipment and terminal device |
CN106709447A (en) * | 2016-12-21 | 2017-05-24 | 华南理工大学 | Abnormal behavior detection method in video based on target positioning and characteristic fusion |
US20170169315A1 (en) * | 2015-12-15 | 2017-06-15 | Sighthound, Inc. | Deeply learned convolutional neural networks (cnns) for object localization and classification |
CN107256225A (en) * | 2017-04-28 | 2017-10-17 | 济南中维世纪科技有限公司 | A kind of temperature drawing generating method and device based on video analysis |
CN107730553A (en) * | 2017-11-02 | 2018-02-23 | 哈尔滨工业大学 | A kind of Weakly supervised object detecting method based on pseudo- true value search method |
CN107833213A (en) * | 2017-11-02 | 2018-03-23 | 哈尔滨工业大学 | A kind of Weakly supervised object detecting method based on pseudo- true value adaptive method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9965719B2 (en) * | 2015-11-04 | 2018-05-08 | Nec Corporation | Subcategory-aware convolutional neural networks for object detection |
CN107133974B (en) * | 2017-06-02 | 2019-08-27 | 南京大学 | Gaussian Background models the vehicle type classification method combined with Recognition with Recurrent Neural Network |
CN107590489A (en) * | 2017-09-28 | 2018-01-16 | 国家新闻出版广电总局广播科学研究院 | Object detection method based on concatenated convolutional neutral net |
Non-Patent Citations (4)
Title |
---|
C. LAWRENCE ZITNICK ET AL: "Edge Boxes: Locating Object Proposals from Edges", ECCV 2014 * |
XIAOJIANG PENG ET AL: "Multi-region Two-Stream R-CNN for Action Detection", ECCV 2016 * |
XUE YUAN ET AL: "A Graph-Based Vehicle Proposal Location and Detection Algorithm", IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS * |
ZHANG JIAOJIAO: "Research on Multi-Video Target Extraction Algorithms Based on Co-segmentation", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852321A (en) * | 2019-11-11 | 2020-02-28 | 北京百度网讯科技有限公司 | Candidate frame filtering method and device and electronic equipment |
CN110852321B (en) * | 2019-11-11 | 2022-11-22 | 北京百度网讯科技有限公司 | Candidate frame filtering method and device and electronic equipment |
CN111815570A (en) * | 2020-06-16 | 2020-10-23 | 浙江大华技术股份有限公司 | Regional intrusion detection method and related device thereof |
Also Published As
Publication number | Publication date |
---|---|
WO2019184604A1 (en) | 2019-10-03 |
CN110310301B (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109559320B (en) | Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network | |
CN110310301A (en) | A kind of method and device detecting target image | |
CN110135243B (en) | Pedestrian detection method and system based on two-stage attention mechanism | |
WO2018107910A1 (en) | Method and device for fusing panoramic video images | |
CN101908231B (en) | Reconstruction method and system for processing three-dimensional point cloud containing main plane scene | |
US11270441B2 (en) | Depth-aware object counting | |
CN109918977A (en) | Determine the method, device and equipment of free time parking stall | |
CN107274445A (en) | A kind of image depth estimation method and system | |
CN106599805A (en) | Supervised data driving-based monocular video depth estimating method | |
CN107220603A (en) | Vehicle checking method and device based on deep learning | |
CN108510504A (en) | Image partition method and device | |
CN110648363A (en) | Camera posture determining method and device, storage medium and electronic equipment | |
CN110009675A (en) | Generate method, apparatus, medium and the equipment of disparity map | |
CN109840982A (en) | It is lined up recommended method and device, computer readable storage medium | |
CN112287824A (en) | Binocular vision-based three-dimensional target detection method, device and system | |
CN110458128A (en) | A kind of posture feature acquisition methods, device, equipment and storage medium | |
CN111105351B (en) | Video sequence image splicing method and device | |
CN117197388A (en) | Live-action three-dimensional virtual reality scene construction method and system based on generation of antagonistic neural network and oblique photography | |
CN115482523A (en) | Small object target detection method and system of lightweight multi-scale attention mechanism | |
CN114037087A (en) | Model training method and device, depth prediction method and device, equipment and medium | |
Zhu et al. | PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios | |
US10706499B2 (en) | Image processing using an artificial neural network | |
US10861174B2 (en) | Selective 3D registration | |
CN115527189A (en) | Parking space state detection method, terminal device and computer readable storage medium | |
CN115171011A (en) | Multi-class building material video counting method and system and counting equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||