CN101968884A - Method and device for detecting target in video image - Google Patents


Info

Publication number: CN101968884A
Authority: CN (China)
Prior art keywords: target, confidence degree, image, candidate
Legal status: Pending
Application number: CN2009101616698A
Other languages: Chinese (zh)
Inventors: 梅树起, 吴伟国
Current Assignee: Sony Corp
Original Assignee: Sony Corp
Application filed by Sony Corp
Priority to CN2009101616698A
Publication of CN101968884A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method and a device for detecting a target in a video image. The method comprises the following steps: detecting, respectively, multiple frames of images containing a target frame to obtain one or more first candidate targets and their confidence degrees; and merging the first candidate targets and their confidence degrees in the multiple frames of images to obtain one or more second candidate targets in the target frame.

Description

Method and apparatus for detecting a target in a video image
Technical field
The present invention relates to image processing technology, and in particular to a method and apparatus for detecting a target in a video image.
Background art
Object detection in images is an important branch of computer vision. Objects of the same type differ more or less in appearance, and factors such as illumination, viewing angle and pose can make an object appear in very different states when imaged, which makes object detection in images considerably more difficult.
Object detection in video images has additional characteristics of its own. In general the quality of a video image is much poorer: the resolution is low, motion blur is present, and noise is significant, which brings new difficulties to target detection. When targets are detected and classified in video images, the processing performance drops considerably because of degraded image quality, motion blur and other noise.
Summary of the invention
A brief overview of the present invention is provided below in order to give a basic understanding of some aspects of the invention. It should be appreciated that this overview is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention, nor to delimit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description discussed later.
According to an aspect of the present invention, a method of detecting a target in a video image is provided. The method comprises a frame detection step of detecting, respectively, multiple frames of images that include a target frame, to obtain one or more first candidate targets and their confidence degrees; and a merging step of merging the first candidate targets and their confidence degrees in the multiple frames of images to obtain one or more second candidate targets in the target frame.
According to another aspect of the present invention, an apparatus for detecting a target in a video image is provided. The apparatus comprises a target detection and classification device for detecting multiple frames of images, including a target frame, in a video sequence to obtain one or more first candidate targets and their confidence degrees; and a merging module for merging the confidence degrees of the first candidate targets in the multiple frames of images to obtain one or more second candidate targets in the target frame.
In addition, embodiments of the present invention also provide a computer program for implementing the above method of detecting a target in a video image.
Embodiments of the present invention further provide a computer program product, at least in the form of a computer-readable medium, on which computer program code for implementing the above method of detecting a target in a video image is recorded.
Description of the drawings
The above and other objects, features and advantages of the present invention will be more readily understood with reference to the following description of the embodiments of the invention taken in conjunction with the accompanying drawings. The components in the drawings are intended only to illustrate the principles of the invention. In the drawings, the same or similar technical features or components are denoted by the same or similar reference numerals.
Figs. 1-6 show schematic flowcharts of methods of detecting a target in a video image according to embodiments of the present invention;
Figs. 7 and 8 show flowcharts of an application example of the method of detecting a target in a video image;
Figs. 9-13 show schematic block diagrams of apparatuses for detecting a target in a video image according to embodiments of the present invention;
Fig. 14 shows a schematic block diagram of the structure of a target detection and classification device according to an embodiment of the present invention;
Fig. 15 shows a schematic block diagram of an application example of the target detection and classification device; and
Fig. 16 is a block diagram showing the structure of a computer for implementing the present invention.
Embodiments
Embodiments of the present invention are described below with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that, for the sake of clarity, components and processing that are unrelated to the invention or well known to those of ordinary skill in the art are omitted from the drawings and the description.
Fig. 1 shows a schematic flowchart of a method of detecting a target in a video image according to an embodiment of the present invention.
As shown in Fig. 1, the method may comprise steps S101 and S103 described below.
Step S101 detects multiple frames of images and is also referred to as the frame detection step. Specifically, in this step, after multiple frames of images including a target frame are obtained from the video image sequence to be detected, each of the frames is detected separately to obtain one or more candidate targets (referred to hereinafter as first candidate targets for convenience of description) and their confidence degrees.
In step S103, the first candidate targets and their confidence degrees detected in the multiple frames of images in step S101 are merged, thereby obtaining one or more candidate targets in the target frame (referred to hereinafter as second candidate targets for convenience of description). Step S103 is also referred to as the merging step.
The multiple frames of images obtained may be consecutive in time or not consecutive in time. For convenience of description, suppose that P frames of images Image_0, ..., Image_i, ..., Image_{P-1} in the video sequence are processed respectively, where Image_i is the target frame, P is an integer, P > 1, and i = 0, ..., P-1. For example, multiple frames Image_{i-1}, Image_{i-2}, ... before the target frame Image_i may be used, multiple frames Image_{i+1}, Image_{i+2}, ... after it may be used, or frames both before and after it (..., Image_{i-2}, Image_{i-1}, Image_{i+1}, Image_{i+2}, ...) may be used. The number of frames chosen may be decided according to the practical application. It should be understood that the present invention should not be regarded as being limited to any particular embodiment or example.
It should also be understood that a target detection and classification device may be used to detect the images. The target detection and classification device may be implemented with any suitable technology, a description of which is omitted here.
As an example, the information of the candidate targets obtained by detection may include the position, size and confidence degree of each candidate target in every frame of image. In one example, the target detection and classification device may output a response after processing an input image sample; for example, the response may be a number between 0 and 1, although other numerical values are also possible. The value of the corresponding confidence degree may be determined from the response. Any suitable method may be used to determine the confidence degree of a target or candidate target from the response, and this is not described further here.
In one example, the target detection and classification device used may take image samples of a fixed size as input. In this case, the position and size in the original image of the image sample that is input to the target detection and classification device can serve as the position and size in the original image of the candidate target that it outputs.
In one example, the step of merging the first candidate targets detected in the multiple frames of images and their confidence degrees may comprise: according to the position and size of each first candidate target detected in each frame of image in step S101, merging the confidence degrees of the candidate targets whose positions are adjacent and whose sizes are similar in the individual images.
In each embodiment and/or example described herein, the target or candidate target may be a stationary target or a moving target. In the case of a moving target, the position and size of the same target change over the multiple image frames. For example, as a target approaches, its size in the chronologically ordered images becomes larger and larger, and its detected position in each image also changes according to its direction of motion; and vice versa. In the case of a stationary target, when the multiple frames of images containing the stationary target are detected, factors such as image quality may still cause the detected position to differ slightly from image to image.
In each embodiment and/or example described herein, "adjacent in position and similar in size" means that the regions corresponding to the candidate targets detected in the individual frames have adjacent centers and similar sizes. For example, adjacent centers may mean that the centers differ by one or several pixels; those skilled in the art will appreciate that the number of pixels may be decided according to the state of the target to be detected in the actual application (such as its movement speed and direction), and this is not enumerated here. Similarly, similar sizes may mean that the sizes differ by one or several pixels; the number of pixels of difference may likewise be decided according to the state of the target to be detected in the actual application (such as the target size and movement speed), and this is not enumerated here. As an example, depending on the application scenario, for instance when the possible path and possible speed of the target are known a priori, the ranges of center displacement and size change within which candidate targets in consecutive frames are regarded as the same target can be set accordingly.
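As a non-limiting illustration of the "adjacent in position and similar in size" criterion, the following Python sketch compares two detections given as rectangles; the pixel tolerances are hypothetical values that in practice would be chosen from the expected target size and movement, not values prescribed by this disclosure.

```python
# Hypothetical sketch: decide whether two detections (given as rectangles)
# are "adjacent in position and similar in size" and so may be merged.
# The tolerances are illustrative only.

def is_same_target(box_a, box_b, center_tol=4, size_tol=4):
    """box = (x, y, w, h) with (x, y) the top-left corner, in pixels."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Compare rectangle centers.
    ca = (ax + aw / 2.0, ay + ah / 2.0)
    cb = (bx + bw / 2.0, by + bh / 2.0)
    center_close = abs(ca[0] - cb[0]) <= center_tol and abs(ca[1] - cb[1]) <= center_tol
    # Compare widths and heights.
    size_close = abs(aw - bw) <= size_tol and abs(ah - bh) <= size_tol
    return center_close and size_close

# Example: two detections of the same target in consecutive frames.
print(is_same_target((100, 60, 40, 30), (102, 61, 41, 30)))  # True
print(is_same_target((100, 60, 40, 30), (180, 60, 40, 30)))  # False
```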
In one example, whether a target exists in the target frame (e.g., image Image_i) may also be determined from the merging result in step S103. For example, if the merging result is greater than or equal to a certain predetermined threshold (called the first threshold), it is determined that a target exists at the corresponding position in image Image_i; otherwise it is determined that no target exists at the corresponding position in image Image_i.
In some cases, the corresponding candidate target may not be detected in some of the P frames of images. In these cases, when merging, the confidence value obtained by detecting those frames may be taken as 0 by default.
In one example, the step of merging the confidence degrees may comprise computing the sum of the confidence degrees. In another example, the step of merging the confidence degrees may comprise normalizing each confidence degree and then summing or averaging the normalized confidence degrees. In a further example, the step of merging the confidence degrees may comprise computing the mean of the confidence degrees. It should be understood that the merging methods described here are only exemplary and are not intended to limit the present invention thereto; within the scope of the invention, those of ordinary skill in the art may merge the confidence degrees with various other suitable methods (for example, computing a histogram).
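Purely by way of illustration, the sketch below applies the summing, averaging and normalize-then-sum options mentioned above to the confidences of one matched candidate across several frames; the threshold value and the assumed maximum classifier response are hypothetical.

```python
# Illustrative sketch of the confidence-merging options named above.
# Threshold and maximum response are hypothetical values.

def merge_sum(confidences):
    return sum(confidences)

def merge_mean(confidences):
    return sum(confidences) / len(confidences)

def merge_normalized_sum(confidences, max_response=1.0):
    # Normalize each confidence to [0, 1] against an assumed maximum response,
    # then sum the normalized values (averaging works the same way).
    return sum(min(max(c / max_response, 0.0), 1.0) for c in confidences)

# Confidences of one matched first candidate target in frames i-2, i-1 and i;
# a frame in which it was not detected contributes 0 by default.
per_frame = [0.4, 0.0, 0.7]
FIRST_THRESHOLD = 0.8  # hypothetical value of the "first threshold"
print("target present" if merge_sum(per_frame) >= FIRST_THRESHOLD else "no target")
```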
In the above method, the result for the target frame (e.g., Image_i) is obtained by combining the detection and classification information of multiple consecutive frames of images (e.g., Image_0, ..., Image_i, ..., Image_{P-1}). In this way, a response whose confidence degree would be low if this frame were detected in isolation can be reinforced by the support of preceding and following frames and can therefore be detected, while spurious responses that occur at random are suppressed because they receive no continued support from successive frames. Compared with using only a single frame of image, using a continuous sequence of video images can improve the detection or classification performance.
In one example, in order to detect multiple classes of targets, a target detection and classification device comprising a plurality of sub-classifiers may be adopted. Fig. 14 shows a schematic block diagram of such a target detection and classification device. As shown in Fig. 14, the device may comprise N sub-classifiers (N > 1), each of which can detect one class of target. The number N of sub-classifiers may be set according to the number of target classes to be detected in the practical application. In addition, those of ordinary skill in the art should understand that the sub-classifiers in the embodiments and/or examples addressed hereinafter may be implemented with any suitable technology; detailed descriptions of them are omitted here.
In one example, when a plurality of sub-classifiers for detecting targets of different classes is used, the class of a target can also be determined. For example, after an image sample has been input to the sub-classifiers, if one sub-classifier outputs a certain confidence degree while the other sub-classifiers output a confidence degree of 0, the class of the target can be determined from the class corresponding to the sub-classifier that output this confidence degree, and this confidence degree then serves as the confidence degree of the target. If more than one sub-classifier outputs a certain confidence degree, the class of the target can be determined from the maximum confidence value, i.e., from the class corresponding to the sub-classifier that output the maximum confidence value; in this case, the maximum confidence degree may serve as the confidence degree of the target, or the confidence values output by the sub-classifiers may be merged. If two or more confidence values are equal to the maximum and correspond to different classes, the class attribute of the target may be marked as "uncertain"; in this case as well, the maximum confidence degree may serve as the confidence degree of the target, or the confidence values output by the sub-classifiers may be merged. In addition, the attribute of each candidate target may be ignored in the merging of confidence degrees. For example, when the first candidate targets detected in the multiple frames of images and their confidence degrees are merged, the confidence degree of each first candidate target may be merged without considering its target attribute. As another example, when more than one sub-classifier outputs a certain confidence degree for an image sample, the confidence degrees output by those sub-classifiers may be merged as the confidence degree of the corresponding candidate target without considering the attributes of the sub-classifiers. The confidence degrees may be merged by summing, averaging, or summing or averaging after normalization as described above, which is not enumerated again here.
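The following sketch illustrates, under the assumptions stated in its comments, one way the class-attribute rule described above could be expressed: the class of the sub-classifier with the largest confidence wins, and a tie between different classes is marked "uncertain". The function and class names are hypothetical.

```python
# Hypothetical sketch of the class-attribute rule described above.

def resolve_class(sub_outputs):
    """sub_outputs: dict mapping class name -> confidence (0 means no response)."""
    positive = {cls: c for cls, c in sub_outputs.items() if c > 0}
    if not positive:
        return None, 0.0
    best = max(positive.values())
    winners = [cls for cls, c in positive.items() if c == best]
    label = winners[0] if len(winners) == 1 else "uncertain"
    return label, best  # the maximum confidence serves as the target confidence

print(resolve_class({"car": 0.6, "bus": 0.2, "truck": 0.0}))  # ('car', 0.6)
print(resolve_class({"car": 0.5, "bus": 0.5, "truck": 0.0}))  # ('uncertain', 0.5)
```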
In this method, because the detection information of multiple frames of images is combined, the detection of the target class attribute has good robustness.
In addition, it should be noted that the terms "first candidate target" and "second candidate target" above, as well as "third candidate target" and "fourth candidate target" mentioned hereinafter, merely denote the detection and classification results at different stages of the processing and have nothing to do with the class of the candidate target.
Fig. 2 shows a schematic flowchart of a method of detecting a target in a video image according to another embodiment of the present invention.
The embodiment shown in Fig. 2 is similar to the embodiment shown in Fig. 1. The difference is that, in the embodiment shown in Fig. 2, the detection of each frame also includes traversing each frame of image, which makes the detection result more accurate.
In step S201, each frame of the multiple frames of images is traversed with a predetermined window (hereinafter called the first window) at a predetermined step length (hereinafter called the first step length), and each resulting window image is detected to obtain one or more candidate targets (hereinafter called third candidate targets) and their confidence degrees. For convenience of description, step S201 is also referred to as the first traversal step.
In step S203, the third candidate targets that are adjacent in position and similar in size, together with their confidence degrees, are merged to serve as the first candidate targets and their confidence degrees. For convenience of description, step S203 is also referred to as the first traversal-result merging step.
After the above traversal and merging have been performed on each frame of the multiple frames of images in steps S201 and S203, step S205 is executed.
Step S205 is similar to step S103 shown in Fig. 1 and is not repeated here.
In this embodiment, each frame of image is traversed, and the detection results of all the window images obtained are merged as the detection result for that image. As an example, the size of the window used for the traversal may be determined from the size of the image sample that the target detection and classification device accepts. Suppose a target to be detected exists in a certain region of the image; when the image is traversed, this target may be "framed" by several traversal windows. In other words, several window images may contain all or part of the target, and detecting these window images yields several third candidate targets, and their confidence degrees, that correspond to this same target. These third candidate targets are candidate targets that are adjacent in position and similar in size. In this case, the third candidate targets that are adjacent in position and similar in size, together with their confidence degrees, can be merged in step S203. The methods of merging confidence degrees have been described above and are not repeated here.
In one example, after step S203 has been performed, the method may further comprise a first judging step. Specifically, in the first judging step, it is judged whether the merging result of the confidence degrees of the third candidate targets that are adjacent in position and similar in size is less than a certain predetermined threshold (called the second threshold); if so, the merging result is discarded; otherwise the merging result is kept as the confidence degree of a first candidate target. The region corresponding to the resulting first candidate target may correspond to the region of the third candidate target with the maximum confidence degree, or may cover the regions of all the corresponding third candidate targets.
In the embodiment shown in Fig. 2, each frame of image is traversed with a window; the processing is more complex than in the embodiment shown in Fig. 1, but the detection result is more accurate. In one example, the window may be a rectangular window whose size is decided according to actual requirements. The first step length may also be decided according to actual requirements; for example, the step length may be one or several pixels, or it may be proportional to the size of the current window. The order and manner of the traversal are also arbitrary: the traversal may proceed from left to right and from top to bottom, or from right to left and from top to bottom. The present invention imposes no restriction on this.
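A minimal sketch of such a window traversal is given below; it assumes a detector callback that returns a confidence for a fixed-size image patch, and the window size and step length are illustrative choices only, not parameters prescribed by this disclosure.

```python
# Minimal sketch of the first traversal step (S201), assuming `detect(patch)`
# returns a confidence in [0, 1] for a fixed-size patch.
import numpy as np

def traverse(image, detect, win=(32, 32), step=4):
    """Slide a win-sized window over `image` with the given step and collect
    (x, y, w, h, confidence) for every window with a positive response."""
    h, w = image.shape[:2]
    win_w, win_h = win
    candidates = []
    for y in range(0, h - win_h + 1, step):
        for x in range(0, w - win_w + 1, step):
            conf = detect(image[y:y + win_h, x:x + win_w])
            if conf > 0:
                candidates.append((x, y, win_w, win_h, conf))
    return candidates

# Toy usage with a stand-in detector that "fires" on bright patches.
image = np.zeros((64, 64), dtype=np.uint8)
image[20:50, 20:50] = 255
fake_detect = lambda patch: float(patch.mean() > 200)
print(len(traverse(image, fake_detect)))
```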
Fig. 3 shows a schematic flowchart of a method of detecting a target in a video image according to another embodiment of the present invention. The embodiment shown in Fig. 3 is similar to the embodiment shown in Fig. 2. The difference is that, in the embodiment of Fig. 3, after the first traversal step has been performed, the regions in which a target may exist are traversed again, more finely, which makes the detection result more accurate.
As shown in Fig. 3, steps S301-S303 are similar to steps S201-S203 in the embodiment shown in Fig. 2, being the first traversal step and the first traversal-result merging step respectively, and are not repeated here.
In step S305, one or more regions of interest (ROI, Regions Of Interest) of each frame of image are established according to the merging result obtained in step S303 (i.e., the first traversal-result merging step). Each region of interest ROI covers the region corresponding to the merging result of the corresponding third candidate targets. Step S305 may also be called the region-of-interest establishing step.
In step S307, each region of interest is traversed with a predetermined window (hereinafter called the second window) at a predetermined step length (hereinafter called the second step length), and each resulting window image is detected to obtain one or more candidate targets (hereinafter referred to as fourth candidate targets). Step S307 may also be called the second traversal step.
In one embodiment, each region of interest may be determined by appropriately enlarging the region corresponding to the merging result, i.e., by moderately enlarging the range of the second traversal. In this way, the possibility of missed detections and false detections can be further reduced, and the detection accuracy can therefore be further improved. For example, if the region corresponding to the merging result is rectangular, the region may be expanded by one or several pixels in both length and width according to the size, movement speed and direction of the target, thereby obtaining the corresponding region of interest. As another example, if the region corresponding to the merging result is circular, the radius of the region may be expanded by one or several pixels according to the size, movement speed and direction of the target, thereby obtaining the corresponding region of interest.
The manner of the second traversal may be similar to that of the first traversal in step S201 and is not repeated here. The second window may differ from the first window in size or shape, or may be identical to it. In order to perform a finer traversal, the second step length may be smaller than the first step length; in other examples, the second step length may also be equal to or greater than the first step length. In one example, repeated traversal can be avoided, i.e., when the first window and the second window are identical in size but the step lengths differ, window positions that have already been searched are not searched again, thereby speeding up processing.
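The following sketch illustrates, with hypothetical margin and step values, how a region of interest could be obtained by enlarging a merged region and then re-traversed with a finer step; the helper name is an assumption made for the example.

```python
# Hypothetical sketch of the region-of-interest step (S305/S307): the region of
# a merged third candidate is expanded by a margin and later re-traversed with
# a finer step. Margin and step values are illustrative only.

def make_roi(box, margin, image_size):
    """Expand a rectangle (x, y, w, h) by `margin` pixels on every side,
    clipped to the image bounds."""
    x, y, w, h = box
    img_w, img_h = image_size
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(img_w, x + w + margin)
    y1 = min(img_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)

roi = make_roi((100, 60, 40, 30), margin=3, image_size=(320, 240))
print(roi)  # (97, 57, 46, 36)

# The ROI could then be re-traversed with a second, smaller step length,
# e.g. step=2 instead of the first step of 4, reusing the traverse() idea above.
```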
In step S309, the third candidate targets and fourth candidate targets that are adjacent in position and similar in size, together with their confidence degrees, are merged to serve as the first candidate targets and their confidence degrees. Step S309 is also referred to as the second traversal-result merging step.
Step S311 is identical to step S205 or S103 and is not repeated here.
In one example, after step S303 has been performed and before step S305 is executed, the method may further comprise a first judging step. Specifically, in the first judging step, it is judged whether the merging result of the confidence degrees of the third candidate targets that are adjacent in position and similar in size is less than a certain predetermined threshold (called the third threshold); if so, the merging result is discarded; otherwise the merging result is kept and step S305 is executed to establish a ROI according to the merging result.
In one example, after step S309 has been executed, the method may further comprise a second judging step. Specifically, in the second judging step, if the confidence merging result of the fourth candidate targets that are adjacent in position and similar in size is less than a certain predetermined threshold (called the fourth threshold), the merging result is discarded; otherwise the merging result is kept as the confidence degree of a first candidate target. The region corresponding to the resulting first candidate target may correspond to the region of the fourth candidate target with the maximum confidence degree, or may cover the regions of all the corresponding fourth candidate targets.
In the above embodiment, the image is traversed twice (the first traversal step over the image and the second traversal step over the ROIs). By establishing one or more ROIs for each frame of image and traversing each ROI more finely, the accuracy of target detection can be further improved.
Fig. 4 shows a schematic flowchart of a method of detecting a target in a video image according to another embodiment of the present invention. The embodiment shown in Fig. 4 is similar to the embodiment shown in Fig. 2. The difference is that, in the embodiment shown in Fig. 4, the first traversal step further includes establishing a multi-scale pyramid image of each frame of image and processing the multi-scale pyramid image of each frame, so as to further improve the detection accuracy and reduce missed detections and false detections. A multi-scale pyramid image is a set of images generated by continuously changing the scale (i.e., the size) of the original image, in which the sizes of two images adjacent in scale differ by a constant factor. When the original image is shrunk or enlarged to generate images at other scales, other processing, for example Gaussian filtering, may also be applied, and the resulting new images differ depending on the preprocessing algorithm and scaling algorithm applied; this is not described in detail here.
As shown in Fig. 4, in step S401 the multi-scale pyramid image of each frame of the multiple frames of images is established (suppose the pyramid image comprises K layers, K > 1). For convenience of description, step S401 is also referred to as the frame pyramid establishing step.
The multi-scale pyramid image may be established with any suitable method, a description of which is omitted here. The number of layers of the pyramid image may be set according to the needs of the practical application, and the present invention imposes no limitation on it.
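By way of illustration only, the sketch below builds a multi-scale pyramid in which adjacent levels differ by a constant scale factor; the factor, the number of levels and the nearest-neighbour resampling are assumptions made for the example, not choices prescribed by this disclosure.

```python
# Illustrative sketch of building a multi-scale pyramid whose adjacent levels
# differ by a constant factor. Nearest-neighbour sampling keeps the sketch
# dependency-free; in practice a smoothing filter would normally precede it.
import numpy as np

def build_pyramid(image, levels=5, factor=1.25):
    """Return images from the smallest size (level 0) up to the original size."""
    h, w = image.shape[:2]
    pyramid = []
    for j in range(levels):
        scale = factor ** (levels - 1 - j)   # level 0 is the most downscaled
        new_h, new_w = max(1, int(h / scale)), max(1, int(w / scale))
        rows = np.arange(new_h) * h // new_h
        cols = np.arange(new_w) * w // new_w
        pyramid.append(image[rows[:, None], cols])
    return pyramid

frame = np.zeros((240, 320), dtype=np.uint8)
for j, level in enumerate(build_pyramid(frame)):
    print(j, level.shape)
```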
In step S403, each image in the pyramid image is traversed with a predetermined window (for brevity of description, assume the first window is used) at a predetermined step length (for brevity of description, assume the first step length is used), and each resulting window image is detected, thereby obtaining one or more candidate targets (for brevity, also referred to here as third candidate targets) and their confidence degrees. Step S403 is also referred to as the third traversal step. The manner of the third traversal may be similar to that of the first traversal in step S201 and is not repeated here. In one example, the regions corresponding to the third candidate targets obtained by detecting the pyramid images can be mapped into the original image, thereby determining the regions corresponding to the third candidate targets in the original image.
Steps S405-S407 are similar to steps S203-S205 in the embodiment shown in Fig. 2 and are not repeated here.
In the method shown in Fig. 4, by establishing the multi-scale pyramid image of each image and traversing and detecting the pyramid images, the accuracy of multi-target detection on the video image can be further improved and the possibility of missed detections and false detections reduced.
Fig. 5 shows a schematic flowchart of a method of detecting a target in a video image according to another embodiment of the present invention. The embodiment shown in Fig. 5 is similar to the embodiment shown in Fig. 4. The difference is that, in the embodiment shown in Fig. 5, the third traversal step starts from the image of the smallest size in the established pyramid image, and an interim confidence merging is performed every certain span of scales. If the interim merging result is large enough (i.e., the merged confidence degree is high enough), a target mask (Mask) corresponding to the respective candidate target can be established. In this way, when the remaining pyramid images are processed, the regions corresponding to these target masks are not processed, so as to speed up detection.
As shown in Fig. 5, step S501 is similar to step S401 in the embodiment shown in Fig. 4 and is not repeated here. For convenience of description, suppose the multi-scale pyramid image of image Image_i comprises K layers, i.e., Pyramid_0, ..., Pyramid_j, ..., Pyramid_{K-1}, where j = 0, ..., K-1 and K > 1.
Step S503 is similar to step S403. The difference is that, in step S503, the traversal starts from the pyramid image of the smallest size (i.e., the lowest resolution). Suppose Pyramid_0 is the image of the smallest size in the pyramid image and the size increases as the index goes from 0 to K-1, so that Pyramid_{K-1} is the image of the largest size (i.e., the highest resolution) in the pyramid image. For convenience of description, step S503 is divided into two sub-steps S5031 and S5032.
In step S5031, the images in the pyramid image are traversed starting from Pyramid_0, and each resulting window image is detected, thereby obtaining one or more third candidate targets. After N images Pyramid_0, ..., Pyramid_{N-1} (N < K) have been processed, step S5032 is executed. For convenience of description, step S5031 is also referred to as the third traversal step.
In step S5032, one or more target masks (Mask) are established based on the one or more third candidate targets obtained by detecting the N images Pyramid_0, ..., Pyramid_{N-1} (1 ≤ N < K) in the pyramid image. The region of each target mask may be determined from the region of the corresponding third candidate target. For example, the target mask may cover the corresponding third candidate target, or may be approximately equal to or smaller than the region of the corresponding third candidate target. Then step S5031 is executed again to process the remaining K-N images, and during this processing the regions corresponding to the target masks are treated as regions not to be processed. For convenience of description, step S5032 is also referred to as the mask establishing step.
The above steps S5031 and S5032 may be executed repeatedly until all K layers of the pyramid image have been processed, obtaining one or more third candidate targets and their confidence degrees.
The above embodiment establishes target masks according to interim detection results and does not process the regions corresponding to the target masks in subsequent processing, which speeds up detection. However, if too many target masks are set, the possibility of missed detections in the subsequent processing may increase. In one example, the target mask may be determined by appropriately shrinking the region of the corresponding third candidate target, thereby avoiding missed detections caused by an over-large target mask. For example, if the region corresponding to the third candidate target is rectangular, the region may be shrunk by one or several pixels in both length and width according to the size, movement speed and direction of the target, thereby obtaining the corresponding target mask. As another example, if the region corresponding to the third candidate target is circular, the radius of the region may be reduced by one or several pixels according to the size, movement speed and direction of the target, thereby obtaining the corresponding target mask.
In addition, although a target mask is determined in the image at the current scale, it represents a fixed relative position and relative size in the image. That is, when the subsequent pyramid images are processed, because the image scale changes, the target mask needs to be mapped onto each of the remaining K-N pyramid images Pyramid_N, ..., Pyramid_{K-1}, yielding mask regions Mask_N, ..., Mask_{K-1} in the pyramid images at their respective scales, while the relative positions and sizes of the masks in the images do not change. For this reason, this mapping step is not described repeatedly hereinafter, and "target mask" may be used to refer to all the corresponding mask regions in all subsequent pyramid images.
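The sketch below illustrates one possible way to realize this: the target mask is kept in relative (normalized) coordinates so that it can be mapped onto any remaining pyramid level, and windows whose centers fall inside a mapped mask region are skipped. The helper names are hypothetical.

```python
# Hedged sketch of the target-mask idea: masks are stored in relative
# coordinates, mapped onto each remaining pyramid level, and the mapped
# regions are skipped during later traversal.

def to_relative(box, image_size):
    x, y, w, h = box
    img_w, img_h = image_size
    return (x / img_w, y / img_h, w / img_w, h / img_h)

def to_absolute(rel_box, image_size):
    rx, ry, rw, rh = rel_box
    img_w, img_h = image_size
    return (int(rx * img_w), int(ry * img_h), int(rw * img_w), int(rh * img_h))

# A candidate detected at a pyramid level of size 160x120 ...
mask_rel = to_relative((40, 30, 20, 15), (160, 120))
# ... keeps its relative position and size when mapped onto a 320x240 level.
print(to_absolute(mask_rel, (320, 240)))  # (80, 60, 40, 30)

def in_mask(window_box, masks_abs):
    """True if the window center falls inside any mapped mask region (skip it)."""
    x, y, w, h = window_box
    cx, cy = x + w / 2.0, y + h / 2.0
    return any(mx <= cx <= mx + mw and my <= cy <= my + mh
               for mx, my, mw, mh in masks_abs)
```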
As an example, step S5032 may further comprise a merging and judging step: the third candidate targets, obtained from the N images Pyramid_0, ..., Pyramid_{N-1}, that are adjacent in position and similar in size are merged, and if the merging result is greater than or equal to a predetermined threshold (called the fifth threshold), a target mask (Mask) is established; otherwise no target mask is established. In this example a target mask is established only when the confidence degree of the candidate target is high enough; compared with the above embodiment, the possibility of missed detections can be reduced.
Steps S505-S507 are respectively similar to steps S405-S407 in the embodiment shown in Fig. 4 and are not repeated here.
The step length of the above interim merging (i.e., N) may be selected according to the actual situation and is not restricted here. Depending on the step length N and the number of layers K of the pyramid image, target masks may be established several times, progressively excluding the regions corresponding to candidate targets that have already been determined, thereby speeding up processing.
Fig. 6 shows a schematic flowchart of a method of detecting a target in a video image according to another embodiment of the present invention. The method of Fig. 6 is similar to the embodiment shown in Fig. 3; the difference is that, in the embodiment of Fig. 6, a multi-scale pyramid of each region of interest may also be established and a corresponding traversal performed, which makes the detection result more accurate.
As shown in Fig. 6, steps S601-S605 are respectively similar to steps S301-S305 of the embodiment shown in Fig. 3 and are not repeated here.
In step S606, the multi-scale pyramid image of each region of interest is established.
Any suitable method may be used to establish the multi-scale pyramid image, which is not repeated here. In one example, each region of interest may also be projected (mapped) onto the pyramid images Pyramid_0, ..., Pyramid_{K-1} of the original image Image_i, thereby obtaining the multi-scale pyramid image of the region of interest.
In step S607, each image in the pyramid image of the region of interest is traversed with a predetermined window (e.g., the second window) at a predetermined step length (e.g., the second step length), and each resulting window image is detected to obtain one or more candidate targets (for brevity of description, called fourth candidate targets) and their confidence degrees. In one example, the second step length may be smaller than the first step length so as to perform a finer traversal; of course, the second step length may also be greater than or equal to the first step length.
In one example, in order to speed up processing, windows that have already been traversed are not traversed again.
Steps S609-S611 are respectively similar to steps S309-S311 of the embodiment shown in Fig. 3 and are not repeated here.
Figs. 7 and 8 show an application example of the method of detecting a target in a video image according to an embodiment of the present invention, and Fig. 15 schematically shows the operating mode of the target detection and classification device used in the example of Figs. 7 and 8. In this example, several classes of automobiles are the targets to be detected; as an example, cars, buses and trucks (frontal view) are the targets to be detected. Fig. 7 shows a schematic flowchart of this example, and Fig. 8 shows a detailed flowchart of the window traversal and the fine ROI traversal shown in Fig. 7.
As shown in Fig. 15, the target detection and classification device comprises three sub-classifiers, namely a car CDC (Classifier for Detection and Classification), a bus CDC and a truck CDC. For an input sample, if a certain CDC outputs a positive number, the sample can be considered to have passed that CDC with a certain confidence degree, i.e., the sample has that class attribute; if a certain CDC outputs a negative number, the sample can be considered to have been rejected by that CDC with a certain confidence degree. Several CDCs working together may adopt different modes. This example adopts the parallel mode; of course, in practical applications other suitable modes, such as a serial mode, may also be adopted. In the serial mode, for example, the input sample is processed by one CDC and then processed by the next CDC. As shown in Fig. 15, in the parallel mode, the sample to be detected is input to the CDCs of the three classes at the same time. A sample to be tested may respond positively to several CDCs and can therefore carry multiple class attributes.
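As an illustration of the parallel mode only, the following sketch feeds one sample to stand-in car, bus and truck CDCs and keeps the positive responses; the stand-in classifiers are placeholders, not the actual CDCs described in this example.

```python
# Hedged sketch of the parallel mode: a positive output is read as acceptance
# with that confidence, a negative output as rejection.

def run_cdcs_parallel(sample, cdcs):
    """cdcs: dict mapping class name -> callable returning a signed response."""
    responses = {name: cdc(sample) for name, cdc in cdcs.items()}
    # Keep only positive responses; a sample may carry several class attributes.
    return {name: r for name, r in responses.items() if r > 0}

cdcs = {
    "car": lambda s: 0.7,     # placeholder responses
    "bus": lambda s: -0.3,
    "truck": lambda s: 0.1,
}
print(run_cdcs_parallel(None, cdcs))  # {'car': 0.7, 'truck': 0.1}
```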
As shown in Fig. 7, at step S701 the window traversal is performed on each frame of the multiple frames of images.
As shown in Fig. 8, an image Image_i is taken from the multiple frames of images used for detection, and the multi-scale pyramid image Pyramid_0, ..., Pyramid_j, ..., Pyramid_{K-1} of Image_i is established, with j = 0, 1, ..., K-1 and K > 1. Then, starting from the top of the pyramid (i.e., the pyramid image of the smallest size, Pyramid_0), the predetermined window is used for the traversal; the target detection and classification device shown in Fig. 15 is applied to each window image, and its positive response results are recorded.
When the traversal of the image Pyramid_j of the current size is finished, the traversal continues on the pyramid image Pyramid_{j+1} of the next larger size.
Every certain span of scales, an interim merging is performed (similar to the interim merging in the embodiment shown in Fig. 5). Specifically, the regions corresponding to all the positive responses obtained so far are mapped to the original image Image_i, their class attributes are ignored, and the positive responses that are adjacent in position and similar in size are merged. If the result of a merge is greater than a certain predetermined threshold T1, the merged region is appropriately shrunk inwards (with a method similar to that of the foregoing embodiments and/or examples) to obtain a target mask (Mask). Doing this for all positive-response merging results yields a set of target masks corresponding to one or more candidate targets.
Each target mask is projected (mapped) onto all of the images of the multi-scale pyramid image that have not yet been detected, yielding a set of mapped mask regions, and the mapped mask regions are marked as non-detection regions. The above steps are then repeated to detect and process all of the images of the multi-scale pyramid image that have not yet been detected, with the non-detection regions left unprocessed, thereby speeding up processing as much as possible without missing detections.
At step S702, the first merging is performed.
Specifically, all the positive response results obtained by performing the above window traversal on image Image_i are mapped to the original image Image_i, their class attributes are ignored, and the positive responses that are adjacent in position and similar in size are merged. If a merging result is less than a certain predetermined threshold T2, the merging result is ignored; otherwise the region corresponding to the merging result is appropriately enlarged outwards (similarly to the method in the embodiment shown in Fig. 3) to obtain a set of regions of interest (ROIs).
At step S703, the fine ROI traversal is performed.
Specifically, the window traversal of the above step S701 is performed on the (i-1)-th frame Image_{i-1}. Another set of ROIs is generated from the merging result of the (i-1)-th frame Image_{i-1} and is merged with the ROIs of image Image_i into one set. Each ROI is projected onto the corresponding multi-scale pyramid image; let the current scale be scale_i. Centered on scale_i, the ROI is projected simultaneously onto several adjacent pyramid images (for example the scales at offsets -2, -1, 0, 1 and 2 from the current scale), and the ROI images form an ROI pyramid (alternatively, an ROI pyramid with finer scales may be generated directly). The ROI pyramid is then traversed supplementarily with a finer traversal step. "Supplementary traversal" means that the windows to be traversed do not repeat windows that have already been traversed.
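The following sketch illustrates the idea of the supplementary traversal with hypothetical window and step values: positions already evaluated in the coarse pass are remembered, and only the remaining positions are evaluated in the fine pass.

```python
# Hedged sketch of the "supplementary traversal": window positions already
# evaluated during the coarse pass are skipped in the fine pass, so no window
# is processed twice. Window size and step values are illustrative.

def window_positions(width, height, win, step):
    win_w, win_h = win
    return {(x, y)
            for y in range(0, height - win_h + 1, step)
            for x in range(0, width - win_w + 1, step)}

coarse = window_positions(64, 64, (32, 32), step=4)
fine = window_positions(64, 64, (32, 32), step=2)
supplementary = fine - coarse          # only the positions not yet visited
print(len(coarse), len(fine), len(supplementary))
```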
At step S704, the second merging is performed.
Specifically, all the positive response results obtained in step S701 (window traversal) and step S703 (fine ROI traversal) are mapped to the original image, their class attributes are ignored, and the positive responses that are adjacent in position and similar in size are merged; for each merging result, all class attributes and the corresponding confidence degrees are recorded.
At step S705, the time-domain merging of multi-frame information is performed.
Specifically, the second merging results of three consecutive frames, namely the (i-2)-th, (i-1)-th and i-th frames, are merged once more on the basis of "adjacent in position and similar in size": all class attributes of the above merging results are accumulated, and the respective confidence degrees (i.e., the merging results of the positive responses) are merged. If the merging result of the confidence degrees is less than a certain predetermined threshold, the merging result is discarded.
Finally, for each result of the time-domain merging, the class attribute with the maximum confidence degree among all its class attributes is taken as the final class attribute of the detection result; if two or more class attributes have equal and maximum confidence degrees, the class attribute of the detection result is marked as "uncertain".
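By way of illustration, the sketch below performs such a time-domain merge for one spatially matched candidate over three frames; the threshold value and the data are hypothetical.

```python
# Hedged sketch of the time-domain merge over frames i-2, i-1 and i: class
# attributes are accumulated, confidences summed, low results discarded, and
# the final label is the class with the largest merged confidence (or
# "uncertain" on a tie). The threshold is an illustrative value.

def time_domain_merge(per_frame_results, threshold=0.9):
    """per_frame_results: list (one entry per frame) of dicts class -> confidence
    for one spatially matched candidate."""
    merged = {}
    for frame_result in per_frame_results:
        for cls, conf in frame_result.items():
            merged[cls] = merged.get(cls, 0.0) + conf
    if sum(merged.values()) < threshold:
        return None          # discard the candidate
    best = max(merged.values())
    winners = [cls for cls, c in merged.items() if c == best]
    return winners[0] if len(winners) == 1 else "uncertain"

frames = [{"car": 0.4}, {"car": 0.5, "truck": 0.1}, {"car": 0.6}]
print(time_domain_merge(frames))  # 'car'
```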
Those of ordinary skill in the art should understand that choosing the (i-2)-th, (i-1)-th and i-th frames here is only an example. In practical applications, which images are chosen and how many frames are chosen may be decided according to specific needs. The present invention should not be regarded as being limited to any particular embodiment or example.
In the above example, the coarse-to-fine scale-space traversal strategy both speeds up the detection process and reduces the possibility of false responses occurring inside the target image. In addition, the fine ROI traversal based on the result of the first merging, and in particular the ROIs based on the result of the (i-1)-th frame, makes effective use of the information of successive frames and greatly reduces the possibility of missed detections. Although the possibility of false detections also increases, there is a gain in the overall effect. Reducing missed detections is very beneficial for application scenarios (for example, security surveillance) in which a missed detection would incur great risk.
Fig. 9 shows a schematic structural diagram of an apparatus for detecting a target in a video image according to an embodiment of the present invention. As shown in Fig. 9, the apparatus for detecting a target in a video image may comprise a target detection and classification device 901 and a merging module 902.
The target detection and classification device 901 is used to detect multiple frames of images, including a target frame, in a video sequence and to output one or more candidate targets (for convenience of description, also referred to as first candidate targets) and their confidence degrees.
As in the foregoing embodiments, the multiple frames of images may be consecutive in time or not consecutive in time. For convenience of description, suppose that P frames of images Image_0, ..., Image_i, ..., Image_{P-1} have been obtained from the video sequence, where Image_i is the target frame, P is an integer, P > 1, and i = 0, ..., P-1. For example, multiple frames Image_{i-1}, Image_{i-2}, ... before the target frame Image_i may be used, multiple frames Image_{i+1}, Image_{i+2}, ... after it may be used, or frames both before and after it (..., Image_{i-2}, Image_{i-1}, Image_{i+1}, Image_{i+2}, ...) may be used. The number of frames chosen may be decided according to the practical application. It should be understood that the present invention should not be regarded as being limited to any particular embodiment or example.
The target detection and classification device 901 may be implemented with any suitable technology, a description of which is omitted here.
The merging module 902 is used to merge the confidence degrees of the first candidate targets in the multiple frames of images obtained by the detection of the target detection and classification device 901, so as to obtain one or more candidate targets (also referred to as second candidate targets) in the target frame.
The above apparatus obtains the result for the target frame (e.g., Image_i) by combining the detection and classification information of multiple frames of images (e.g., Image_0, ..., Image_i, ..., Image_{P-1}). In this way, a response whose confidence degree would be low if this frame were detected in isolation can be reinforced by the support of preceding and following frames and can therefore be detected, while spurious responses that occur at random are suppressed because they receive no continued support from successive frames. Compared with using only a single frame of image, using a continuous sequence of video images can improve the detection or classification performance.
In one example, the merging module 902 is also used to judge whether the merging result of the confidence degrees of the first candidate targets in the multiple frames of images is greater than or equal to a certain predetermined threshold (e.g., the first threshold); if so, it is judged that a target exists at the corresponding position in the target frame Image_i; otherwise it is determined that no target exists at the corresponding position in image Image_i.
In some cases, a certain corresponding candidate target may not be detected in some of the P frames of images. In these cases, when merging, the merging module 902 may take the confidence value output by the detection of those frames as 0 by default.
In one example, the merging module 902 is also used to merge the confidence degrees of the first candidate targets that are adjacent in position and similar in size in the multiple frames of images, so as to obtain one or more second candidate targets in the target frame. The meaning of "adjacent in position and similar in size" and the methods of merging confidence degrees have all been explained above and are not repeated here.
Fig. 10 shows a schematic structural diagram of an apparatus for detecting a target in a video image according to another embodiment of the present invention. Similarly to the embodiment shown in Fig. 9, the apparatus for detecting a target in a video image shown in Fig. 10 comprises a target detection and classification device 1001 and a merging module 1002, the functions of which are similar to those of the corresponding modules of Fig. 9. The difference from the embodiment of Fig. 9 is that the apparatus shown in Fig. 10 may further comprise a traversal module 1003. In this embodiment, each frame of image is traversed by the traversal module 1003, and the detection results of all the window images obtained are merged as the detection result for that image; the detection results of the multiple frames of images are then merged to obtain the candidate targets in the target frame.
The traversal module 1003 is used to traverse each frame of the multiple frames of images, including the target frame, in the video sequence with a predetermined window (e.g., the first window) at a predetermined step length (e.g., the first step length), and to output each resulting window image to the target detection and classification device 1001.
The target detection and classification device 1001 detects each window image from the traversal module 1003 and obtains one or more candidate targets (for convenience of description, called third candidate targets) and their confidence degrees.
The merging module 1002 is used to merge the confidence degrees of the third candidate targets, output by the target detection and classification device 1001, that are adjacent in position and similar in size, so as to obtain one or more first candidate targets and their confidence degrees.
The merging module 1002 is also used to merge the one or more first candidate targets detected in the multiple frames of images and their confidence degrees, thereby obtaining one or more second candidate targets in the target frame.
As an example, the size of the window used by the traversal module 1003 may be determined from the size of the image sample that the target detection and classification device 1001 detects. Suppose a target to be detected exists in a certain region of the image; when the traversal module 1003 traverses this image, the target may be "framed" by several traversal windows. In other words, several window images may contain all or part of the target, and detecting these window images yields several third candidate targets, and their confidence degrees, that correspond to this same target. These third candidate targets are candidate targets that are adjacent in position and similar in size.
In one example, the merging module 1002 may also judge whether the merging result of the confidence degrees of the third candidate targets that are adjacent in position and similar in size is less than a certain predetermined threshold (called the second threshold); if so, the merging result is discarded; otherwise the merging result is kept as the confidence degree of a first candidate target. The region corresponding to the resulting first candidate target may correspond to the region of the third candidate target with the maximum confidence degree, or may cover the regions of all the corresponding third candidate targets.
The apparatus shown in Fig. 10 can traverse each frame of image with the predetermined window and step length, which makes the detection result more accurate. In one example, the window may be a rectangular window whose size is decided according to actual requirements. The first step length may also be decided according to actual requirements; for example, the step length may be one or several pixels, or it may be proportional to the size of the current window. The order and manner of the traversal are also arbitrary: the traversal may proceed from left to right and from top to bottom, or from right to left and from top to bottom. The present invention imposes no restriction on this.
Fig. 11 shows a schematic structural diagram of an apparatus for detecting a target in a video image according to another embodiment of the present invention. Similarly to the embodiment shown in Fig. 10, the apparatus for detecting a target in a video image shown in Fig. 11 comprises a target detection and classification device 1101, a merging module 1102 and a traversal module 1103, which have functions similar to those of the corresponding modules shown in Fig. 10. The difference is that the apparatus shown in Fig. 11 may further comprise a region-of-interest establishing module 1104.
In this embodiment, after the merging module 1102 has merged the confidence degrees of the third candidate targets, the merging result may also be fed back to the region-of-interest establishing module 1104.
The region-of-interest establishing module 1104 is used to establish one or more regions of interest (ROIs) according to the merging result fed back by the merging module 1102 and to output them to the traversal module 1103. Each region of interest ROI covers the region corresponding to the merging result of the corresponding third candidate targets.
The traversal module 1103 is also used to traverse each region of interest with a predetermined window (hereinafter called the second window) at a predetermined step length (hereinafter called the second step length) and to output the results to the target detection and classification device 1101.
The target detection and classification device 1101 is used to detect each resulting window image and to obtain one or more candidate targets (hereinafter referred to as fourth candidate targets).
The merging module 1102 is also used to merge the third candidate targets and fourth candidate targets, output by the target detection and classification device 1101, that are adjacent in position and similar in size, together with their confidence degrees, as the first candidate targets and their confidence degrees. The merging module 1102 then merges the one or more first candidate targets detected in the multiple frames of images and their confidence degrees, thereby obtaining one or more second candidate targets in the target frame.
In this embodiment, after each frame of image has been traversed (the first traversal), regions of interest are established according to the detection results, and the regions of interest are traversed further (the second traversal), thereby further improving the detection accuracy and reducing missed detections and false detections. In one embodiment, each region of interest may be determined by appropriately enlarging the region corresponding to the merging result, i.e., by moderately enlarging the range of the region of interest; this can further reduce the possibility of missed detections and false detections. For example, if the region corresponding to the merging result is rectangular, the region may be expanded by one or several pixels in both length and width according to the size, movement speed and direction of the target, thereby obtaining the corresponding region of interest. As another example, if the region corresponding to the merging result is circular, the radius of the region may be expanded by one or several pixels according to the size, movement speed and direction of the target, thereby obtaining the corresponding region of interest.
The manner of the second traversal may be similar to that of the first traversal and is not repeated here. The second window may differ from the first window in size or shape, or may be identical to it. In order to perform a finer traversal, the second step length may be smaller than the first step length; in other examples, the second step length may also be equal to or greater than the first step length. In one example, repeated traversal can be avoided, i.e., when the first window and the second window are identical in size but the step lengths differ, window positions that have already been searched are not searched again, thereby speeding up processing.
In one example, before establishing the regions of interest, the region-of-interest establishing module 1104 may also judge whether the merging result of the confidence degrees of the third candidate targets that are adjacent in position and similar in size is less than a certain predetermined threshold (called the third threshold); if so, the merging result is discarded; otherwise the merging result is kept and a region of interest is established.
In one example, after merging the confidence degrees of the fourth candidate targets that are adjacent in position and similar in size, the merging module 1102 may further judge whether the merging result is less than a certain predetermined threshold (called the fourth threshold); if so, the merging result is discarded; otherwise the merging result is kept as the confidence degree of a first candidate target. The region corresponding to the resulting first candidate target may correspond to the region of the fourth candidate target with the maximum confidence degree, or may cover the regions of all the corresponding fourth candidate targets.
Figure 12 is a schematic structural diagram of a device for detecting targets in a video image according to another embodiment of the present invention. Similar to the embodiment of Figure 10, the device shown in Figure 12 comprises a target detection and classification module 1201, a merging module 1202 and a traversal module 1203, which have functions similar to those of the corresponding modules shown in Figure 10. The difference is that the device of Figure 12 may further comprise a pyramid image establishing module 1205.
Suppose that the multiple frames containing the target frame in the video sequence are denoted as Image_0, ..., Image_i, ..., Image_{P-1}.
The pyramid image establishing module 1205 is configured to establish, for each frame (e.g., Image_i) of the multiple frames, a multi-scale pyramid image Pyramid_0, ..., Pyramid_j, ..., Pyramid_{K-1}, where j = 0, ..., K-1 and K > 1, and to output each image in the pyramid to the traversal module 1203.
The traversal module 1203 is configured to traverse each image in the pyramid with a predetermined window (e.g., the first window) at a predetermined step (e.g., the first step), and to output the resulting window images to the target detection and classification module 1201.
The target detection and classification module 1201 detects each window image from the traversal module 1203 to obtain one or more third candidate targets and their confidence degrees.
The merging module 1202 is configured to merge the confidence degrees of those third candidate targets output by the target detection and classification module 1201 that are adjacent in position and similar in size, to obtain one or more first candidate targets and their confidence degrees. The merging module 1202 then merges the first candidate targets and their confidence degrees detected from the multiple frames, thereby obtaining one or more second candidate targets in the target frame.
The multi-scale pyramid image may be established by any suitable method, the description of which is omitted here. The number of pyramid levels may be set according to the needs of the practical application, and the present invention places no limitation on it. The traversal is performed in a manner similar to the previous embodiments and/or examples and is not repeated here.
In one example, the merging module 1202 may also map the regions corresponding to the third candidate targets detected in the pyramid images back into the original image, thereby determining the regions corresponding to the third candidate targets in the original image.
In the above embodiment, by establishing a multi-scale pyramid image of each frame and traversing and detecting the pyramid images, the accuracy of multi-target detection in the video image can be further improved, and the likelihood of missed and false detections reduced.
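The sketch below illustrates, under assumed conventions (OpenCV resizing and a fixed scale factor of 0.75, neither of which is prescribed by this description), how a K-level pyramid could be built for one frame and how a detection found at pyramid level j could be mapped back to the coordinates of the original image.

```python
import cv2

def build_pyramid(frame, K=4, scale=0.75):
    """Return [Pyramid_0, ..., Pyramid_{K-1}], with Pyramid_0 the smallest image
    and Pyramid_{K-1} the original resolution."""
    return [cv2.resize(frame, None,
                       fx=scale ** (K - 1 - j), fy=scale ** (K - 1 - j),
                       interpolation=cv2.INTER_LINEAR)
            for j in range(K)]

def map_to_original(box, level, K=4, scale=0.75):
    """Map a box (x, y, w, h) detected at pyramid level `level` back to the original frame."""
    factor = 1.0 / (scale ** (K - 1 - level))
    x, y, w, h = box
    return (int(x * factor), int(y * factor), int(w * factor), int(h * factor))
```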
Figure 13 is a schematic structural diagram of a device for detecting targets in a video image according to another embodiment of the present invention. Similar to the embodiment shown in Figure 12, the device shown in Figure 13 may comprise a target detection and classification module 1301, a merging module 1302, a traversal module 1303 and a pyramid image establishing module 1305, which have functions similar to those of the corresponding modules shown in Figure 12. The difference is that the device of Figure 13 may further comprise a mask establishing module 1306.
When the traversal module 1303 traverses the images in the pyramid established by the pyramid image establishing module 1305, it starts from the image of the smallest size (e.g., Pyramid_0). Each resulting window image is detected by the target detection and classification module 1301 to obtain one or more third candidate targets and their confidence degrees.
After the target detection and classification module 1301 has processed N images in the pyramid (e.g., Pyramid_0, ..., Pyramid_{N-1}, N < K), the merging module 1302 may merge the confidence degrees of the third candidate targets detected so far and output the merged result to the mask establishing module 1306.
The mask establishing module 1306 is configured to establish one or more target masks according to this merged result and to feed them back to the traversal module 1303. Each target mask corresponds to the region of the corresponding third candidate target. When processing the remaining K-N images Pyramid_N, ..., Pyramid_{K-1}, the traversal module 1303 does not traverse the regions corresponding to the target masks.
The above intermediate merging step (i.e., N) may be chosen according to the actual situation and is not limited here. Depending on the step N and the number of pyramid levels K, target masks may be established multiple times, progressively excluding the regions of candidate targets that have already been determined, thereby speeding up processing.
The modules 1303, 1301, 1302 and 1306 may repeat the above operations until all K levels of the pyramid have been processed. After one or more third candidate targets and their confidence degrees have been obtained for the multiple frames, they are further processed by the merging module 1302; this processing is identical to that of the merging module 1202 and is not repeated here.
The above embodiment establishes target masks according to intermediate detection results and does not process the regions corresponding to the target masks in subsequent processing, which speeds up detection. However, if too many target masks are set, the likelihood of missed detections in subsequent processing may increase. In one example, a target mask Mask may be determined by appropriately shrinking the region of the corresponding third candidate target, thereby avoiding missed detections caused by over-large target masks. For example, if the region corresponding to the third candidate target is rectangular, it may be shrunk by one or more pixels in both length and width according to the size, motion speed and direction of the target, to obtain the corresponding target mask. As another example, if the region corresponding to the third candidate target is circular, its radius may be reduced by one or more pixels according to the size, motion speed and direction of the target, to obtain the corresponding target mask Mask.
As an example, before establishing a target mask, the mask establishing module 1306 may further judge whether the merged confidence degree of the third candidate targets that are adjacent in position and similar in size in the N images Pyramid_0, ..., Pyramid_{N-1} is greater than or equal to a predetermined threshold (referred to as a fifth threshold); if so, a target mask Mask is established, otherwise no target mask is established. In this example, a target mask is established only when the confidence degree of the candidate target is sufficiently high, which reduces the likelihood of missed detections compared with the above embodiment.
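A minimal sketch, with assumed data structures and an illustrative shrink margin, of establishing target masks from the merged results of the first N pyramid levels and of skipping sliding-window positions that fall entirely inside a mask when the remaining levels are traversed.

```python
def build_masks(merged_candidates, conf_threshold, shrink=1):
    """merged_candidates: list of ((x, y, w, h), confidence) pairs from the first N levels.
    Keep only high-confidence regions and shrink each by `shrink` pixels per side."""
    masks = []
    for (x, y, w, h), conf in merged_candidates:
        if conf >= conf_threshold and w > 2 * shrink and h > 2 * shrink:
            masks.append((x + shrink, y + shrink, w - 2 * shrink, h - 2 * shrink))
    return masks

def window_is_masked(wx, wy, ww, wh, masks):
    """True if the sliding window lies entirely inside an already-decided mask region,
    in which case it is skipped during traversal of the remaining K-N levels."""
    for mx, my, mw, mh in masks:
        if wx >= mx and wy >= my and wx + ww <= mx + mw and wy + wh <= my + mh:
            return True
    return False
```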
As an example, the device shown in Figure 13 may further comprise a region-of-interest establishing module 1304. The region-of-interest establishing module 1304 may also output each established region of interest to the pyramid image establishing module 1305. The pyramid image establishing module 1305 may also establish a pyramid image of each region of interest and output it to the traversal module 1303. The traversal module 1303 may also traverse the pyramid image of each region of interest with a predetermined window (e.g., the second window) at a predetermined step (e.g., the second step) and output each window image to the target detection and classification module 1301, to obtain one or more fourth candidate targets and their confidence degrees.
In one example, in order to detect multiple classes of targets, the target detection and classification module 1301 may comprise a plurality of sub-classifiers as shown in Figure 14, each of which can detect one class of target. The number N of sub-classifiers may be set according to the number of target classes to be detected in the practical application.
As an example, if the target detection and classification module 1301 comprises a plurality of sub-classifiers for detecting targets of different classes, the device shown in Figure 13 may further comprise a target class determination module 1307. The target class determination module 1307 may determine the class of a target according to the class attribute of the sub-classifier corresponding to the maximum confidence value. For example, after an image sample is input to the plurality of sub-classifiers, if one sub-classifier outputs a certain confidence degree while the other sub-classifiers output 0, the target class determination module 1307 may determine the class of the target according to the class corresponding to the sub-classifier outputting that confidence degree, and that confidence degree serves as the confidence degree of the target. If more than one sub-classifier outputs a confidence degree, the target class determination module 1307 may determine the class of the target according to the class corresponding to the maximum confidence value (i.e., the class corresponding to the sub-classifier outputting the maximum confidence value). If two or more confidence values are equal to the maximum and correspond to different classes, the class attribute of the target may be labeled as "uncertain". In this example, since the detection information of multiple frames is combined, the determination of the target class attribute has good robustness.
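A minimal sketch of the class decision described above: each sub-classifier contributes one confidence degree, the class with the maximum value is chosen, and a tie between different classes is labeled "uncertain". The class names in the example are illustrative only.

```python
def decide_class(scores: dict[str, float]):
    """scores maps a class name to the confidence degree output by that class's sub-classifier.
    Returns (class_label, confidence); the label is 'uncertain' on a tie between classes."""
    best = max(scores.values())
    winners = [cls for cls, s in scores.items() if s == best]
    label = winners[0] if len(winners) == 1 else "uncertain"
    return label, best

# Example: three sub-classifiers (truck / car / bus) evaluated on one image window.
print(decide_class({"truck": 0.1, "car": 0.8, "bus": 0.3}))  # ('car', 0.8)
```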
In the above embodiments and examples, the thresholds (e.g., the first, second, third, fourth and fifth thresholds) may be set and adjusted according to the specific application scenario. When a threshold is set large, the missed-detection rate may increase; when it is set small, the false-detection rate may increase. A suitable threshold therefore needs to be selected according to actual requirements, which is not elaborated here. In addition, those of ordinary skill in the art will understand that the one or more sub-classifiers described in the above embodiments and examples may operate in parallel mode, or in other modes such as serial mode, which is not described further here.
In the description of the above embodiments and/or examples, terms such as "first window", "second window", "first step" and "second step" are used to describe the traversal processing. The same terms may be used in the description of different embodiments; for example, the embodiments of Figures 2 and 4 both use "first window" and "first step". Those of ordinary skill in the art will understand that such description is merely for convenience of narration and does not mean that those embodiments must use the same window or step. In fact, when traversing, both the window and the step may be chosen according to actual needs and are not limited to the above description.
In the above embodiments and examples, the first window and the second window may be rectangular windows whose sizes may be determined according to actual requirements. The first step and the second step may likewise be determined according to actual requirements; for example, a step may be one or more pixels, or may be proportional to the size of the current window. To traverse the ROI more finely, the second step may be set smaller than the first step.
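For illustration, the following sketch enumerates sliding-window positions for one image, scanning left to right and top to bottom at a fixed step; making the step a quarter of the window width is only an assumed example of a step proportional to the window size.

```python
def traverse(img_w, img_h, win_w, win_h, step=None):
    """Yield (x, y, win_w, win_h) window positions covering the image."""
    if step is None:
        step = max(1, win_w // 4)  # step proportional to the window size (assumed ratio)
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            yield (x, y, win_w, win_h)
```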
In the above embodiments and examples, the order and manner of the traversal are also arbitrary: it may proceed from left to right and from top to bottom, or from right to left and from top to bottom.
In this specification, expressions such as "first", "second", "third" and "Nth" are used only to distinguish the described features verbally, so as to describe the present invention clearly. They should therefore not be regarded as having any limiting meaning.
The methods and devices of the above embodiments and examples may be used for multi-class target detection and classification in video images. Here, there may be a certain similarity among the classes, such as the classes of trucks, cars and buses.
The component modules and units in the above devices may be configured by means of software, firmware, hardware or a combination thereof. The specific means or manner of configuration is well known to those skilled in the art and is not repeated here. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network into a computer having a dedicated hardware structure (e.g., the general-purpose computer 1600 shown in Figure 16), and the computer can perform various functions when various programs are installed.
In Figure 16, a central processing unit (CPU) 1601 performs various processing according to a program stored in a read-only memory (ROM) 1602 or a program loaded from a storage section 1608 into a random access memory (RAM) 1603. Data required when the CPU 1601 performs various processing is also stored in the RAM 1603 as needed. The CPU 1601, the ROM 1602 and the RAM 1603 are connected to one another via a bus 1604. An input/output interface 1605 is also connected to the bus 1604.
The following components are connected to the input/output interface 1605: an input section 1606 (including a keyboard, a mouse and the like), an output section 1607 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like), a storage section 1608 (including a hard disk and the like), and a communication section 1609 (including a network interface card such as a LAN card, a modem and the like). The communication section 1609 performs communication processing via a network such as the Internet. A drive 1610 may also be connected to the input/output interface 1605 as needed. A removable medium 1611 such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory is mounted on the drive 1610 as needed, so that a computer program read therefrom is installed into the storage section 1608 as needed.
In the case where the above series of processing is implemented by software, a program constituting the software is installed from a network such as the Internet, or from a storage medium such as the removable medium 1611.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1611 shown in Figure 16, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1611 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 1602, a hard disk included in the storage section 1608, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The present invention further proposes a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the above method according to the embodiment of the present invention can be performed.
Correspondingly, the storage medium carrying the above program product storing machine-readable instruction codes is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disc, a memory card, a memory stick and the like.
In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment may be used in one or more other embodiments in the same or a similar manner, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprise/include", as used herein, refers to the presence of a feature, element, step or component, but does not exclude the presence or addition of one or more other features, elements, steps or components.
In addition, the methods of the present invention are not limited to being performed in the chronological order described in the specification; they may also be performed in other chronological orders, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
Although the present invention has been disclosed above by the description of specific embodiments, it should be understood that all the above embodiments and examples are illustrative and not restrictive. Those skilled in the art may devise various modifications, improvements or equivalents of the present invention within the spirit and scope of the appended claims, and such modifications, improvements or equivalents should also be considered to fall within the protection scope of the present invention.

Claims (20)

1. A method of detecting targets in a video image, comprising:
a frame detection step: detecting, respectively, multiple frames containing a target frame, to obtain one or more first candidate targets and their confidence degrees; and
a merging step: merging the first candidate targets and their confidence degrees in the multiple frames, to obtain one or more second candidate targets in the target frame.
2. The method according to claim 1, wherein the frame detection step comprises:
a first traversal step: traversing each of the multiple frames with a first window at a first step, and detecting each window image to obtain one or more third candidate targets and their confidence degrees; and
a first traversal result merging step: merging those third candidate targets that are adjacent in position and similar in size, together with their confidence degrees, to obtain the first candidate targets and their confidence degrees.
3. The method according to claim 2, wherein the frame detection step further comprises:
a region-of-interest establishing step: establishing one or more regions of interest according to the result of the first traversal result merging step, each region of interest covering the corresponding third candidate target;
a second traversal step: traversing each region of interest with a second window at a second step, and detecting each window image to obtain one or more fourth candidate targets and their confidence degrees; and
a second traversal result merging step: merging those third and fourth candidate targets that are adjacent in position and similar in size, together with their confidence degrees, to obtain the first candidate targets and their confidence degrees.
4. The method according to claim 2, wherein the first traversal step comprises:
a frame pyramid establishing step: establishing a multi-scale pyramid image of each frame, the multi-scale pyramid image comprising K levels, K > 1; and
a third traversal step: traversing each image in the pyramid with the first window at the first step, and detecting each window image to obtain one or more third candidate targets and their confidence degrees.
5. The method according to claim 4, wherein:
the third traversal step starts processing from the image of the smallest size in the pyramid, and comprises:
a mask establishing step: establishing one or more target masks based on the detection results of the first to Nth pyramid images, each target mask corresponding to a corresponding third candidate target,
wherein, when the remaining K-N images are processed, the regions in the K-N images corresponding to the target masks are not traversed, where 1 ≤ N < K.
6. The method according to claim 3, wherein
after the region-of-interest establishing step, the method further comprises: establishing a multi-scale pyramid image of each region of interest, and
the second traversal step comprises: traversing the pyramid image of each region of interest with the second window at the second step, and detecting each window image to obtain one or more fourth candidate targets and their confidence degrees.
7. The method according to claim 1, wherein the merging step comprises: merging the confidence degrees of those first candidate targets in the multiple frames that are adjacent in position and similar in size.
8. The method according to claim 7, wherein merging the confidence degrees of the first candidate targets in the multiple frames that are adjacent in position and similar in size comprises: summing or averaging the confidence degrees of the first candidate targets in the multiple frames that are adjacent in position and similar in size; or normalizing the confidence degrees of the first candidate targets in the multiple frames that are adjacent in position and similar in size and summing or averaging the normalized confidence degrees.
9. The method according to claim 1, wherein the frame detection step is performed using a target detection and classification module comprising a plurality of sub-classifiers, each sub-classifier being capable of detecting one class of target.
10. The method according to claim 9, further comprising: determining the class of a target according to the class corresponding to the sub-classifier that outputs the maximum confidence degree.
11. A device for detecting targets in a video image, comprising:
a target detection and classification module, configured to detect multiple frames of a video sequence that contain a target frame, to obtain one or more first candidate targets and their confidence degrees; and
a merging module, configured to merge the confidence degrees of the first candidate targets in the multiple frames, to obtain one or more second candidate targets in the target frame.
12. The device according to claim 11, further comprising a traversal module,
wherein the traversal module is configured to traverse each frame with a first window at a first step and to output each window image to the target detection and classification module, to obtain one or more third candidate targets and their confidence degrees, and
the merging module is further configured to merge the confidence degrees of those third candidate targets that are adjacent in position and similar in size, to obtain the first candidate targets and their confidence degrees.
13. The device according to claim 12, further comprising a region-of-interest establishing module,
wherein the region-of-interest establishing module is configured to establish one or more regions of interest according to the merged confidence degrees of the third candidate targets fed back by the merging module, and to output them to the traversal module, each region of interest covering the corresponding third candidate target;
the traversal module is further configured to traverse each region of interest with a second window at a second step and to output each window image to the target detection and classification module, to obtain one or more fourth candidate targets and their confidence degrees; and
the merging module is further configured to merge the confidence degrees of those third and fourth candidate targets that are adjacent in position and similar in size, to obtain the first candidate targets and their confidence degrees.
14. The device according to claim 12, further comprising a pyramid image establishing module,
wherein the pyramid image establishing module is configured to establish, for each frame, a multi-scale pyramid image comprising K levels, K > 1, and to output each image in the pyramid to the traversal module, and
the traversal module is further configured to traverse each image in the pyramid with the first window at the first step and to output each window image to the target detection and classification module, to obtain one or more third candidate targets and their confidence degrees.
15. The device according to claim 14, wherein:
the traversal module is configured to start traversing from the image of the smallest size in the pyramid;
the merging module is configured to merge the confidence degrees of the third candidate targets obtained by detecting the first to Nth pyramid images and to output the merged result to a mask establishing module; and
the device further comprises the mask establishing module, configured to establish one or more target masks according to the merged result and to output them to the traversal module, so that the traversal module does not traverse the regions in the remaining K-N images corresponding to the target masks, where 1 ≤ N < K, each target mask corresponding to a corresponding third candidate target.
16. The device according to claim 13, wherein:
the region-of-interest establishing module is further configured to output each region of interest to the pyramid image establishing module, and the pyramid image establishing module is further configured to establish a pyramid image of each region of interest and to output it to the traversal module; and
the traversal module is further configured to traverse the pyramid image of each region of interest with the second window at the second step and to output each window image to the target detection and classification module, to obtain one or more fourth candidate targets and their confidence degrees.
17. The device according to claim 11, wherein the merging module is further configured to merge the confidence degrees of the first candidate targets in the multiple frames by merging the confidence degrees of those candidate targets in the multiple frames that are adjacent in position and similar in size.
18. The device according to claim 17, wherein the merging module is further configured to merge the confidence degrees of the first candidate targets in the multiple frames by summing or averaging the confidence degrees of those candidate targets in the multiple frames that are adjacent in position and similar in size, or by normalizing the confidence degrees of those candidate targets in the multiple frames that are adjacent in position and similar in size and summing or averaging the normalized confidence degrees.
19. The device according to claim 11, wherein the target detection and classification module comprises a plurality of sub-classifiers, each sub-classifier being capable of detecting one class of target.
20. The device according to claim 19, further comprising a target class determination module, configured to determine the class of a target according to the class corresponding to the sub-classifier that outputs the maximum confidence degree.

Application publication date: 20110209