CN103294685A - Method and equipment used for generating image description vector as well as image detection method and equipment - Google Patents


Info

Publication number
CN103294685A
Authority
CN
China
Prior art keywords
pixel region
classification
pixel
thresholding
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100441560A
Other languages
Chinese (zh)
Other versions
CN103294685B (en)
Inventor
姜涌
张文文
胥立丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to CN201210044156.0A (granted as CN103294685B)
Priority to US13/774,281 (granted as US9275300B2)
Publication of CN103294685A
Application granted
Publication of CN103294685B
Legal status: Active

Abstract

The invention relates to a method and equipment for generating an image description vector, as well as an image detection method and equipment. The method for generating the image description vector comprises the following steps. A coding step: each pixel region among multiple pixel regions of an image is encoded into M N-bit binary codes. A feature extraction step: if the m-th N-bit binary code among the M N-bit binary codes corresponding to a pixel region among the multiple pixel regions matches a specific code pattern of the m-th category, the m-th N-bit binary code is extracted as a significant N-bit binary code of the m-th category. A step of generating the image description vector: for each of the M categories, the number of significant N-bit binary codes is counted so as to generate the image description vector.

Description

Method and apparatus for generating an image description vector, and image detection method and apparatus
Technical field
The present invention relates to a method and apparatus for generating an image description vector, and to an image detection method and image detection apparatus.
Background technology
In recent decades, detection techniques for specific objects or targets (such as persons, faces, vehicles, etc.) have made great progress. To describe the morphology of an image, distinguishing features or patterns can be extracted from the image to form an image descriptor (image description vector). Some techniques require a training process that uses a large number of samples. For more general object detection, or detection that does not require training, an effective and robust feature descriptor (descriptor vector) is essential.
In recent years, as grey-scale-invariant local texture descriptors for describing the micro-structure of an image, the local binary pattern (LBP) descriptor and the local ternary pattern (LTP) descriptor have been proposed (see, for example, T. Ojala, M. Pietikäinen and T. Mäenpää, "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 2002, and Xiaoyang Tan and Bill Triggs, "Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions", IEEE Transactions on Image Processing, 19(6), pp. 1635-1650, 2010). These two patterns (image descriptors) are widely used in the field of face recognition and have achieved great success.
The LBP descriptor and the LTP descriptor will now be described briefly with reference to Fig. 1 and Fig. 2.
Fig. 1 is a schematic diagram illustrating the principle of the LBP descriptor.
As shown in Fig. 1, the LBP method encodes each pixel in an image into an 8-bit binary code. More specifically, for a 3×3 pixel matrix, if the pixel value of a neighbouring pixel is greater than or equal to the pixel value of the center pixel, the bit representing that neighbour in the 8-bit binary code is set to "1"; if the pixel value of the neighbouring pixel is less than the pixel value of the center pixel, the bit representing that neighbour is set to "0". In this way, an 8-bit binary code is formed for the center pixel by thresholding the eight neighbouring pixels against the pixel value of the center pixel. In Fig. 1, a white dot represents the binary digit "1" and a black dot represents the binary digit "0". The LBP feature can describe the texture structure around the encoded (center) pixel.
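The single-threshold comparison just described can be sketched in a few lines of Python. This is an illustrative sketch only; the clockwise bit ordering starting at the upper-left neighbour is an assumption, since the bit ordering is a matter of convention:

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch (nested list of pixel values).

    Each neighbour contributes one bit: 1 if its value is greater than
    or equal to the centre pixel, 0 otherwise.  Neighbours are read
    clockwise from the upper-left corner (an illustrative convention).
    """
    center = patch[1][1]
    # Clockwise neighbour coordinates starting at the upper-left.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in coords:
        code = (code << 1) | (1 if patch[r][c] >= center else 0)
    return code

# A patch brighter above and below the centre, darker left and right:
patch = [[90, 90, 90],
         [10, 50, 10],
         [90, 90, 90]]
code = lbp_code(patch)  # bits 1,1,1,0,1,1,1,0 -> 0b11101110
```

With a single threshold (the centre value itself), a one-unit change in a neighbour near the centre value flips a bit, which illustrates the noise sensitivity discussed below.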
However, the single threshold and the pairwise pixel comparisons make the LBP method very sensitive to noise, and its reliability decreases significantly under strong illumination. In addition, this encoding scheme limits the LBP feature set to presenting only small texture structures, such as brighter or darker edges and points. Further, the structure represented by an LBP feature captures only the features around a pixel, while the features of the pixel itself are lost.
Fig. 2 is a schematic diagram illustrating the principle of the LTP descriptor.
As shown in Fig. 2, the LTP method encodes each pixel in the image into an 8-trit ternary code. More particularly, for a 3×3 pixel matrix, if the pixel value of a neighbouring pixel is greater than an upper threshold, the trit representing that neighbour in the 8-trit ternary code is set to "1"; if the pixel value of the neighbouring pixel is not greater than the upper threshold and not less than a lower threshold, the trit representing that neighbour is set to "0"; and if the pixel value of the neighbouring pixel is less than the lower threshold, the trit representing that neighbour is set to "-1". The upper threshold can be set to (center pixel value + T) and the lower threshold to (center pixel value - T), where T is a constant tolerance that can be set in a suitable manner. In this way, an 8-trit ternary code is formed for the center pixel by carrying out a dual-threshold decision on the eight neighbouring pixels with respect to the pixel value of the center pixel. In Fig. 2, a white dot represents the trit "1", a black dot represents the trit "-1", and a grey dot represents the trit "0".
By using the dual-threshold decision, the LTP feature can describe the texture structure around the encoded (center) pixel with improved robustness compared with the LBP feature, and can retain more detailed image structure.
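The dual-threshold LTP decision can be sketched analogously to the LBP case. As before, the clockwise trit ordering is an illustrative assumption, and the tolerance `t` corresponds to the constant T in the text:

```python
def ltp_code(patch, t=10):
    """8-trit LTP code of a 3x3 patch, returned as a tuple of trits.

    A neighbour maps to +1 above (centre + t), -1 below (centre - t),
    and 0 inside the tolerance band [centre - t, centre + t].
    Neighbours are read clockwise from the upper-left corner.
    """
    center = patch[1][1]
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    trits = []
    for r, c in coords:
        v = patch[r][c]
        if v > center + t:
            trits.append(1)
        elif v < center - t:
            trits.append(-1)
        else:
            trits.append(0)
    return tuple(trits)

patch = [[90, 55, 90],
         [10, 50, 10],
         [90, 45, 90]]
trits = ltp_code(patch, t=10)  # (1, 0, 1, -1, 1, 0, 1, -1)
```

The tolerance band is what gives LTP its robustness: neighbours close to the centre value (55 and 45 here) map to the neutral trit 0 instead of flipping between 0 and 1 as in LBP.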
Summary of the invention
The LTP feature can represent 3^8 = 6561 patterns (structures) of a pixel and its surroundings, far more than the 2^8 = 256 patterns that the LBP feature can present. However, the present inventors have found that many of these 6561 LTP patterns represent undesirable structures (for example, noise patterns). Storing and using so many patterns leads to inefficiency when describing an image. That is to say, although the LTP method can describe an image more finely and with stronger robustness than the LBP method, its efficiency decreases significantly.
Therefore, there is a need for a new method for generating an image description vector that can describe an image both finely and efficiently.
In order to solve the above technical problem, the present invention provides a method for generating an image description vector, comprising: a coding step of encoding each pixel region among a plurality of pixel regions of an image into M N-bit binary codes, wherein the M N-bit binary codes correspond respectively to M categories, and each N-bit binary code represents the neighbouring pixel regions adjacent to the corresponding pixel region; a feature extraction step of, for the m-th N-bit binary code among the M N-bit binary codes corresponding to each pixel region among the plurality of pixel regions: if the m-th N-bit binary code matches a specific code pattern of the m-th category among the M categories, extracting the m-th N-bit binary code as a significant N-bit binary code of the m-th category; and an image description vector generating step of, for each category among the M categories, counting the number of significant N-bit binary codes to form the image description vector, wherein M is an integer of 3 or more, N is an integer of 3 or more, and 1 ≤ m ≤ M.
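The feature extraction and vector generation steps above can be illustrated with a simplified sketch. The sketch assumes the M N-bit codes have already been computed for every pixel region, and counts one bin per category; the names and data layout are illustrative, not the patent's actual implementation:

```python
def describe_image(codes_per_region, significant_patterns):
    """Count significant codes per category to form a description vector.

    codes_per_region: list of M-tuples, one tuple per pixel region;
    entry m of a tuple is that region's N-bit binary code (an int)
    for category m.
    significant_patterns: list of M sets; set m holds the specific
    code patterns (ints) of category m.
    Returns a length-M list of counts, one bin per category.
    """
    M = len(significant_patterns)
    vector = [0] * M
    for codes in codes_per_region:
        for m in range(M):
            # Extract the m-th code only if it matches a specific
            # code pattern of the m-th category.
            if codes[m] in significant_patterns[m]:
                vector[m] += 1
    return vector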
In addition, in order to solve the above technical problem, the present invention provides an apparatus for generating an image description vector, comprising: a coding unit configured to encode each pixel region among a plurality of pixel regions of an image into M N-bit binary codes, wherein the M N-bit binary codes correspond respectively to M categories, and each N-bit binary code represents the neighbouring pixel regions adjacent to the corresponding pixel region; a feature extraction unit configured to, for the m-th N-bit binary code among the M N-bit binary codes corresponding to each pixel region among the plurality of pixel regions: if the m-th N-bit binary code matches a specific code pattern of the m-th category among the M categories, extract the m-th N-bit binary code as a significant N-bit binary code of the m-th category; and an image description vector generation unit configured to, for each category among the M categories, count the number of significant N-bit binary codes to form the image description vector, wherein M is an integer of 3 or more, N is an integer of 3 or more, and 1 ≤ m ≤ M.
In addition, the invention provides a kind of method for the iamge description vector that generates multiresolution, comprise: the first iamge description vector generates step, by carry out the above-mentioned method that is used for generating the iamge description vector for input picture, generates the first iamge description vector; The convergent-divergent step is carried out convergent-divergent to described input picture, to generate zoomed image; The second iamge description vector generates step, by carry out the above-mentioned method that is used for generating the iamge description vector for described zoomed image, generates the second iamge description vector; And concatenation step, the described first iamge description vector of cascade and the second iamge description vector are to generate the iamge description vector of multiresolution.
In addition, the invention provides a kind of equipment for the iamge description vector that generates multiresolution, comprise: the first iamge description vector generation unit, be arranged to by carry out the above-mentioned method that is used for generating the iamge description vector for input picture, generate the first iamge description vector; Unit for scaling is arranged to described input picture is carried out convergent-divergent, to generate zoomed image; The second iamge description vector generation unit is arranged to by carry out the above-mentioned method that is used for generating the iamge description vector for described zoomed image, generates the second iamge description vector; And the cascade unit, be arranged to the described first iamge description vector of cascade and the second iamge description vector to generate the iamge description vector of multiresolution.
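The multi-resolution cascade described above can be sketched as follows. Here `describe` stands in for the description method and `downscale` for the scaling step; both are placeholder callables supplied by the caller, since the patent does not prescribe a particular scaling algorithm:

```python
def multiresolution_descriptor(image, describe, downscale):
    """Concatenate the descriptors of an image and its scaled version.

    describe(image) -> description vector (a sequence of numbers);
    downscale(image) -> the scaled image.
    Returns the first and second description vectors concatenated,
    which is the multi-resolution image description vector.
    """
    first = describe(image)            # descriptor of the input image
    second = describe(downscale(image))  # descriptor of the scaled image
    return list(first) + list(second)
```

The same pattern extends to more than two resolutions by repeating the scale-and-describe step and concatenating each result.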
In addition, the present invention provides an image detection method, comprising: an input step of inputting a region image of an object image; an image description vector generating step of performing the above method for generating an image description vector on the region image to generate an image description vector as a region image description vector; a calculating step of calculating the distance between the region image description vector and a target image description vector registered in advance in a database, the target image description vector having been formed by performing the above method for generating an image description vector on a target image in advance; and a determining step of, if the distance is less than a specific threshold, determining that a region image corresponding to the target image has been detected, and otherwise, if the distance is not less than the specific threshold, adjusting the position and/or size of the region image to obtain the next region image to be processed.
In addition, the present invention provides an image detection apparatus, comprising: an input unit configured to input a region image of an object image; an image description vector generation unit configured to perform the above method for generating an image description vector on the region image to generate an image description vector as a region image description vector; a calculating unit configured to calculate the distance between the region image description vector and a target image description vector registered in advance in a database, the target image description vector having been formed by performing the above method for generating an image description vector on a target image in advance; and a determining unit configured to, if the distance is less than a specific threshold, determine that a region image corresponding to the target image has been detected, and otherwise, if the distance is not less than the specific threshold, adjust the position and/or size of the region image to obtain the next region image to be processed.
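The calculating and determining steps of the detection method can be sketched as follows. The Euclidean distance is an illustrative choice: the text only requires some distance between description vectors, so any suitable metric (e.g. chi-square or histogram intersection) could be substituted:

```python
import math

def detect_region(region_vec, target_vec, threshold):
    """Decide whether a region image matches the registered target.

    region_vec, target_vec: description vectors (sequences of numbers).
    Returns True when the Euclidean distance between the two vectors
    is strictly less than the threshold, i.e. the region is determined
    to correspond to the target image.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(region_vec, target_vec)))
    return dist < threshold
```

When the result is False, the caller would adjust the position and/or size of the region window and test the next region, as the determining step specifies.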
The method and apparatus for generating an image description vector according to the present invention have the benefit of describing an image both finely and efficiently.
Other characteristic features and advantages of the present invention will become clear from the following description with reference to the accompanying drawings.
Description of drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram illustrating the principle of the LBP descriptor.
Fig. 2 is a schematic diagram illustrating the principle of the LTP descriptor.
Fig. 3 is a schematic block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the invention can be implemented.
Fig. 4 is a functional block diagram of the apparatus for generating an image description vector according to the present invention.
Fig. 5 is a schematic flowchart of the method for generating an image description vector according to the present invention.
Fig. 6 shows examples of some possible shapes and layouts of pixel regions that can be used in the present invention.
Fig. 7 is a functional block diagram of the apparatus for generating an image description vector according to an exemplary embodiment of the present invention.
Fig. 8 is a schematic flowchart of the method for generating an image description vector according to an exemplary embodiment of the present invention.
Fig. 9 is an exemplary diagram showing the image description vector obtained by the image description vector generation unit.
Fig. 10 is a schematic flowchart of the method for generating a multi-resolution image description vector according to an exemplary embodiment of the present invention.
Fig. 11 is a functional block diagram of the apparatus for generating a multi-resolution image description vector according to an exemplary embodiment of the present invention.
Fig. 12 is a schematic flowchart of exemplary processing carried out in the ternary coding sub-step according to an exemplary embodiment of the present invention.
Fig. 13 shows an example illustrating the processing shown in Fig. 12.
Fig. 14 shows an example illustrating the processing of the code conversion step according to an exemplary embodiment of the present invention.
Fig. 15 shows examples of the specific image features (significant binary code patterns) of the first category and the third category according to an exemplary embodiment of the present invention.
Fig. 16 shows examples of the specific image features (significant binary code patterns) of the second category according to an exemplary embodiment of the present invention.
Figs. 17(a) and 17(b) show an illustrative example of an original image and the image description vector generated for that original image according to an exemplary embodiment of the present invention.
Fig. 18 is a schematic flowchart of the processing of an image detection method according to an embodiment of the invention.
Fig. 19 is a block diagram schematically illustrating an image detection apparatus according to an embodiment of the invention.
Embodiments
Embodiments of the invention will be described in detail below with reference to the accompanying drawings.
Note that similar reference numerals and letters refer to similar items in the figures, so that once an item is defined in one figure, it need not be discussed in subsequent figures.
Fig. 3 is a block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the invention can be implemented.
As shown in Fig. 3, the computer system comprises a computer 1110. The computer 1110 comprises a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and an output peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 comprises a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and certain program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and certain program data 1147.
Removable non-volatile memory drives, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy disk drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices, such as a mouse 1161 and a keyboard 1162, are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 may comprise a memory 1181, such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The output peripheral interface 1195 is connected to a printer 1196 and speakers 1197.
The computer system shown in Fig. 3 is merely illustrative and is in no way intended to limit the invention, its application, or uses.
The computer system shown in Fig. 3 may be incorporated in any embodiment, may be used as a stand-alone computer, or may be used as a processing system in an apparatus; one or more unnecessary components may be removed from it, and one or more additional components may be added to it.
Fig. 4 is a functional block diagram of an apparatus 4000 for generating an image description vector according to the present invention.
As shown in Fig. 4, the apparatus 4000 for generating an image description vector according to the present invention comprises: a coding unit 4100 configured to encode each pixel region among a plurality of pixel regions of an image into M N-bit binary codes, wherein each pixel region among the plurality of pixel regions comprises one or more pixels, the M N-bit binary codes correspond respectively to M categories, and each N-bit binary code represents the neighbouring pixel regions adjacent to the corresponding pixel region; a feature extraction unit 4200 configured to, for the m-th N-bit binary code among the M N-bit binary codes corresponding to each pixel region among the plurality of pixel regions: if the m-th N-bit binary code matches, from among a plurality of specific code patterns, a specific code pattern of the m-th category among the M categories, extract the m-th N-bit binary code as a significant N-bit binary code of the m-th category; and an image description vector generation unit 4300 configured to, for each category among the M categories, count the number of significant N-bit binary codes matching each of the plurality of specific code patterns, to form the image description vector. M is an integer of 3 or more, N is an integer of 3 or more, and 1 ≤ m ≤ M.
Fig. 5 is a schematic flowchart of the method for generating an image description vector according to the present invention. This method can be implemented by the apparatus 4000 shown in Fig. 4.
As shown in Fig. 5, the method for generating an image description vector comprises: a coding step S5100 of encoding each pixel region among a plurality of pixel regions of an image into M N-bit binary codes, wherein each pixel region among the plurality of pixel regions comprises one or more pixels, the M N-bit binary codes correspond respectively to M categories, and each N-bit binary code represents the neighbouring pixel regions adjacent to the corresponding pixel region; a feature extraction step S5200 of, for the m-th N-bit binary code among the M N-bit binary codes corresponding to each pixel region among the plurality of pixel regions: if the m-th N-bit binary code matches, from among a plurality of specific code patterns, a specific code pattern of the m-th category among the M categories, extracting the m-th N-bit binary code as a significant N-bit binary code of the m-th category; and an image description vector generating step S5300 of, for each category among the M categories, counting the number of significant N-bit binary codes matching each of the plurality of specific code patterns, to form the image description vector. M is an integer of 3 or more, N is an integer of 3 or more, and 1 ≤ m ≤ M.
According to the present invention, each pixel region of the image can be processed in this manner, so that M N-bit binary codes are obtained for each pixel region. Furthermore, each of the M N-bit binary codes of a pixel region is either a significant N-bit binary code, in which case it corresponds to a specific code pattern of the respective category, or otherwise a non-significant N-bit binary code.
According to the present invention, since only the significant N-bit binary codes matching the specific code patterns are extracted, the efficiency of generating the image description vector can be improved.
In addition, according to the present invention, since a pixel region is encoded into M N-bit binary codes (M ≥ 3 and N ≥ 3) that are processed separately for that pixel region, the image can be described finely.
Preferably, the plurality of pixel regions of the image does not include the pixel regions located on the image boundary, because the number of neighbouring pixel regions of a boundary pixel region is less than N. Alternatively, a default pixel region value may be assigned to the boundary pixel regions.
According to an illustrative example, each of the plurality of pixel regions consists of only one pixel. Alternatively, each of the plurality of pixel regions may consist of a plurality of pixels. The plurality of pixel regions may have equal or different numbers of pixels.
The pixel regions may be arranged as a matrix. In this case, each pixel region may have a rectangular shape, and each pixel region has eight neighbouring pixel regions (the neighbouring pixel regions above, below, to the left, to the right, upper-left, upper-right, lower-left and lower-right). In this case, N is 8.
The pixel regions may also be arranged hexagonally. In this case, each pixel region may have a hexagonal shape, and each pixel region has six neighbouring pixel regions (the neighbouring pixel regions above, below, upper-left, upper-right, lower-left and lower-right). In this case, N is 6.
Fig. 6 shows examples of some possible shapes and/or layouts of pixel regions. In Fig. 6, the white polygon represents the center pixel region, and the hatched polygons represent the neighbouring pixel regions of the center pixel region.
Although some specific examples of the shape and layout of pixel regions have been described, the pixel regions may have other shapes and/or layouts (such as other polygons in other layouts). Therefore, N should not be limited to any specific integer, and may be any integer equal to or greater than 3.
In an illustrative embodiment of the present invention, the m-th N-bit binary code among the M N-bit binary codes represents the neighbouring pixel regions, adjacent to the corresponding pixel region (the center pixel region), whose pixel region values lie in the m-th of M pixel region value ranges. In the case where each pixel region consists of one pixel, the pixel region value is the pixel value (intensity) of that pixel. In the case where each pixel region consists of a plurality of pixels, the pixel region value is a combined value of the pixel values of the plurality of pixels in the pixel region. The combined value may be, for example but without limitation, one of the arithmetic mean, the geometric mean, a weighted mean and the median. Any other suitable value may also be defined as the pixel region value.
The pixel value of a grey-scale image can be described by a grey level. The pixel value of a colour image can be described as follows. More specifically, if the colour model of the colour image is the RGB model, the pixel value can be the value in any of the red, green and blue channels. If the colour model of the colour image is the HSL model, the pixel value can be the value in any of the hue, saturation and lightness channels. Furthermore, since any colour in a colour image can be interpreted as a tuple of several colour channels, the pixel value can be the value in any colour channel of any colour model, including but not limited to the RGB model and the HSL model.
In an exemplary embodiment of the present invention, N equals 8 and M equals 3. This corresponds to the case where the pixel regions are arranged as a matrix. In this case, the M N-bit binary codes are three 8-bit binary codes, and the M categories are three categories. The three categories can comprise, for example, a high pixel region value category, a middle pixel region value category and a low pixel region value category. More particularly, the pixel region values of the pixel regions corresponding to an 8-bit binary code of the high pixel region value category are in a first pixel region value range, the pixel region values of the pixel regions corresponding to an 8-bit binary code of the middle pixel region value category are in a second pixel region value range, and the pixel region values of the pixel regions corresponding to an 8-bit binary code of the low pixel region value category are in a third pixel region value range.
In an illustrative embodiment of the present invention, for a given pixel region (center pixel region), the first pixel region value range can be the range in which the difference between the pixel region value and the pixel region value of the given pixel region is greater than a first threshold, the second pixel region value range can be the range in which this difference is not greater than the first threshold and not less than a second threshold, and the third pixel region value range can be the range in which this difference is less than the second threshold. In other words, the neighbouring pixel regions in the first category and the third category differ significantly from the center pixel region in pixel region value, while the neighbouring pixel regions in the second category are similar or substantially identical to the center pixel region in pixel region value.
In an illustrative embodiment of the present invention, the plurality of specific code patterns (significant patterns) can comprise a first specific code pattern set and a second specific code pattern set. The first specific code pattern set consists of a first pattern subset and a second pattern subset, and the second specific code pattern set consists of a third pattern subset.
In each 8-bit binary code of the first pattern subset, a transition from high to low occurs at most once, and a transition from low to high occurs at most once. The first pattern subset is applicable to the first category and the third category, in which the neighbouring pixel regions differ significantly from the center pixel region in pixel region value. The patterns in the first pattern subset may correspond to circles, arcs, straight lines or points in the image.
Each 8-bit binary code of the second pattern subset is a doubly symmetric 8-bit binary code, in which the pattern produced by arranging the 8 bits of the doubly symmetric 8-bit binary code evenly around a circle is axially symmetric about two axes in two mutually orthogonal directions. The second pattern subset is applicable to the first category and the third category, in which the neighbouring pixel regions differ significantly from the center pixel region in pixel region value. The patterns in the second pattern subset may correspond to parallel lines in the image.
Each 8-bit binary code of the third pattern subset is a symmetric 8-bit binary code, in which the pattern produced by arranging the 8 bits of the symmetric 8-bit binary code evenly around a circle is axially symmetric about at least one axis or rotationally symmetric. The third pattern subset is applicable to the second category, in which the neighbouring pixel regions can be regarded as similar or substantially identical to the center pixel region in pixel region value. The patterns in the third pattern subset may correspond to points, corners, crossing lines or straight lines in the image.
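One plausible reading of the first-pattern-subset condition ("at most one high-to-low and at most one low-to-high transition") treats the 8 bits as a circular sequence, in the spirit of the well-known "uniform" LBP patterns. The following check is a sketch under that assumption only, not the patent's exact pattern table:

```python
def circular_transitions(code, n=8):
    """Count bit transitions around a circular n-bit code.

    Returns (falls, rises): the number of 1->0 and 0->1 changes when
    the n bits of the integer code are read cyclically.
    """
    bits = [(code >> i) & 1 for i in range(n)]
    falls = rises = 0
    for i in range(n):
        a, b = bits[i], bits[(i + 1) % n]
        if a == 1 and b == 0:
            falls += 1
        elif a == 0 and b == 1:
            rises += 1
    return falls, rises

def in_first_pattern_subset(code, n=8):
    """True if the code has at most one fall and at most one rise."""
    falls, rises = circular_transitions(code, n)
    return falls <= 1 and rises <= 1
```

Under this reading, a contiguous run of set bits such as 0b11110000 qualifies (one fall, one rise, typically an edge or arc in the image), while an alternating code such as 0b10101010 does not (four of each, typically noise).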
Though more than described the exemplary of special code pattern, the special code pattern is not limited thereto object lesson.In fact, how and carry out definition to the special code pattern of remarkable N position binary code the structure most probable that can consider image.That is to say that other special code pattern also is possible, as long as undesirable structure that the typical structure of the image that their reflections will be described and eliminating occur owing to noise than possibility.
Although concrete examples in which M equals 3 and N equals 8 have been described above, it should be noted that the values of N and M are not limited to these examples. As mentioned above, if the pixel regions are arranged in a manner other than a matrix, N is an integer other than 8. Moreover, M may also be 4 or more; in this case, by using three or more thresholds with respect to the center pixel region, the pixel region values of the adjacent pixel regions can be classified into 4 or more categories. The specific code patterns corresponding to these categories, and the concrete correspondence between the specific code patterns and these categories, can be designed accordingly.
An exemplary embodiment of the present invention will now be described with reference to Figs. 7 and 8. Fig. 7 is a functional block diagram of an apparatus 4000 for generating an image description vector according to an exemplary embodiment of the present invention. Fig. 8 is a schematic flowchart of a method for generating an image description vector according to an exemplary embodiment of the present invention.
As shown in Fig. 7, the coding unit 4100 may comprise: a ternary coding subunit 4110 configured to code each of the plurality of pixel regions into an 8-bit ternary code; and a code conversion unit 4120 configured to convert each 8-bit ternary code into three 8-bit binary codes, wherein each of the three 8-bit binary codes corresponds to one level of the ternary levels.
More specifically, the bits in the first 8-bit binary code corresponding to the high-level ("1") bits of the 8-bit ternary code may be set to "1", the bits in the second 8-bit binary code corresponding to the intermediate-level ("0") bits of the 8-bit ternary code may be set to "1", and the bits in the third 8-bit binary code corresponding to the low-level ("-1") bits of the 8-bit ternary code may be set to "1". In other words, the first 8-bit binary code indicates which bits of the 8-bit ternary code are at the high level, the second 8-bit binary code indicates which bits are at the intermediate level, and the third 8-bit binary code indicates which bits are at the low level.
For example, a pixel region is coded by the ternary coding subunit 4110 into the 8-bit ternary code 111(-1)(-1)000; this 8-bit ternary code is then converted by the code conversion unit 4120 into three 8-bit binary codes: 11100000 corresponding to the high level ("1") of the ternary code, 00000111 corresponding to the intermediate level ("0") of the ternary code, and 00011000 corresponding to the low level ("-1") of the ternary code.
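As a minimal sketch (not the patent's reference implementation), the conversion performed by the code conversion unit 4120 can be written as a function that splits one 8-bit ternary code into the three per-level binary codes; the function name is a placeholder:

```python
def split_ternary_code(ternary):
    """ternary: sequence of 8 values drawn from {1, 0, -1}.
    Returns three 8-character bit strings, one per ternary level:
    high ("1"), intermediate ("0") and low ("-1"), in that order."""
    high = ''.join('1' if t == 1 else '0' for t in ternary)
    mid = ''.join('1' if t == 0 else '0' for t in ternary)
    low = ''.join('1' if t == -1 else '0' for t in ternary)
    return high, mid, low

# The worked example from the text: ternary code 1 1 1 (-1) (-1) 0 0 0
print(split_ternary_code([1, 1, 1, -1, -1, 0, 0, 0]))
# → ('11100000', '00000111', '00011000')
```

Note that each input bit sets exactly one "1" across the three output codes, so the three binary codes together carry the full ternary code.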
When the ternary coding is performed, the first level (high level) of the ternary levels may correspond to the high pixel region value category, the second level (intermediate level) may correspond to the intermediate pixel region value category, and the third level (low level) may correspond to the low pixel region value category. Note that the three levels "1", "-1" and "0" can be used equally and exchanged with each other, because they are in fact merely three labels representing three different categories or states.
For example, the above-mentioned LTP feature can be used as the 8-bit ternary code.
As shown in Fig. 7, the image description vector generation unit 4300 may comprise: a first subunit 4310 configured to count, for each of the M categories, the number of salient N-bit binary codes matching each pattern in the plurality of specific code patterns, to form an image description vector part for the respective category; and a second subunit 4320 configured to concatenate the image description vector parts of the M categories to form the image description vector.
Fig. 9 is an exemplary diagram illustrating an image description vector obtained by the image description vector generation unit 4300. In Fig. 9, the vertical axis represents the count of pixel regions, and the horizontal axis represents the specific code patterns of the salient N-bit binary codes (the specific code patterns P11, P12, P13, P14, ... of the first category, the specific code patterns P21, P22, P23, P24, ... of the second category, the specific code patterns P31, P32, P33, P34, ... of the third category, etc.). For example, the white bars in Fig. 9 represent the counts of pixel regions in the image for the respective specific code patterns of the first category, the hatched bars represent the counts for the respective specific code patterns of the second category, and the solid bars represent the counts for the respective specific code patterns of the third category. Each bar corresponds to a concrete specific code pattern (salient pattern) of the respective category. For example, the first white bar indicates that the count of pixel regions whose N-bit binary codes of the first category match the concrete salient pattern P11 is 110. Similarly, the third hatched bar indicates that the count of pixel regions whose N-bit binary codes of the second category match the concrete salient pattern P23 is 40. The image description vector in Fig. 9 can thus be represented as (110, 150, 180, 130, ..., 110, 70, 40, 90, ..., 160, 190, 150, 170, ...).
A bar representing the total count of the non-salient N-bit binary codes may further be concatenated with the vector shown in Fig. 9. In this case, the sum of the components of the vector equals M times the number of pixel regions in the image.
As shown in Fig. 7, the apparatus 4000 may further comprise a normalization unit 4400 configured to divide the image description vector by the number of pixel regions of the plurality of pixel regions, to form a normalized image description vector. In this way, the image description vector can be insensitive to the total number of pixel regions in the image.
Similarly, as shown in Fig. 8, the coding step S5100 may comprise: a ternary coding sub-step S5110 of coding each pixel region of the plurality of pixel regions into an 8-bit ternary code; and a code conversion step S5120 of converting each 8-bit ternary code into the three 8-bit binary codes, wherein each of the three 8-bit binary codes corresponds to one level of the ternary levels. The ternary coding sub-step S5110 can be performed by the ternary coding subunit 4110, and the code conversion step S5120 can be performed by the code conversion unit 4120.
The image description vector generation step S5300 may comprise: a first sub-step S5310 of counting, for each of the M categories, the number of salient N-bit binary codes matching each pattern in the plurality of specific code patterns, to form an image description vector part for the respective category; and a second sub-step S5320 of concatenating the image description vector parts of the M categories to form the image description vector. The first sub-step S5310 can be performed by the first subunit 4310, and the second sub-step S5320 can be performed by the second subunit 4320.
The method according to the exemplary embodiment may further comprise a normalization step S5400 of dividing the image description vector by the number of pixel regions of the plurality of pixel regions, to form a normalized image description vector. The normalization step S5400 can be performed by the normalization unit 4400.
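The counting, concatenation and normalization of steps S5310 to S5400 can be sketched as follows. This is a hedged illustration: the pattern sets and codes below are toy values, not the actual sets of Figs. 15 and 16, and the function and argument names are assumptions.

```python
from collections import Counter

def build_description_vector(codes_per_category, patterns_per_category,
                             num_pixel_regions):
    """codes_per_category[m]: list of salient 8-bit binary codes (bit
    strings) extracted for category m; patterns_per_category[m]: the
    ordered list of that category's specific code patterns."""
    vector = []
    for codes, patterns in zip(codes_per_category, patterns_per_category):
        counts = Counter(codes)
        # S5310: one image description vector part per category,
        # concatenated in place (S5320) by extend().
        vector.extend(counts.get(p, 0) for p in patterns)
    # S5400: normalize by the number of pixel regions.
    return [v / num_pixel_regions for v in vector]

vec = build_description_vector(
    [['11100000', '11100000'], ['01100110']],   # toy salient codes
    [['11100000', '00000111'], ['01100110']],   # toy pattern sets
    num_pixel_regions=4)
print(vec)  # → [0.5, 0.0, 0.25]
```

Each bin of the resulting vector corresponds to one bar of the Fig. 9 histogram, arranged category by category.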
In some cases, the coarse texture of an image and the fine microstructure of the image may be very different. For example, a high-resolution image has many fine texture features, but its general outline may not be obvious. When the image is scaled to a smaller size (having a lower resolution), the detailed textures are lost and the coarse contour of the image emerges. That is, an image description at only one resolution may not be comprehensive. Therefore, it is useful to obtain a more comprehensive image description by using multi-resolution image description vectors.
Fig. 10 is a flowchart of a method for generating a multi-resolution image description vector according to an exemplary embodiment of the present invention.
As shown in Fig. 10, the method for generating a multi-resolution image description vector comprises: a first image description vector generation step S6100 of generating a first image description vector by performing the method shown in Fig. 5 or Fig. 8 on an input image; a scaling step S6200 of scaling the input image to generate a scaled image; a second image description vector generation step S6300 of generating a second image description vector by performing the method shown in Fig. 5 or Fig. 8 on the scaled image; and a concatenation step S6400 of concatenating the first image description vector and the second image description vector to generate the multi-resolution image description vector.
"Scaling" above means changing the resolution of the input image. For example, the scaled image may have more pixels than the input image (up-scaling) or fewer pixels than the input image (down-scaling). Many image scaling techniques are known in the art and will not be described in more detail here.
The image description vectors of two, three or even more different resolutions of one image may be concatenated to form the multi-resolution image description vector of the image.
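The multi-resolution cascade of Fig. 10 can be sketched under stated assumptions: `describe` stands for the method of Fig. 5 or Fig. 8 and `rescale` for any standard image scaling technique; both names, and the scale factors, are placeholders rather than the patent's API.

```python
def multiresolution_vector(image, describe, rescale, scales=(1.0, 0.5)):
    """Concatenate the description vectors of the image at each scale
    (two or more resolutions, per the embodiment)."""
    vector = []
    for s in scales:
        # S6100/S6300: describe the (re)scaled image; extend() performs
        # the S6400 concatenation.
        vector.extend(describe(rescale(image, s)))
    return vector

# Toy usage: a fake descriptor that just reports the "image" size, and
# a fake rescaler that truncates a 1-D list.
v = multiresolution_vector([0] * 8,
                           describe=lambda img: [len(img)],
                           rescale=lambda img, s: img[:int(len(img) * s)])
print(v)  # → [8, 4]
```

The design point is simply that each resolution contributes its own contiguous block of components to the final vector.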
Fig. 11 is a functional block diagram of an apparatus 6000 for generating a multi-resolution image description vector according to an exemplary embodiment of the present invention.
As shown in Fig. 11, the apparatus 6000 for generating a multi-resolution image description vector comprises: a first image description vector generation unit 6100 configured to generate a first image description vector by performing the method shown in Fig. 5 or Fig. 8 on an input image; a scaling unit 6200 configured to scale the input image to generate a scaled image; a second image description vector generation unit 6300 configured to generate a second image description vector by performing the method shown in Fig. 5 or Fig. 8 on the scaled image; and a concatenation unit 6400 configured to concatenate the first image description vector and the second image description vector to generate the multi-resolution image description vector.
The units described above and the units to be described below are exemplary and/or preferred modules for implementing the processing described in this disclosure. These modules can be hardware units (such as field programmable gate arrays, digital signal processors, application specific integrated circuits, etc.) and/or software modules (such as computer readable programs). The modules for implementing the respective steps are not described exhaustively below. However, as long as there is a step of performing a certain processing, there can be a corresponding functional module or unit (implemented by hardware and/or software) for implementing the same processing. Technical solutions defined by all combinations of the described steps and the units corresponding to these steps are included in the disclosure of the present application, as long as the technical solutions they constitute are complete and applicable.
In addition, the above apparatuses constituted by various units can be incorporated, as functional modules, into a hardware device such as a computer. The computer may of course have other hardware or software components in addition to these functional modules.
An exemplary embodiment will be described below in which the pixel regions in the image are arranged into a matrix, and the adjacent pixel regions of a pixel region comprise the eight pixel regions that are centered on that pixel region and form a 3×3 pixel region matrix together with it. In this case, N=8 and M=3.
For an input image comprising a plurality of pixel regions, in step S5110, each pixel region of the plurality of pixel regions is coded into an 8-bit ternary code.
Fig. 12 is a flowchart of exemplary processing performed in step S5110. Fig. 13 illustrates an example for illustrating the processing shown in Fig. 12.
As illustrated in Fig. 13, when a pixel region 1220 is to be coded, the 3×3 pixel regions 1210 centered on the pixel region 1220 are considered. The 3×3 pixel regions 1210 comprise the center pixel region 1220 and eight adjacent pixel regions 1211.
In step S5111, a margin T is used to set a lower threshold 1231 and an upper threshold 1232 with respect to the pixel region value of the center pixel region 1220. The margin T can be predetermined according to the concrete application. For example, the margin T can be predetermined according to the contrast of the image or the dynamic range of the pixel region values. In a case where the dynamic range of the pixel region values is large or the contrast of the image is high, the margin T can be set larger. In a case where the dynamic range of the pixel region values is small or the contrast of the image is low, the margin T can be set smaller. In addition, the margin T can be set depending on how pixel regions are to be regarded as having "similar" pixel region values. The lower threshold 1231 is set to (center pixel region value − T), and the upper threshold 1232 is set to (center pixel region value + T). Here, T and −T are illustrative examples of the aforementioned first threshold and second threshold, respectively. In the example shown in Fig. 13, the lower threshold 1231 is 54−10=44, and the upper threshold 1232 is 54+10=64.
In step S5112, each adjacent pixel region is subjected to the upper threshold and the lower threshold, so as to be assigned "1", "-1" or "0". More specifically, if the pixel region value of an adjacent pixel region is greater than the upper threshold (that is, the difference between the pixel region value of the adjacent pixel region and the center pixel region value is greater than the first threshold), the adjacent pixel region is assigned "1". If the pixel region value of an adjacent pixel region is neither greater than the upper threshold nor less than the lower threshold (that is, the difference between the pixel region value of the adjacent pixel region and the center pixel region value is neither greater than the first threshold nor less than the second threshold), the adjacent pixel region is assigned "0". If the pixel region value of an adjacent pixel region is less than the lower threshold (that is, the difference between the pixel region value of the adjacent pixel region and the center pixel region value is less than the second threshold), the adjacent pixel region is assigned "-1". In addition, the center pixel region is assigned "0". In other words, a pixel region assigned "1" represents a "brighter" pixel region, a pixel region assigned "0" represents a pixel region having a pixel region value substantially the same as (or similar to) that of the center pixel region, and a pixel region assigned "-1" represents a "darker" pixel region.
An example of pixel regions assigned "0", "1" and "-1" is shown by reference numeral 1240 in Fig. 13.
In step S5113, one of the adjacent pixel regions is selected as a starting point, and the ternary digits are arranged clockwise or counterclockwise, to obtain a string of eight ternary digits as the ternary code 1250.
After steps S5111 to S5113, the 8-bit ternary code of the center pixel region is obtained. By performing steps S5111 to S5113 with each pixel region in the image as the center pixel region, each pixel region in the image can have a corresponding 8-bit ternary code.
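Steps S5111 to S5113 for one 3×3 neighborhood can be sketched as below. The starting point and clockwise order of the neighbors are arbitrary choices here; the patent only requires a fixed clockwise or counterclockwise arrangement. The neighbor values are invented for illustration, while the center value 54 and margin T=10 follow the Fig. 13 example.

```python
def ternary_code(center, neighbors, margin):
    """neighbors: the 8 pixel region values in a fixed clockwise order
    (S5113). Returns 8 ternary digits per the thresholds of S5111-S5112."""
    upper = center + margin   # upper threshold 1232
    lower = center - margin   # lower threshold 1231
    code = []
    for v in neighbors:
        if v > upper:
            code.append(1)    # "brighter" than the center
        elif v < lower:
            code.append(-1)   # "darker" than the center
        else:
            code.append(0)    # similar to the center
    return code

print(ternary_code(54, [90, 60, 30, 44, 64, 100, 20, 50], 10))
# → [1, 0, -1, 0, 0, 1, -1, 0]
```

Note that values exactly equal to a threshold (44 and 64 above) fall in the "similar" band, matching the "not greater than / not less than" wording of step S5112.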
Then, in the code conversion step S5120, each 8-bit ternary code is converted into three 8-bit binary codes, each of the three 8-bit binary codes corresponding to one level of the ternary levels.
Fig. 14 illustrates an example for illustrating the processing of the code conversion step S5120.
As shown in Fig. 14, the 8-bit ternary code is converted into three 8-bit binary codes of three categories. Each of the three categories corresponds to one value range. The first category corresponds to the pixel regions assigned "1", the second category corresponds to the pixel regions assigned "0", and the third category corresponds to the pixel regions assigned "-1". The 8-bit binary code of the first category is generated by keeping all the "1" bits of the ternary code and changing the other bits into "0". The 8-bit binary code of the second category is generated by changing all the "0" bits of the ternary code into "1" and changing the other bits into "0". The 8-bit binary code of the third category is generated by changing all the "-1" bits of the ternary code into "1" and changing the other bits into "0", as shown in Fig. 14.
Fig. 14 also illustrates a way of representing the 8-bit binary codes. More specifically, an 8-bit binary code can be represented as eight points arranged at eight positions equally spaced around a circle. In Fig. 14 and the following figures, a black point represents a "1" bit, and a white point represents a "0" bit. Each point reveals the position of the corresponding pixel region and its pixel region value relative to the center pixel region.
Then, in step S5200, the salient 8-bit binary codes are extracted from all the 8-bit binary codes obtained in step S5100. As mentioned above, the 8-bit binary codes are divided into three categories, that is, the first category, the second category and the third category.
Based on the 8-bit binary coding scheme, there are 256 binary code patterns in total for each category. For the three categories, there are 256×3=768 binary code patterns in total. However, among the 256 binary code patterns, it is always the case that only a small fraction of the binary code patterns carry salient image characteristics, and the others can be regarded as noise.
In this exemplary embodiment, the input image has the characteristic that the pixel region values change gradually and the salient image features are continuous. The salient binary code patterns are designed and selected based on this characteristic. The selection of the salient binary code patterns in this exemplary embodiment is only an example. Images in other examples may have other characteristics, and the salient binary code patterns can be designed and selected in consideration of the concrete characteristics of the images to be described.
For an 8-bit binary code of the first category, if the 8-bit binary code matches a pattern in the first specific code pattern set, the 8-bit binary code is extracted as a salient 8-bit binary code of the first category. For an 8-bit binary code of the second category, if the 8-bit binary code matches a pattern in the second specific code pattern set, the 8-bit binary code is extracted as a salient 8-bit binary code of the second category. For an 8-bit binary code of the third category, if the 8-bit binary code matches a pattern in the first specific code pattern set, the 8-bit binary code is extracted as a salient 8-bit binary code of the third category.
Examples of the first specific code pattern set and the second specific code pattern set will be described in more detail below.
In this exemplary embodiment, the first category represents the brighter features around the center pixel region (pixel regions with larger pixel region values), and the third category represents the darker features around the center pixel region (pixel regions with smaller pixel region values). Thus, the pixel regions in the first category and the third category share a common characteristic, namely, they differ significantly from the center pixel region; in other words, they all reveal the local adjacent microstructure around the center pixel region. Based on the characteristics of the images in this exemplary embodiment, the specific image features in these two categories can be edges, straight lines, arcs, circles, points, etc., as shown in Fig. 15. Fig. 15 illustrates the specific image features (binary code patterns) of the first category and the third category, in which a black point means that the corresponding pixel region is a feature point (a "1" bit in the binary code). Binary code patterns obtained by rotating the salient patterns shown in Fig. 15 are still salient patterns. That is, the first specific code pattern set (the salient patterns of the first category and the third category) comprises the patterns shown in Fig. 15 and their rotated forms.
As can be seen from Fig. 15, the first specific code pattern set can consist of the first pattern subset and the second pattern subset. In each 8-bit binary code of the first pattern subset, a transition from high level to low level occurs at most once, and a transition from low level to high level occurs at most once. In other words, a pattern in the first pattern subset has at most one run of consecutive "1" bits and at most one run of consecutive "0" bits. Each 8-bit binary code of the second pattern subset is a doubly symmetric 8-bit binary code. For a doubly symmetric 8-bit binary code, the pattern produced by arranging its 8 bits equally around a circle is axially symmetric with respect to two axes in two orthogonal directions.
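The membership test for the first pattern subset — at most one high-to-low and one low-to-high transition when the 8 bits are read circularly — can be sketched as follows (the function name is a placeholder):

```python
def in_first_subset(bits):
    """bits: 8-character string of '0'/'1', read as a circular code.
    True if the code has at most one 1->0 and one 0->1 transition,
    i.e. at most one run of '1's and one run of '0's."""
    falls = rises = 0
    for i in range(8):
        cur, nxt = bits[i], bits[(i + 1) % 8]
        if cur == '1' and nxt == '0':
            falls += 1
        elif cur == '0' and nxt == '1':
            rises += 1
    return falls <= 1 and rises <= 1

print(in_first_subset('11100000'))  # → True  (one run of 1s, one of 0s)
print(in_first_subset('01010101'))  # → False (alternating bits)
```

This is the same circular-transition criterion used for "uniform" patterns in local-binary-pattern methods, which is presumably why such codes correspond to simple structures like arcs and edges.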
As can be seen, in this exemplary embodiment, there are 69 concrete specific code patterns in total (comprising the patterns shown in Fig. 15 and their rotated forms) in the first specific code pattern set used for each of the first category and the third category. That is, the other 256−69=187 patterns are regarded as noise code patterns.
In this exemplary embodiment, an 8-bit binary code of the second category represents the pixel regions lying in the same pixel region value range as the center pixel region. Since the center pixel region has a pixel region value similar to those of the adjacent pixel regions of the second category, the center pixel region can also be regarded as a feature point of the second category. Based on the characteristics of the images in this exemplary embodiment, the specific image features in the second category can be blocks, blobs, corners, cross lines, edges, etc., as shown in Fig. 16. Fig. 16 illustrates the specific image features (salient binary code patterns) of the second category, in which a black point means that the corresponding pixel region is a feature point (a "1" bit in the binary code). Binary patterns obtained by rotating the salient patterns shown in Fig. 16 are still salient patterns. That is, the second specific code pattern set (the salient patterns of the second category) comprises the patterns shown in Fig. 16 and their rotated forms.
The second specific code pattern set can consist of the third pattern subset. As can be seen, each 8-bit binary code of the third pattern subset is a symmetric 8-bit binary code. For a symmetric 8-bit binary code, the pattern produced by arranging its 8 bits equally around a circle is axially symmetric with respect to an axis in at least one direction.
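The symmetry test for the third pattern subset can be sketched as below, under the assumption that "an axis in at least one direction" means any of the eight mirror axes of a regular octagon (four through opposite points, four through opposite gaps):

```python
def is_axially_symmetric(bits):
    """bits: 8-character string of '0'/'1' placed equally around a
    circle. A reflection of 8 points on a circle maps position i to
    (k - i) mod 8 for some fixed k in 0..7; the code is symmetric if it
    is invariant under at least one such reflection."""
    return any(all(bits[i] == bits[(k - i) % 8] for i in range(8))
               for k in range(8))

print(is_axially_symmetric('01100110'))  # → True
print(is_axially_symmetric('11010000'))  # → False
```

Even values of k give axes through two of the eight points; odd values give axes through the midpoints between points, so the eight k values together cover all candidate mirror axes.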
As can be seen, in this exemplary embodiment, there are 152 concrete specific code patterns in total (comprising the patterns shown in Fig. 16 and their rotated forms) in the second specific code pattern set used for the second category. That is, the other 256−152=104 patterns are regarded as noise code patterns.
As described above, since only the salient patterns are used to describe the 8-bit binary codes in each of the first to third categories, the efficiency of describing the image is enhanced. In addition, since one pixel region is described by three 8-bit binary codes of different categories, and the 8-bit binary codes of different categories are processed separately, the image can be described more accurately and more finely.
After all the salient 8-bit binary codes have been extracted, in step S5310, the number of salient 8-bit binary codes of each category matching each concrete salient pattern of that category is counted. For each category, an image description vector part is formed by concatenating the counts of the concrete patterns of that category.
Then, in step S5320, the respective image description vector parts of the three categories are concatenated into the image description vector.
The image description vector can be represented as a histogram, in which each bin represents a concrete salient pattern, the bins are arranged according to the different categories, and each bar represents the count corresponding to its bin. A bin representing all the non-salient 8-bit binary code patterns may also be concatenated with the bins representing the salient patterns, and/or the histogram may be normalized by the number of processed pixel regions in the image.
A salient pattern may also be quantized into a decimal value for storage and use. For example, the salient pattern 01100110 of the second category can be quantized as "102".
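The quantization of a salient pattern to a decimal value is just the usual binary-to-integer interpretation, matching the "01100110" → 102 example above (the function name is a placeholder):

```python
def quantize(bits):
    """Interpret an 8-character bit string as an unsigned integer."""
    return int(bits, 2)

print(quantize('01100110'))  # → 102
```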
The image may also be scaled (i.e., its resolution changed) and the processing shown in Fig. 5 or Fig. 8 applied to the scaled image, as described above with reference to Fig. 10, in order to obtain a multi-resolution image description vector.
Fig. 17(a) illustrates an illustrative example of an original image. The salient patterns and the image description vector generated according to the above embodiment are shown in Fig. 17(b).
As described above, the image description method according to the present invention can describe an image more efficiently with more detailed and accurate image local structures. Therefore, the method is applicable to object detection, recognition, tracking and retrieval in images and videos.
An image detection method using the image description vector according to the present invention will now be described with reference to Fig. 18.
Fig. 18 illustrates a schematic flowchart of the processing of an image detection method according to an embodiment of the present invention. As shown in Fig. 18, the image detection method comprises: an input step S1910 of inputting a region image of an object image; an image description vector generation step S1920 of performing the method shown in Fig. 5, 8 or 10 on the region image, to generate an image description vector as the region image description vector; a calculation step S1930 of calculating the distance between the region image description vector and a target image description vector registered in advance in a database, the target image description vector having been formed in advance by performing the method shown in Fig. 5, 8 or 10 on the target image; and a determination step S1940 in which, if the distance is less than a specific threshold ("Yes" in step S1940 of Fig. 18), it is determined that a region image corresponding to the target image is detected; otherwise, if the distance is not less than the specific threshold ("No" in step S1940 of Fig. 18), the position and/or size of the region image are adjusted to obtain the next region image to be processed.
The processing shown in Fig. 18 can be used to search for a target image (for example, a face) in an input image. Steps S1910 to S1940 can be repeated for all region images in the input image.
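The S1910–S1940 scan loop can be sketched as below. This is a hedged illustration: `describe` stands for the method of Fig. 5, 8 or 10, the iterable of candidate regions stands for the successive positions/sizes tried in step S1940, and all names and toy values are assumptions.

```python
def detect(candidate_regions, describe, target_vector, distance, threshold):
    """candidate_regions: iterable of region images to try in turn."""
    for region in candidate_regions:
        # S1920/S1930: describe the region and compare with the target.
        if distance(describe(region), target_vector) < threshold:
            return region          # S1940 "Yes": target detected
    return None                    # no region matched the target

# Toy usage with 1-component "vectors" and an absolute difference.
hit = detect([3, 9, 11],
             describe=lambda r: [r],
             target_vector=[10],
             distance=lambda a, b: abs(a[0] - b[0]),
             threshold=2)
print(hit)  # → 9
```

In a real detector the candidate regions would be generated by a sliding window over positions and scales, and the distance would be one of the histogram distances discussed below.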
Fig. 19 is a block diagram schematically illustrating an image detection apparatus according to an embodiment of the present invention. The image detection apparatus comprises: an input unit 1910 configured to input a region image of an object image; an image description vector generation unit 1920 configured to perform the method shown in Fig. 5, 8 or 10 on the region image, to generate an image description vector as the region image description vector; a calculation unit 1930 configured to calculate the distance between the region image description vector and a target image description vector registered in advance in a database, the target image description vector having been formed in advance by performing the method shown in Fig. 5, 8 or 10 on the target image; and a determination unit 1940 configured to determine, if the distance is less than a specific threshold, that a region image corresponding to the target image is detected, and otherwise, if the distance is not less than the specific threshold, to adjust the position and/or size of the region image to obtain the next region image to be processed.
The above distance can be a histogram intersection distance or a chi-square distance. However, any other method of calculating the distance (or the proximity or similarity) between two histograms or between two vectors can be used here.
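The two distances named above can be sketched as follows for normalized description vectors of equal length; turning the intersection similarity into a distance by subtracting it from 1 is a common convention and an assumption here, not the patent's prescription:

```python
def intersection_distance(h1, h2):
    """Histogram intersection measures overlap; for histograms summing
    to 1, one minus the overlap behaves as a distance."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

def chi_square_distance(h1, h2):
    """Chi-square distance; empty bin pairs contribute nothing."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

h1 = [0.5, 0.5, 0.0]
h2 = [0.5, 0.0, 0.5]
print(intersection_distance(h1, h2))  # → 0.5
print(chi_square_distance(h1, h2))    # → 1.0
```

Both distances are zero for identical histograms and grow as the histograms diverge, which is all the determination step S1940 requires.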
In order to evaluate the method and apparatus according to the present invention, the method for generating an image description vector was applied in a user registered object detection (UROD) system. A single sample of an image of a target object (target image) is registered by the user, and an image detection apparatus is created adaptively to detect the target object. The image detection apparatus is used to locate the target image in input video frames. Three image description methods were used separately in the UROD system, namely the LBP method, the LTP method and the method according to the present invention, to create respective detectors.
The PASCAL evaluation criterion is adopted as the evaluation criterion here. The regions in the considered frames that contain the target image are correctly labeled by hand. When (area of labeled region ∩ area of detected region)/(area of labeled region ∪ area of detected region) > T_detection, the detected region is regarded as a correct detection. The symbol "∩" denotes the intersection operation and the symbol "∪" denotes the union operation. In this evaluation, T_detection is set to 50%. This is in order to strike a balance between detecting as many regions that truly contain the target as possible and keeping the number of falsely detected regions that do not contain the target small.
The recall ratio (recall rate) that is used for this assessment is defined as follows with reject rate (reject rate).
N appears in the destination object in registration OccuranceInferior, and the N in the destination object of these registrations DetectionUnder the correct detected situation of individual quilt, N Detection/ N OccuranceBe defined as recall ratio.That is to say recall ratio is illustrated under the situation that the real goal object occurs 100 times can correctly detect for what real goal objects.
If the registered target object appears N_occurrence times, and N_false image regions are falsely detected as the registered target object, then N_false/N_occurrence is defined as the reject rate. That is to say, the reject rate indicates how many false image regions are detected per 100 occurrences of the real target object.
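Both rates are simple ratios over the number of occurrences; a sketch with illustrative helper names:

```python
def recall_rate(n_detection, n_occurrence):
    # Fraction of the target's occurrences that were correctly detected.
    return n_detection / n_occurrence

def reject_rate(n_false, n_occurrence):
    # Falsely detected regions per occurrence of the real target.
    return n_false / n_occurrence

# For example: if the target appears 100 times, 94 occurrences are detected
# and 33 unrelated regions are false alarms, recall is 0.94 and the reject
# rate is 0.33. Note the reject rate can exceed 1 if false alarms outnumber
# true occurrences.
```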
The software and hardware configuration used for this evaluation is shown in Table 1 below.
[Table 1: software and hardware configuration; the table image is not reproduced in this text]
Table 1
The evaluation results for these three methods are shown in Table 2 below.
Method                  Targets in frames   Recall rate (detection)   Total frames   Reject rate (false alarm)
LBP                     23620               93.21%                    88566          97.56%
LTP                     23620               87.67%                    88566          48.49%
The present invention   23620               93.68%                    88566          32.56%
Table 2
As can be seen from the evaluation results shown in Table 2, in addition to the greater efficiency described above resulting from the reduced number of patterns to be stored and used, the present method for generating an image description vector also reduces the reject rate (false alarms) while improving the recall rate (detections), compared with the LBP and LTP methods.
The method and apparatus of the present invention may be implemented in many ways, for example by software, hardware, firmware or any combination thereof. The order of the method steps described above is merely illustrative, and the method steps of the present invention are not limited to the order specifically described above, unless otherwise explicitly stated. Furthermore, in some embodiments, the present invention may also be implemented as a program recorded in a recording medium, comprising machine-readable instructions for implementing the method according to the invention. Thus, the present invention also covers a recording medium storing a program for implementing the method according to the invention.
Although some specific embodiments of the present invention have been shown in detail by way of example, it should be understood by those skilled in the art that the above examples are intended to be illustrative only and do not limit the scope of the invention. It should be appreciated by those skilled in the art that the above embodiments may be modified without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (34)

1. A method for generating an image description vector, comprising:
a coding step of coding each pixel region among a plurality of pixel regions of an image into M N-bit binary codes, wherein the M N-bit binary codes correspond respectively to M categories, and each of the N-bit binary codes represents the adjacent pixel regions adjacent to the corresponding pixel region;
a feature extraction step of, for the m-th N-bit binary code among the M N-bit binary codes corresponding to each pixel region among the plurality of pixel regions: if the m-th N-bit binary code matches a specific code pattern of the m-th category among the M categories, extracting the m-th N-bit binary code as a notable N-bit binary code of the m-th category; and
an image description vector generating step of counting, for each category among the M categories, the number of notable N-bit binary codes, so as to form the image description vector,
wherein M is an integer of 3 or more, N is an integer of 3 or more, and 1 ≤ m ≤ M.
2. The method according to claim 1, wherein
N is 8 and M is 3, the M N-bit binary codes are three 8-bit binary codes, and the M categories are three categories.
3. The method according to claim 1, wherein
the m-th N-bit binary code among the M N-bit binary codes represents the adjacent pixel regions that are adjacent to the corresponding pixel region and whose pixel region values are within the m-th pixel region value range among M pixel region value ranges.
4. The method according to claim 2, wherein the coding step comprises:
a ternary coding sub-step of coding each pixel region among the plurality of pixel regions into an 8-digit ternary code; and
a code conversion sub-step of converting each 8-digit ternary code into the three 8-bit binary codes, wherein each of the three 8-bit binary codes corresponds to one level of the ternary levels.
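A minimal sketch of the code conversion sub-step: an 8-digit ternary code (one digit per neighbor, each digit 0, 1 or 2) is split into three 8-bit binary planes, one per ternary level, with each plane marking the neighbors at that level. The digit-to-level assignment here is an assumption:

```python
def ternary_to_binary_planes(ternary_digits):
    """Split an 8-digit ternary code (a sequence of digits 0, 1, 2)
    into three 8-bit binary codes, one per ternary level."""
    planes = [0, 0, 0]
    for d in ternary_digits:
        for level in range(3):
            planes[level] = (planes[level] << 1) | (1 if d == level else 0)
    return planes  # [code for level 0, code for level 1, code for level 2]
```

By construction the three planes are bitwise disjoint and together cover all 8 positions, so their integer values always sum to 255.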
5. The method according to claim 2, wherein
the three categories comprise a high pixel region value category, a middle pixel region value category and a low pixel region value category, the pixel region values corresponding to the 8-bit binary code of the high pixel region value category are within a first pixel region value range, the pixel region values corresponding to the 8-bit binary code of the middle pixel region value category are within a second pixel region value range, and the pixel region values corresponding to the 8-bit binary code of the low pixel region value category are within a third pixel region value range.
6. The method according to claim 4, wherein
the three categories comprise a high pixel region value category, a middle pixel region value category and a low pixel region value category, the 8-bit binary code of the high pixel region value category corresponds to a first level of the ternary levels, the 8-bit binary code of the middle pixel region value category corresponds to a second level of the ternary levels, and the 8-bit binary code of the low pixel region value category corresponds to a third level of the ternary levels.
7. The method according to claim 5, wherein:
for a given pixel region, the first pixel region value range is the range in which the difference between the pixel region value and the pixel region value of the given pixel region is greater than a first threshold, the second pixel region value range is the range in which the difference between the pixel region value and the pixel region value of the given pixel region is not greater than the first threshold and not less than a second threshold, and the third pixel region value range is the range in which the difference between the pixel region value and the pixel region value of the given pixel region is less than the second threshold.
8. The method according to any one of claims 3 to 7, wherein
each of the plurality of pixel regions consists of one pixel, and the pixel region value is a pixel value.
9. The method according to any one of claims 3 to 7, wherein
each of the plurality of pixel regions consists of a plurality of pixels, and the pixel region value is a combined value of the pixel values of the plurality of pixels in the pixel region.
10. The method according to claim 9, wherein
the combined value is one of an arithmetic mean, a geometric mean, a weighted mean and a median.
11. The method according to claim 1, wherein
the image description vector generating step comprises:
a first sub-step of counting, for each category among the M categories respectively, the number of notable N-bit binary codes that match each pattern among a plurality of specific code patterns, so as to form an image description vector part for the respective category; and
a second sub-step of concatenating the image description vector parts of the M categories to form the image description vector.
12. The method according to claim 2, wherein:
the plurality of specific code patterns comprise a first specific code pattern set and a second specific code pattern set, the first specific code pattern set consists of a first pattern subset and a second pattern subset, and the second specific code pattern set consists of a third pattern subset,
wherein, in each 8-bit binary code in the first pattern subset, a transition from a high level to a low level occurs at most once and a transition from a low level to a high level occurs at most once,
each 8-bit binary code in the second pattern subset is a doubly symmetric 8-bit binary code, wherein the pattern produced by arranging the doubly symmetric 8-bit binary code at 8 positions equally spaced on a circle is axially symmetric about two axes in two orthogonal directions,
each 8-bit binary code in the third pattern subset is a symmetric 8-bit binary code, wherein the pattern produced by arranging the symmetric 8-bit binary code at 8 positions equally spaced on a circle is axially symmetric about an axis in at least one direction, and
among the three categories, the first category and the third category both correspond to the first specific code pattern set, and the second category corresponds to the second specific code pattern set.
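The three pattern subsets of claim 12 can be tested mechanically. The sketch below treats an 8-bit code as 8 bits placed on a circle and checks (a) at most one high-to-low and one low-to-high transition, and (b) axial symmetry under reflections of the 8-position circle; modeling the "two orthogonal axes" as reflections whose axes differ by 4 positions (90 degrees) is an assumption here:

```python
def bits(code):
    # Most significant bit first, as 8 positions around a circle.
    return [(code >> (7 - i)) & 1 for i in range(8)]

def is_uniform(code):
    """First subset: at most one 1->0 and one 0->1 transition on the circle,
    i.e. at most two bit changes in a circular scan."""
    b = bits(code)
    return sum(b[i] != b[(i + 1) % 8] for i in range(8)) <= 2

def symmetric_about(code, k):
    # Symmetric under the reflection mapping position i to (k - i) mod 8;
    # the reflection axis lies at an angle of k * 22.5 degrees.
    b = bits(code)
    return all(b[i] == b[(k - i) % 8] for i in range(8))

def is_symmetric(code):
    """Third subset: symmetric about at least one of the 8 possible axes."""
    return any(symmetric_about(code, k) for k in range(8))

def is_doubly_symmetric(code):
    """Second subset: symmetric about two orthogonal axes (reflections
    whose indices differ by 4, i.e. axes 90 degrees apart)."""
    return any(symmetric_about(code, k) and symmetric_about(code, k + 4)
               for k in range(4))
```

For example, 00011100 is uniform (one run of ones), 10001000 is doubly symmetric (two opposite positions set), and 10000000 is symmetric about one axis but not doubly symmetric.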
13. The method according to claim 11, further comprising:
a normalization step of dividing the image description vector by the number of pixel regions among the plurality of pixel regions, to form a normalized image description vector.
14. The method according to claim 1, wherein
the plurality of pixel regions of the image do not include the pixel regions located at the image boundary.
15. The method according to claim 1, wherein
the pixel regions in the image are arranged in a matrix, and the adjacent pixel regions of a pixel region comprise the eight pixel regions that, together with that pixel region, form a 3 × 3 pixel region matrix centered on that pixel region.
16. A method for generating a multi-resolution image description vector, comprising:
a first image description vector generating step of generating a first image description vector by performing the method of claim 1 on an input image;
a scaling step of scaling the input image to generate a scaled image;
a second image description vector generating step of generating a second image description vector by performing the method of claim 1 on the scaled image; and
a concatenation step of concatenating the first image description vector and the second image description vector to generate the multi-resolution image description vector.
17. An image detection method, comprising:
an input step of inputting a region image of an object image;
an image description vector generating step of performing the method of claim 1 on the region image to generate an image description vector as a region image description vector;
a calculation step of calculating the distance between the region image description vector and a target image description vector registered in advance in a database, the target image description vector being formed by performing the method of claim 1 on a target image in advance; and
a determination step of, if the distance is less than a specific threshold, determining that the region image corresponding to the target image is detected, and otherwise, if the distance is not less than the specific threshold, adjusting the position and/or size of the region image to obtain the region image to be processed next.
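Claim 17 amounts to a sliding-window scan with an accept threshold on the descriptor distance. A minimal skeleton, in which the `describe` and `distance` callables, the fixed window size and the scan step are illustrative stand-ins rather than the patent's implementation:

```python
def detect(object_w, object_h, window, step, describe, distance,
           target_vec, threshold):
    """Scan windows over an object image of size object_w x object_h and
    return the regions whose description vectors are close to the target.
    describe(x, y, window) yields the region's description vector."""
    hits = []
    for y in range(0, object_h - window + 1, step):
        for x in range(0, object_w - window + 1, step):
            d = distance(describe(x, y, window), target_vec)
            if d < threshold:
                hits.append((x, y, window, d))
            # Otherwise the position is adjusted by the loop itself; a full
            # implementation would also vary the window size.
    return hits
```

A run over a toy 4 x 4 image with a dummy descriptor shows the mechanics: only the windows whose distance to the registered target vector falls below the threshold are reported.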
18. An apparatus for generating an image description vector, comprising:
a coding unit configured to code each pixel region among a plurality of pixel regions of an image into M N-bit binary codes, wherein the M N-bit binary codes correspond respectively to M categories, and each of the N-bit binary codes represents the adjacent pixel regions adjacent to the corresponding pixel region;
a feature extraction unit configured to, for the m-th N-bit binary code among the M N-bit binary codes corresponding to each pixel region among the plurality of pixel regions: if the m-th N-bit binary code matches a specific code pattern of the m-th category among the M categories, extract the m-th N-bit binary code as a notable N-bit binary code of the m-th category; and
an image description vector generation unit configured to count, for each category among the M categories, the number of notable N-bit binary codes, so as to form the image description vector,
wherein M is an integer of 3 or more, N is an integer of 3 or more, and 1 ≤ m ≤ M.
19. The apparatus according to claim 18, wherein
N is 8 and M is 3, the M N-bit binary codes are three 8-bit binary codes, and the M categories are three categories.
20. The apparatus according to claim 18, wherein
the m-th N-bit binary code among the M N-bit binary codes represents the adjacent pixel regions that are adjacent to the corresponding pixel region and whose pixel region values are within the m-th pixel region value range among M pixel region value ranges.
21. The apparatus according to claim 19, wherein the coding unit comprises:
a ternary coding sub-unit configured to code each pixel region among the plurality of pixel regions into an 8-digit ternary code; and
a code conversion unit configured to convert each 8-digit ternary code into the three 8-bit binary codes, wherein each of the three 8-bit binary codes corresponds to one level of the ternary levels.
22. The apparatus according to claim 19, wherein
the three categories comprise a high pixel region value category, a middle pixel region value category and a low pixel region value category, the pixel region values corresponding to the 8-bit binary code of the high pixel region value category are within a first pixel region value range, the pixel region values corresponding to the 8-bit binary code of the middle pixel region value category are within a second pixel region value range, and the pixel region values corresponding to the 8-bit binary code of the low pixel region value category are within a third pixel region value range.
23. The apparatus according to claim 21, wherein
the three categories comprise a high pixel region value category, a middle pixel region value category and a low pixel region value category, the 8-bit binary code of the high pixel region value category corresponds to a first level of the ternary levels, the 8-bit binary code of the middle pixel region value category corresponds to a second level of the ternary levels, and the 8-bit binary code of the low pixel region value category corresponds to a third level of the ternary levels.
24. The apparatus according to claim 22, wherein
for a given pixel region, the first pixel region value range is the range in which the difference between the pixel region value and the pixel region value of the given pixel region is greater than a first threshold, the second pixel region value range is the range in which the difference between the pixel region value and the pixel region value of the given pixel region is not greater than the first threshold and not less than a second threshold, and the third pixel region value range is the range in which the difference between the pixel region value and the pixel region value of the given pixel region is less than the second threshold.
25. The apparatus according to any one of claims 20 to 24, wherein
each of the plurality of pixel regions consists of one pixel, and the pixel region value is a pixel value.
26. The apparatus according to any one of claims 20 to 24, wherein
each of the plurality of pixel regions consists of a plurality of pixels, and the pixel region value is a combined value of the pixel values of the plurality of pixels in the pixel region.
27. The apparatus according to claim 26, wherein
the combined value is one of an arithmetic mean, a geometric mean, a weighted mean and a median.
28. The apparatus according to claim 18, wherein
the image description vector generation unit comprises:
a first sub-unit configured to count, for each category among the M categories respectively, the number of notable N-bit binary codes that match each pattern among a plurality of specific code patterns, so as to form an image description vector part for the respective category; and
a second sub-unit configured to concatenate the image description vector parts of the M categories to form the image description vector.
29. The apparatus according to claim 19, wherein
the plurality of specific code patterns comprise a first specific code pattern set and a second specific code pattern set, the first specific code pattern set consists of a first pattern subset and a second pattern subset, and the second specific code pattern set consists of a third pattern subset,
wherein, in each 8-bit binary code in the first pattern subset, a transition from a high level to a low level occurs at most once and a transition from a low level to a high level occurs at most once,
each 8-bit binary code in the second pattern subset is a doubly symmetric 8-bit binary code, wherein the pattern produced by arranging the doubly symmetric 8-bit binary code at 8 positions equally spaced on a circle is axially symmetric about two axes in two orthogonal directions,
each 8-bit binary code in the third pattern subset is a symmetric 8-bit binary code, wherein the pattern produced by arranging the symmetric 8-bit binary code at 8 positions equally spaced on a circle is axially symmetric about an axis in at least one direction, and
among the three categories, the first category and the third category both correspond to the first specific code pattern set, and the second category corresponds to the second specific code pattern set.
30. The apparatus according to claim 28, further comprising:
a normalization unit configured to divide the image description vector by the number of pixel regions among the plurality of pixel regions, to form a normalized image description vector.
31. The apparatus according to claim 18, wherein
the plurality of pixel regions of the image do not include the pixel regions located at the image boundary.
32. The apparatus according to claim 19, wherein
the pixel regions in the image are arranged in a matrix, and the adjacent pixel regions of a pixel region comprise the eight pixel regions that, together with that pixel region, form a 3 × 3 pixel region matrix centered on that pixel region.
33. An apparatus for generating a multi-resolution image description vector, comprising:
a first image description vector generation unit configured to generate a first image description vector by performing the method of claim 1 on an input image;
a scaling unit configured to scale the input image to generate a scaled image;
a second image description vector generation unit configured to generate a second image description vector by performing the method of claim 1 on the scaled image; and
a concatenation unit configured to concatenate the first image description vector and the second image description vector to generate the multi-resolution image description vector.
34. An image detection apparatus, comprising:
an input unit configured to input a region image of an object image;
an image description vector generation unit configured to perform the method of claim 1 on the region image to generate an image description vector as a region image description vector;
a calculation unit configured to calculate the distance between the region image description vector and a target image description vector registered in advance in a database, the target image description vector being formed by performing the method of claim 1 on a target image in advance; and
a determination unit configured to, if the distance is less than a specific threshold, determine that the region image corresponding to the target image is detected, and otherwise, if the distance is not less than the specific threshold, adjust the position and/or size of the region image to obtain the region image to be processed next.
CN201210044156.0A 2012-02-24 2012-02-24 Method and apparatus for generating image description vector, image detection method and apparatus Active CN103294685B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201210044156.0A CN103294685B (en) Method and apparatus for generating image description vector, image detection method and apparatus
US13/774,281 US9275300B2 (en) 2012-02-24 2013-02-22 Method and apparatus for generating image description vector, image detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210044156.0A CN103294685B (en) Method and apparatus for generating image description vector, image detection method and apparatus

Publications (2)

Publication Number Publication Date
CN103294685A true CN103294685A (en) 2013-09-11
CN103294685B CN103294685B (en) 2016-10-05

Family

ID=49095577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210044156.0A Active CN103294685B (en) Method and apparatus for generating image description vector, image detection method and apparatus

Country Status (1)

Country Link
CN (1) CN103294685B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551823A (en) * 2009-04-20 2009-10-07 浙江师范大学 Comprehensive multi-feature image retrieval method
CN101582077A (en) * 2009-06-24 2009-11-18 上海可鲁系统软件有限公司 Spatial index method of two-dimension vector graphics and device thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551823A (en) * 2009-04-20 2009-10-07 浙江师范大学 Comprehensive multi-feature image retrieval method
CN101582077A (en) * 2009-06-24 2009-11-18 上海可鲁系统软件有限公司 Spatial index method of two-dimension vector graphics and device thereof

Also Published As

Publication number Publication date
CN103294685B (en) 2016-10-05

Similar Documents

Publication Publication Date Title
US9275300B2 (en) Method and apparatus for generating image description vector, image detection method and apparatus
Gao et al. Automatic change detection in synthetic aperture radar images based on PCANet
Zhang et al. Learning-based license plate detection using global and local features
CN102103698B (en) Image processing apparatus and image processing method
US7558426B2 (en) Device for outputting character recognition results, character recognition device, and program therefor
CN102414720B (en) Characteristic quantity calculation element, characteristic quantity calculating method
Yang et al. Traffic sign recognition in disturbing environments
Liu et al. Hybrid cascade structure for license plate detection in large visual surveillance scenes
CN107038416B (en) Pedestrian detection method based on binary image improved HOG characteristics
CN101609509B (en) Image and object detection method and system based on pre-classifier
CN109740572A (en) A kind of human face in-vivo detection method based on partial color textural characteristics
CN111310850A (en) License plate detection model construction method and system and license plate detection method and system
CN107590500A (en) A kind of color recognizing for vehicle id method and device based on color projection classification
CN104036244A (en) Checkerboard pattern corner point detecting method and device applicable to low-quality images
CN113963041A (en) Image texture recognition method and system
CN108280469A (en) A kind of supermarket's commodity image recognition methods based on rarefaction representation
US20080310685A1 (en) Methods and Systems for Refining Text Segmentation Results
Laroca et al. A first look at dataset bias in license plate recognition
CN108073940A (en) A kind of method of 3D object instance object detections in unstructured moving grids
Griffin et al. Basic image features (bifs) arising from approximate symmetry type
CN109446882A (en) Logo feature extraction and recognition methods based on the characteristic quantification that gradient direction divides
CN116912184B (en) Weak supervision depth restoration image tampering positioning method and system based on tampering area separation and area constraint loss
CN103136536A (en) System and method for detecting target and method for exacting image features
CN103294685A (en) Method and equipment used for generating image description vector as well as image detection method and equipment
CN110348464A (en) A kind of image forge detection algorithm based on more support area local luminance sequences

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant