CN105740771A - Bulldozing device with target identification function - Google Patents


Info

Publication number
CN105740771A
CN105740771A (application CN201610046348.3A)
Authority
CN
China
Prior art keywords
image
target
submodule
feature
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610046348.3A
Other languages
Chinese (zh)
Inventor
张健敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610046348.3A
Publication of CN105740771A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a bulldozing device with a target identification function. The device comprises a bulldozer and a surveillance device installed on it; the surveillance device comprises a preprocessing module, a detection-and-tracking module, and an identification-and-output module. The preprocessing module comprises an image conversion submodule, an image filtering submodule, and an image enhancement submodule; the detection-and-tracking module comprises a construction submodule, a loss discrimination submodule, and an update submodule. By applying video image technology to the bulldozer, malicious sabotage can be effectively monitored and recorded, and the device offers good real-time performance, accurate localization, strong adaptability, full preservation of image detail, and high robustness.

Description

Bulldozing device with target recognition function
Technical field
The present invention relates to the field of bulldozing devices, and in particular to a bulldozing device with a target recognition function.
Background art
A bulldozer is an engineering vehicle fitted with a large metal blade at the front. In use, the blade is lowered and the machine cuts and pushes soil, sand, and stone forward; the blade's position and angle are adjustable. A bulldozer can single-handedly excavate, transport, and dump earth, and offers flexible operation, easy maneuvering, a small required working face, and high travel speed. It is mainly suited to shallow-cut, short-haul work in class I to III soils, such as site clearing or leveling, digging and backfilling shallow foundation pits, and building low roadbeds.
As an important and expensive piece of equipment, the security of a bulldozer is particularly important: malicious sabotage must be preventable and monitorable.
Summary of the invention
In view of the problems above, the present invention provides a bulldozing device with a target recognition function.
The object of the present invention is achieved by the following technical solution:
A bulldozing device with a target recognition function comprises a bulldozer and a monitoring device mounted on the bulldozer. The monitoring device performs video surveillance of activity near the bulldozer and comprises a preprocessing module, a detection-and-tracking module, and an identification-and-output module.
(1) The preprocessing module preprocesses each received image and comprises an image conversion submodule, an image filtering submodule, and an image enhancement submodule:
The image conversion submodule converts a color image into a grayscale image:
$$H(x,y) = \frac{\max(R(x,y),G(x,y),B(x,y)) + \min(R(x,y),G(x,y),B(x,y))}{2} + 2\,\bigl(\max(R(x,y),G(x,y),B(x,y)) - \min(R(x,y),G(x,y),B(x,y))\bigr)$$
where R(x,y), G(x,y), and B(x,y) are the red, green, and blue intensity values of the pixel at (x, y), and H(x,y) is the gray value of the pixel at coordinate (x, y); the image size is m × n.
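For illustration, a minimal NumPy sketch of this conversion follows; the function name, the clipping to [0, 255], and the reading of the garbled original formula as (max+min)/2 + 2(max−min) are assumptions, not part of the patent text:

```python
import numpy as np

def to_gray(rgb):
    """Color-to-gray conversion per the reconstructed patent formula.

    rgb: (m, n, 3) array with channel intensities in [0, 255].
    Returns H(x, y) = (max+min)/2 + 2*(max-min), clipped to [0, 255]
    (the exact reading of the original formula is an assumption).
    """
    rgb = rgb.astype(np.float64)
    cmax = rgb.max(axis=2)   # max(R, G, B) at each pixel
    cmin = rgb.min(axis=2)   # min(R, G, B) at each pixel
    H = (cmax + cmin) / 2.0 + 2.0 * (cmax - cmin)
    return np.clip(H, 0.0, 255.0)
```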
The image filtering submodule filters the grayscale image:
Wiener filtering is first applied as a first-stage denoising step; an SVLM image, denoted $M_{svlm}(x,y)$, is then defined as $M_{svlm}(x,y) = a_1 J_1(x,y) + a_2 J_2(x,y) + a_3 J_3(x,y) + a_4 J_4(x,y)$, where $a_1$, $a_2$, $a_3$, $a_4$ are variable weights and the $J_i(x,y)$ are the filtered images.
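The patent does not specify how the four filtered images J1..J4 differ; one plausible reading, sketched below, is Wiener filtering at four window sizes with equal weights (both choices are assumptions):

```python
import numpy as np
from scipy.signal import wiener

def svlm_image(H, weights=(0.25, 0.25, 0.25, 0.25), sizes=(3, 5, 7, 9)):
    """M_svlm(x, y) = a1*J1 + a2*J2 + a3*J3 + a4*J4.

    Each J_i is a Wiener-filtered copy of the gray image H; the window
    sizes and the equal weights a_i are illustrative assumptions.
    """
    J = [wiener(H, mysize=s) for s in sizes]  # first-stage Wiener filtering
    return sum(a * j for a, j in zip(weights, J))
```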
The image enhancement submodule:
When $|128 - m| > \frac{|\omega - 50|}{3}$: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}$, where L(x,y) is the enhanced gray value, ψ(x,y) is a gamma-correction coefficient incorporating local information, and α is a variable parameter ranging from 0 to 1.
When $|128 - m| \le \frac{|\omega - 50|}{3}$ and ω > 50: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)} \times \left(1 - \frac{\omega - 50}{\omega^2}\right)$, where $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ and $\alpha = 1 - \left|\frac{128 - \min(m_L, m_H)}{128}\right|$; $m_H$ is the mean gray value of the pixels in the image above 128, $m_L$ is the mean gray value of the pixels below 128, and here $m = \min(m_H, m_L)$. Once α is known, 256 ψ correction coefficients are precomputed as a look-up table indexed by i; using the gray value of $M_{svlm}(x,y)$ as the index, the gamma-correction coefficient $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ of every pixel is obtained quickly. ω is the template correction factor.
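The look-up-table mechanics described above can be sketched as follows. Only the LUT indexing by the M_svlm gray value and the two branches follow the text; the concrete form of ψ_α is not legible in the source, so `psi_alpha` below is a placeholder assumption:

```python
import numpy as np

def enhance(H, M_svlm, m, omega, alpha):
    """Adaptive gamma enhancement with a 256-entry correction-coefficient LUT."""
    # Placeholder assumption for psi_alpha: a smooth map from gray level to a
    # gamma exponent; the patent only states that 256 values are precomputed.
    i = np.arange(256)
    lut = alpha ** ((128.0 - i) / 128.0)                 # psi_alpha[i], i = 0..255
    psi = lut[np.clip(M_svlm, 0, 255).astype(np.uint8)]  # psi(x, y) via table lookup
    L = 255.0 * (H / 255.0) ** psi
    if abs(128 - m) <= abs(omega - 50) / 3.0 and omega > 50:
        L = L * (1.0 - (omega - 50.0) / omega ** 2)      # second-branch factor
    return np.clip(L, 0.0, 255.0)
```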
(2) The detection-and-tracking module comprises a construction submodule, a loss discrimination submodule, and an update submodule:
The construction submodule builds the visual dictionary:
The position and scale of the tracked target are obtained in the initial frame, positive and negative samples are selected around it to train the tracker, and the tracking results are taken as the training set $X = \{x_1, x_2, \ldots, x_N\}^T$. From every target image in the training set, 128-dimensional SIFT features are extracted, $S_t$ denoting the number of SIFT features in the t-th target image. After N frames have been tracked, a clustering algorithm partitions these features into K clusters, the center of each cluster forming a feature word; the total number of extractable features is $F_N$, with $K \ll F_N$. Once the visual dictionary is built, each training image is represented as a bag of features recording the occurrence frequency of the dictionary's feature words, expressed as a histogram $h(x_t)$, obtained as follows: every feature $f_s^{(t)}$ of a training image $X_t$ is projected onto the visual dictionary and represented by the feature word with the shortest projection distance; after all features have been projected, the occurrence frequency of each feature word is counted and normalized, yielding the feature histogram $h(x_t)$ of training image $X_t$.
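A compact sketch of this construction, using OpenCV's SIFT and scikit-learn's k-means (the tracker training and positive/negative sampling are omitted; K = 50 and all names are illustrative):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(target_crops, K=50):
    """Cluster the 128-D SIFT descriptors of N tracked target crops into K words."""
    sift = cv2.SIFT_create()
    descs = []
    for crop in target_crops:                    # one gray crop per tracked frame
        _, d = sift.detectAndCompute(crop, None)
        if d is not None:
            descs.append(d)
    all_descs = np.vstack(descs)                 # F_N features in total, K << F_N
    return KMeans(n_clusters=K, n_init=10).fit(all_descs)

def feature_histogram(crop, km):
    """Bag of features: nearest-word assignment, then a normalized histogram."""
    _, d = cv2.SIFT_create().detectAndCompute(crop, None)
    words = km.predict(np.float64(d))            # shortest projection distance
    h = np.bincount(words, minlength=km.n_clusters).astype(np.float64)
    return h / h.sum()
```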
The loss discrimination submodule determines whether the target has been lost:
When a new frame arrives, Z < K histograms are randomly selected from the K histograms, with Z = 4, to form new sub-histograms $h^{(z)}(x_t)$ of size Z; there are at most $N_s$ such sub-histograms. The sub-histogram similarity $\Phi_{t,z}$ between the candidate target region and each target region in the training set is computed for t = 1, 2, ..., N and z = 1, 2, ..., $N_s$, and the overall similarity is $\Phi_t = 1 - \prod_z (1 - \Phi_{t,z})$. The similarity between the candidate region and the target is $\Phi = \max_t \Phi_t$, and the track-loss decision rule is
$$u = \operatorname{sign}(\Phi) = \begin{cases} 1, & \Phi \ge g_s \\ 0, & \Phi < g_s \end{cases}$$
where $g_s$ is a manually set misjudgment threshold. When u = 1 the target is being stably tracked; when u = 0 the track is lost.
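A sketch of the loss test under stated assumptions: the patent does not define the sub-histogram similarity function or the number of sub-histograms drawn, so histogram intersection, `n_sub = 32`, and `g_s = 0.5` below are illustrative choices:

```python
import numpy as np

def track_lost(h_cand, train_hists, K, Z=4, n_sub=32, g_s=0.5, seed=0):
    """Return u = 1 if the candidate is stably tracked, 0 if the track is lost."""
    rng = np.random.default_rng(seed)
    subsets = [rng.choice(K, size=Z, replace=False) for _ in range(n_sub)]
    phi = 0.0
    for h_train in train_hists:                  # one histogram per training image
        prod = 1.0
        for idx in subsets:
            phi_tz = np.minimum(h_cand[idx], h_train[idx]).sum()  # intersection
            prod *= 1.0 - phi_tz
        phi = max(phi, 1.0 - prod)               # Phi_t = 1 - prod_z(1 - Phi_t_z)
    return 1 if phi >= g_s else 0                # u = sign(Phi) against g_s
```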
When the track is lost, an affine transformation model is defined:
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} s\cos(\mu_1\theta) & s\sin(\mu_1\theta) \\ -s\sin(\mu_1\theta) & s\cos(\mu_1\theta) \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \mu_2 \begin{bmatrix} e \\ f \end{bmatrix}$$
where $(x_t, y_t)$ and $(x_{t-1}, y_{t-1})$ are the known position coordinates of a SIFT feature point in the current-frame target and of its matching feature point in the previous-frame target; s is the scale coefficient, θ the rotation coefficient, and e and f the translation coefficients. The temperature rotation correction coefficient $\mu_1$ and the temperature translation correction coefficient $\mu_2$ share the form
$$\mu_1 = \mu_2 = \begin{cases} 1 - \frac{|T - T_0|}{1000\,T_0}, & T \ge T_0 \\ 1 + \frac{|T - T_0|}{1000\,T_0}, & T < T_0 \end{cases}$$
and correct the image rotation and translation errors caused by ambient-temperature deviation; $T_0$ is a manually set reference temperature, set to 20 degrees, and T is the temperature measured in real time by a temperature sensor. The RANSAC estimation algorithm is used to solve for the parameters of the affine transformation model; finally, positive and negative samples are collected under the new scale s and rotation coefficient θ, and the classifier is updated.
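A sketch of the recovery step with OpenCV's RANSAC-based similarity estimator; how the temperature factors μ1 and μ2 are undone when reading the parameters back out of the fitted matrix is my own reading of the model:

```python
import cv2
import numpy as np

def recover_affine(prev_pts, cur_pts, T, T0=20.0):
    """RANSAC fit of the scale/rotation/translation model, temperature-corrected.

    prev_pts, cur_pts: matched SIFT point coordinates, shape (n, 2).
    T0 is the reference temperature (20 degrees per the patent).
    """
    M, inliers = cv2.estimateAffinePartial2D(
        np.float32(prev_pts), np.float32(cur_pts), method=cv2.RANSAC)
    s = float(np.hypot(M[0, 0], M[0, 1]))        # scale coefficient
    mu = 1.0 - abs(T - T0) / (1000.0 * T0) if T >= T0 \
        else 1.0 + abs(T - T0) / (1000.0 * T0)   # mu1 = mu2 per the patent
    theta = float(np.arctan2(M[0, 1], M[0, 0])) / mu  # model rotation is mu1*theta
    e, f = float(M[0, 2]) / mu, float(M[1, 2]) / mu   # model translation is mu2*(e, f)
    return s, theta, e, f, inliers
```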
The update submodule updates the visual dictionary:
After the target position is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 3 frames a new feature point set is obtained, $S_{t-F}$ denoting the total number of features gathered from the F frames. The old and new feature points are then re-clustered into K clusters, giving the new visual dictionary; the size of the visual dictionary remains unchanged. φ is a forgetting factor expressing the weight of the old dictionary: the smaller φ is, the more the new features contribute to the track-loss decision.
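The exact re-clustering formula is not legible in the source text. One plausible reading, sketched below, weights the old dictionary words by the forgetting factor φ relative to the new features when re-running k-means (the `sample_weight` scheme is an assumption; φ = 0.18 is the value given in Embodiment 4):

```python
import numpy as np
from sklearn.cluster import KMeans

def update_dictionary(old_words, new_feats, phi=0.18):
    """Re-cluster old words plus new SIFT features into K clusters; K unchanged.

    phi is the forgetting factor: the smaller it is, the less the old
    dictionary weighs against the newly collected features.
    """
    K = len(old_words)
    X = np.vstack([old_words, new_feats])
    w = np.concatenate([
        np.full(K, phi * len(new_feats) / K),    # old dictionary's share ~ phi
        np.ones(len(new_feats)),                 # each new feature weighs 1
    ])
    km = KMeans(n_clusters=K, n_init=10).fit(X, sample_weight=w)
    return km.cluster_centers_                   # the updated visual dictionary
```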
(3) The identification-and-output module identifies the image and outputs the result: a tracking algorithm obtains the target region in the image sequence to be identified; the target region is mapped into the subspace formed by the known training data; the distances between the target region and the training data are computed in that subspace to obtain a similarity measure, from which the target class is determined and the recognition result is output.
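The patent does not name the subspace method; the sketch below uses PCA as a stand-in and classifies by the nearest training sample in the projected space:

```python
import numpy as np
from sklearn.decomposition import PCA

def recognize(target_vec, train_vecs, train_labels, n_components=16):
    """Project into the training-data subspace; classify by smallest distance."""
    pca = PCA(n_components=n_components).fit(train_vecs)
    z_train = pca.transform(train_vecs)
    z = pca.transform(np.asarray(target_vec).reshape(1, -1))
    d = np.linalg.norm(z_train - z, axis=1)   # distances in the subspace
    best = int(np.argmin(d))                  # similarity measure: nearest sample
    return train_labels[best], float(d[best])
```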
Preferably, because the image still contains residual noise after the first-stage Wiener filtering, the following second-stage filter is applied:
$$J(x,y) = \sum_{i=-m/2}^{m/2} \sum_{j=-n/2}^{n/2} H(x,y)\, P_g(x+i, y+j)$$
where J(x,y) is the filtered image and $P_g(x+i, y+j)$ is a function of scale m × n with $P_g(x+i, y+j) = q \times \exp(-(x^2 + y^2)/\omega)$, q being a normalization coefficient such that $\iint q \times \exp(-(x^2 + y^2)/\omega)\, dx\, dy = 1$.
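Read as an ordinary convolution of H with the normalized window P_g (the double sum as printed would let H(x, y) factor out, so the convolution reading is an assumption), the second stage can be sketched as:

```python
import numpy as np
from scipy.ndimage import convolve

def second_stage_filter(H, size=5, omega=2.0):
    """Second-stage smoothing with the kernel q * exp(-(x^2 + y^2) / omega)."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    Pg = np.exp(-(x ** 2 + y ** 2) / omega)
    Pg /= Pg.sum()          # discrete analogue of the normalization by q
    return convolve(H, Pg, mode='nearest')
```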
The bulldozer has the following beneficial effects. In the image preprocessing stage, the enhanced image adapts to the template size, improving the enhancement effect, and the decision condition is corrected automatically for different template sizes; visual habits and the nonlinear relation between the human eye's sensitivity to different colors and color intensity are taken into account. The m × n power-exponent computations are reduced to 256, improving computational efficiency. In the target detection and tracking stage, image rotation and translation errors caused by temperature differences are eliminated, improving the recognition rate; the processed image details are clearer; the computational load is greatly reduced compared with traditional methods; the device adapts effectively to changes in target scale, accurately determines whether the target has been lost, and can re-detect and stably track the target once it returns to the field of view. In addition, the bulldozer offers good real-time performance, accurate localization, and strong robustness, and performs well in detecting and tracking fast-moving targets with occlusion.
Brief description of the drawings
The accompanying drawings further illustrate the invention, but the embodiments shown in them do not limit the invention in any way; a person of ordinary skill in the art can derive other drawings from the following drawings without inventive effort.
Fig. 1 is a structural block diagram of the bulldozing device with a target recognition function;
Fig. 2 is an external schematic view of the bulldozing device with a target recognition function.
Detailed description of the invention
The invention is further described by the following embodiments.
Embodiment 1: As shown in Figs. 1-2, a bulldozing device with a target recognition function comprises a bulldozer 5 and a monitoring device 4 mounted on the bulldozer 5. The monitoring device 4 performs video surveillance of activity near the bulldozer 5 and comprises a preprocessing module 1, a detection-and-tracking module 2, and an identification-and-output module 3.
(1) The preprocessing module 1 preprocesses each received image and comprises an image conversion submodule 11, an image filtering submodule 12, and an image enhancement submodule 13:
The image conversion submodule 11 converts a color image into a grayscale image:
$$H(x,y) = \frac{\max(R(x,y),G(x,y),B(x,y)) + \min(R(x,y),G(x,y),B(x,y))}{2} + 2\,\bigl(\max(R(x,y),G(x,y),B(x,y)) - \min(R(x,y),G(x,y),B(x,y))\bigr)$$
where R(x,y), G(x,y), and B(x,y) are the red, green, and blue intensity values of the pixel at (x, y), and H(x,y) is the gray value of the pixel at coordinate (x, y); the image size is m × n.
The image filtering submodule 12 filters the grayscale image:
Wiener filtering is first applied as a first-stage denoising step; an SVLM image, denoted $M_{svlm}(x,y)$, is then defined as $M_{svlm}(x,y) = a_1 J_1(x,y) + a_2 J_2(x,y) + a_3 J_3(x,y) + a_4 J_4(x,y)$, where $a_1$, $a_2$, $a_3$, $a_4$ are variable weights and the $J_i(x,y)$ are the filtered images.
The image enhancement submodule 13:
When $|128 - m| > \frac{|\omega - 50|}{3}$: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}$, where L(x,y) is the enhanced gray value, ψ(x,y) is a gamma-correction coefficient incorporating local information, and α is a variable parameter ranging from 0 to 1.
When $|128 - m| \le \frac{|\omega - 50|}{3}$ and ω > 50: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)} \times \left(1 - \frac{\omega - 50}{\omega^2}\right)$, where $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ and $\alpha = 1 - \left|\frac{128 - \min(m_L, m_H)}{128}\right|$; $m_H$ is the mean gray value of the pixels in the image above 128, $m_L$ is the mean gray value of the pixels below 128, and here $m = \min(m_H, m_L)$. Once α is known, 256 ψ correction coefficients are precomputed as a look-up table indexed by i; using the gray value of $M_{svlm}(x,y)$ as the index, the gamma-correction coefficient $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ of every pixel is obtained quickly. ω is the template correction factor.
(2) The detection-and-tracking module 2 comprises a construction submodule 21, a loss discrimination submodule 22, and an update submodule 23:
The construction submodule 21 builds the visual dictionary:
The position and scale of the tracked target are obtained in the initial frame, positive and negative samples are selected around it to train the tracker, and the tracking results are taken as the training set $X = \{x_1, x_2, \ldots, x_N\}^T$. From every target image in the training set, 128-dimensional SIFT features are extracted, $S_t$ denoting the number of SIFT features in the t-th target image. After N frames have been tracked, a clustering algorithm partitions these features into K clusters, the center of each cluster forming a feature word; the total number of extractable features is $F_N$, with $K \ll F_N$. Once the visual dictionary is built, each training image is represented as a bag of features recording the occurrence frequency of the dictionary's feature words, expressed as a histogram $h(x_t)$, obtained as follows: every feature $f_s^{(t)}$ of a training image $X_t$ is projected onto the visual dictionary and represented by the feature word with the shortest projection distance; after all features have been projected, the occurrence frequency of each feature word is counted and normalized, yielding the feature histogram $h(x_t)$ of training image $X_t$.
The loss discrimination submodule 22 determines whether the target has been lost:
When a new frame arrives, Z < K histograms are randomly selected from the K histograms, with Z = 4, to form new sub-histograms $h^{(z)}(x_t)$ of size Z; there are at most $N_s$ such sub-histograms. The sub-histogram similarity $\Phi_{t,z}$ between the candidate target region and each target region in the training set is computed for t = 1, 2, ..., N and z = 1, 2, ..., $N_s$, and the overall similarity is $\Phi_t = 1 - \prod_z (1 - \Phi_{t,z})$. The similarity between the candidate region and the target is $\Phi = \max_t \Phi_t$, and the track-loss decision rule is
$$u = \operatorname{sign}(\Phi) = \begin{cases} 1, & \Phi \ge g_s \\ 0, & \Phi < g_s \end{cases}$$
where $g_s$ is a manually set misjudgment threshold. When u = 1 the target is being stably tracked; when u = 0 the track is lost.
When the track is lost, an affine transformation model is defined:
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} s\cos(\mu_1\theta) & s\sin(\mu_1\theta) \\ -s\sin(\mu_1\theta) & s\cos(\mu_1\theta) \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \mu_2 \begin{bmatrix} e \\ f \end{bmatrix}$$
where $(x_t, y_t)$ and $(x_{t-1}, y_{t-1})$ are the known position coordinates of a SIFT feature point in the current-frame target and of its matching feature point in the previous-frame target; s is the scale coefficient, θ the rotation coefficient, and e and f the translation coefficients. The temperature rotation correction coefficient $\mu_1$ and the temperature translation correction coefficient $\mu_2$ share the form
$$\mu_1 = \mu_2 = \begin{cases} 1 - \frac{|T - T_0|}{1000\,T_0}, & T \ge T_0 \\ 1 + \frac{|T - T_0|}{1000\,T_0}, & T < T_0 \end{cases}$$
and correct the image rotation and translation errors caused by ambient-temperature deviation; $T_0$ is a manually set reference temperature, set to 20 degrees, and T is the temperature measured in real time by a temperature sensor. The RANSAC estimation algorithm is used to solve for the parameters of the affine transformation model; finally, positive and negative samples are collected under the new scale s and rotation coefficient θ, and the classifier is updated.
The update submodule 23 updates the visual dictionary:
After the target position is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 3 frames a new feature point set is obtained, $S_{t-F}$ denoting the total number of features gathered from the F frames. The old and new feature points are then re-clustered into K clusters, giving the new visual dictionary; the size of the visual dictionary remains unchanged. φ is a forgetting factor expressing the weight of the old dictionary: the smaller φ is, the more the new features contribute to the track-loss decision.
(3) The identification-and-output module 3 identifies the image and outputs the result: a tracking algorithm obtains the target region in the image sequence to be identified; the target region is mapped into the subspace formed by the known training data; the distances between the target region and the training data are computed in that subspace to obtain a similarity measure, from which the target class is determined and the recognition result is output.
Preferably, because the image still contains residual noise after the first-stage Wiener filtering, the following second-stage filter is applied:
$$J(x,y) = \sum_{i=-m/2}^{m/2} \sum_{j=-n/2}^{n/2} H(x,y)\, P_g(x+i, y+j)$$
where J(x,y) is the filtered image and $P_g(x+i, y+j)$ is a function of scale m × n with $P_g(x+i, y+j) = q \times \exp(-(x^2 + y^2)/\omega)$, q being a normalization coefficient such that $\iint q \times \exp(-(x^2 + y^2)/\omega)\, dx\, dy = 1$.
The bulldozer of this embodiment has the following effects. In the image preprocessing stage, the enhanced image adapts to the template size, improving the enhancement effect, and the decision condition is corrected automatically for different template sizes; visual habits and the nonlinear relation between the human eye's sensitivity to different colors and color intensity are taken into account. The local and global features of the image are fully exploited, the method is adaptive and suppresses over-enhancement, and the enhancement obtained under complex illumination is pronounced. The m × n power-exponent computations are reduced to 256, improving computational efficiency; with Z = 4 and F = 3, the average processing rate is 15 FPS and the computational load is lower than that of dictionary algorithms of the same type. In the target detection and tracking stage, image rotation and translation errors caused by temperature differences are eliminated, improving the recognition rate; the processed image details are clearer; the computational load is greatly reduced compared with traditional methods; the device adapts effectively to changes in target scale, accurately determines whether the target has been lost, and can re-detect and stably track the target once it returns to the field of view, the target still being stably tracked after 110 frames. In addition, the bulldozer offers good real-time performance, accurate localization, and strong robustness, performs well in detecting and tracking fast-moving targets with occlusion, and achieves unexpectedly good results.
Embodiment 2: As shown in Figs. 1-2, a bulldozing device with a target recognition function comprises a bulldozer 5 and a monitoring device 4 mounted on the bulldozer 5. The monitoring device 4 performs video surveillance of activity near the bulldozer 5 and comprises a preprocessing module 1, a detection-and-tracking module 2, and an identification-and-output module 3.
(1) The preprocessing module 1 preprocesses each received image and comprises an image conversion submodule 11, an image filtering submodule 12, and an image enhancement submodule 13:
The image conversion submodule 11 converts a color image into a grayscale image:
$$H(x,y) = \frac{\max(R(x,y),G(x,y),B(x,y)) + \min(R(x,y),G(x,y),B(x,y))}{2} + 2\,\bigl(\max(R(x,y),G(x,y),B(x,y)) - \min(R(x,y),G(x,y),B(x,y))\bigr)$$
where R(x,y), G(x,y), and B(x,y) are the red, green, and blue intensity values of the pixel at (x, y), and H(x,y) is the gray value of the pixel at coordinate (x, y); the image size is m × n.
The image filtering submodule 12 filters the grayscale image:
Wiener filtering is first applied as a first-stage denoising step; an SVLM image, denoted $M_{svlm}(x,y)$, is then defined as $M_{svlm}(x,y) = a_1 J_1(x,y) + a_2 J_2(x,y) + a_3 J_3(x,y) + a_4 J_4(x,y)$, where $a_1$, $a_2$, $a_3$, $a_4$ are variable weights and the $J_i(x,y)$ are the filtered images.
The image enhancement submodule 13:
When $|128 - m| > \frac{|\omega - 50|}{3}$: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}$, where L(x,y) is the enhanced gray value, ψ(x,y) is a gamma-correction coefficient incorporating local information, and α is a variable parameter ranging from 0 to 1.
When $|128 - m| \le \frac{|\omega - 50|}{3}$ and ω > 50: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)} \times \left(1 - \frac{\omega - 50}{\omega^2}\right)$, where $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ and $\alpha = 1 - \left|\frac{128 - \min(m_L, m_H)}{128}\right|$; $m_H$ is the mean gray value of the pixels in the image above 128, $m_L$ is the mean gray value of the pixels below 128, and here $m = \min(m_H, m_L)$. Once α is known, 256 ψ correction coefficients are precomputed as a look-up table indexed by i; using the gray value of $M_{svlm}(x,y)$ as the index, the gamma-correction coefficient $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ of every pixel is obtained quickly. ω is the template correction factor.
(2) The detection-and-tracking module 2 comprises a construction submodule 21, a loss discrimination submodule 22, and an update submodule 23:
The construction submodule 21 builds the visual dictionary:
The position and scale of the tracked target are obtained in the initial frame, positive and negative samples are selected around it to train the tracker, and the tracking results are taken as the training set $X = \{x_1, x_2, \ldots, x_N\}^T$. From every target image in the training set, 128-dimensional SIFT features are extracted, $S_t$ denoting the number of SIFT features in the t-th target image. After N frames have been tracked, a clustering algorithm partitions these features into K clusters, the center of each cluster forming a feature word; the total number of extractable features is $F_N$, with $K \ll F_N$. Once the visual dictionary is built, each training image is represented as a bag of features recording the occurrence frequency of the dictionary's feature words, expressed as a histogram $h(x_t)$, obtained as follows: every feature $f_s^{(t)}$ of a training image $X_t$ is projected onto the visual dictionary and represented by the feature word with the shortest projection distance; after all features have been projected, the occurrence frequency of each feature word is counted and normalized, yielding the feature histogram $h(x_t)$ of training image $X_t$.
The loss discrimination submodule 22 determines whether the target has been lost:
When a new frame arrives, Z < K histograms are randomly selected from the K histograms, with Z = 5, to form new sub-histograms $h^{(z)}(x_t)$ of size Z; there are at most $N_s$ such sub-histograms. The sub-histogram similarity $\Phi_{t,z}$ between the candidate target region and each target region in the training set is computed for t = 1, 2, ..., N and z = 1, 2, ..., $N_s$, and the overall similarity is $\Phi_t = 1 - \prod_z (1 - \Phi_{t,z})$. The similarity between the candidate region and the target is $\Phi = \max_t \Phi_t$, and the track-loss decision rule is
$$u = \operatorname{sign}(\Phi) = \begin{cases} 1, & \Phi \ge g_s \\ 0, & \Phi < g_s \end{cases}$$
where $g_s$ is a manually set misjudgment threshold. When u = 1 the target is being stably tracked; when u = 0 the track is lost.
When the track is lost, an affine transformation model is defined:
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} s\cos(\mu_1\theta) & s\sin(\mu_1\theta) \\ -s\sin(\mu_1\theta) & s\cos(\mu_1\theta) \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \mu_2 \begin{bmatrix} e \\ f \end{bmatrix}$$
where $(x_t, y_t)$ and $(x_{t-1}, y_{t-1})$ are the known position coordinates of a SIFT feature point in the current-frame target and of its matching feature point in the previous-frame target; s is the scale coefficient, θ the rotation coefficient, and e and f the translation coefficients. The temperature rotation correction coefficient $\mu_1$ and the temperature translation correction coefficient $\mu_2$ share the form
$$\mu_1 = \mu_2 = \begin{cases} 1 - \frac{|T - T_0|}{1000\,T_0}, & T \ge T_0 \\ 1 + \frac{|T - T_0|}{1000\,T_0}, & T < T_0 \end{cases}$$
and correct the image rotation and translation errors caused by ambient-temperature deviation; $T_0$ is a manually set reference temperature, set to 20 degrees, and T is the temperature measured in real time by a temperature sensor. The RANSAC estimation algorithm is used to solve for the parameters of the affine transformation model; finally, positive and negative samples are collected under the new scale s and rotation coefficient θ, and the classifier is updated.
The update submodule 23 updates the visual dictionary:
After the target position is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 4 frames a new feature point set is obtained, $S_{t-F}$ denoting the total number of features gathered from the F frames. The old and new feature points are then re-clustered into K clusters, giving the new visual dictionary; the size of the visual dictionary remains unchanged. φ is a forgetting factor expressing the weight of the old dictionary: the smaller φ is, the more the new features contribute to the track-loss decision.
(3) The identification-and-output module 3 identifies the image and outputs the result: a tracking algorithm obtains the target region in the image sequence to be identified; the target region is mapped into the subspace formed by the known training data; the distances between the target region and the training data are computed in that subspace to obtain a similarity measure, from which the target class is determined and the recognition result is output.
Preferably, because the image still contains residual noise after the first-stage Wiener filtering, the following second-stage filter is applied:
$$J(x,y) = \sum_{i=-m/2}^{m/2} \sum_{j=-n/2}^{n/2} H(x,y)\, P_g(x+i, y+j)$$
where J(x,y) is the filtered image and $P_g(x+i, y+j)$ is a function of scale m × n with $P_g(x+i, y+j) = q \times \exp(-(x^2 + y^2)/\omega)$, q being a normalization coefficient such that $\iint q \times \exp(-(x^2 + y^2)/\omega)\, dx\, dy = 1$.
The bulldozer of this embodiment has the following effects. In the image preprocessing stage, the enhanced image adapts to the template size, improving the enhancement effect, and the decision condition is corrected automatically for different template sizes; visual habits and the nonlinear relation between the human eye's sensitivity to different colors and color intensity are taken into account. The local and global features of the image are fully exploited, the method is adaptive and suppresses over-enhancement, and the enhancement obtained under complex illumination is pronounced. The m × n power-exponent computations are reduced to 256, improving computational efficiency; with Z = 5 and F = 4, the average processing rate is 16 FPS and the computational load is lower than that of dictionary algorithms of the same type. In the target detection and tracking stage, image rotation and translation errors caused by temperature differences are eliminated, improving the recognition rate; the processed image details are clearer; the computational load is greatly reduced compared with traditional methods; the device adapts effectively to changes in target scale, accurately determines whether the target has been lost, and can re-detect and stably track the target once it returns to the field of view, the target still being stably tracked after 115 frames. In addition, the bulldozer offers good real-time performance, accurate localization, and strong robustness, performs well in detecting and tracking fast-moving targets with occlusion, and achieves unexpectedly good results.
Embodiment 3: As shown in Figs. 1-2, a bulldozing device with a target recognition function comprises a bulldozer 5 and a monitoring device 4 mounted on the bulldozer 5. The monitoring device 4 performs video surveillance of activity near the bulldozer 5 and comprises a preprocessing module 1, a detection-and-tracking module 2, and an identification-and-output module 3.
(1) The preprocessing module 1 preprocesses each received image and comprises an image conversion submodule 11, an image filtering submodule 12, and an image enhancement submodule 13:
The image conversion submodule 11 converts a color image into a grayscale image:
$$H(x,y) = \frac{\max(R(x,y),G(x,y),B(x,y)) + \min(R(x,y),G(x,y),B(x,y))}{2} + 2\,\bigl(\max(R(x,y),G(x,y),B(x,y)) - \min(R(x,y),G(x,y),B(x,y))\bigr)$$
where R(x,y), G(x,y), and B(x,y) are the red, green, and blue intensity values of the pixel at (x, y), and H(x,y) is the gray value of the pixel at coordinate (x, y); the image size is m × n.
The image filtering submodule 12 filters the grayscale image:
Wiener filtering is first applied as a first-stage denoising step; an SVLM image, denoted $M_{svlm}(x,y)$, is then defined as $M_{svlm}(x,y) = a_1 J_1(x,y) + a_2 J_2(x,y) + a_3 J_3(x,y) + a_4 J_4(x,y)$, where $a_1$, $a_2$, $a_3$, $a_4$ are variable weights and the $J_i(x,y)$ are the filtered images.
The image enhancement submodule 13:
When $|128 - m| > \frac{|\omega - 50|}{3}$: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}$, where L(x,y) is the enhanced gray value, ψ(x,y) is a gamma-correction coefficient incorporating local information, and α is a variable parameter ranging from 0 to 1.
When $|128 - m| \le \frac{|\omega - 50|}{3}$ and ω > 50: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)} \times \left(1 - \frac{\omega - 50}{\omega^2}\right)$, where $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ and $\alpha = 1 - \left|\frac{128 - \min(m_L, m_H)}{128}\right|$; $m_H$ is the mean gray value of the pixels in the image above 128, $m_L$ is the mean gray value of the pixels below 128, and here $m = \min(m_H, m_L)$. Once α is known, 256 ψ correction coefficients are precomputed as a look-up table indexed by i; using the gray value of $M_{svlm}(x,y)$ as the index, the gamma-correction coefficient $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ of every pixel is obtained quickly. ω is the template correction factor.
(2) The detection-and-tracking module 2 comprises a construction submodule 21, a loss discrimination submodule 22, and an update submodule 23:
The construction submodule 21 builds the visual dictionary:
The position and scale of the tracked target are obtained in the initial frame, positive and negative samples are selected around it to train the tracker, and the tracking results are taken as the training set $X = \{x_1, x_2, \ldots, x_N\}^T$. From every target image in the training set, 128-dimensional SIFT features are extracted, $S_t$ denoting the number of SIFT features in the t-th target image. After N frames have been tracked, a clustering algorithm partitions these features into K clusters, the center of each cluster forming a feature word; the total number of extractable features is $F_N$, with $K \ll F_N$. Once the visual dictionary is built, each training image is represented as a bag of features recording the occurrence frequency of the dictionary's feature words, expressed as a histogram $h(x_t)$, obtained as follows: every feature $f_s^{(t)}$ of a training image $X_t$ is projected onto the visual dictionary and represented by the feature word with the shortest projection distance; after all features have been projected, the occurrence frequency of each feature word is counted and normalized, yielding the feature histogram $h(x_t)$ of training image $X_t$.
The loss discrimination submodule 22 determines whether the target has been lost:
When a new frame arrives, Z < K histograms are randomly selected from the K histograms, with Z = 6, to form new sub-histograms $h^{(z)}(x_t)$ of size Z; there are at most $N_s$ such sub-histograms. The sub-histogram similarity $\Phi_{t,z}$ between the candidate target region and each target region in the training set is computed for t = 1, 2, ..., N and z = 1, 2, ..., $N_s$, and the overall similarity is $\Phi_t = 1 - \prod_z (1 - \Phi_{t,z})$. The similarity between the candidate region and the target is $\Phi = \max_t \Phi_t$, and the track-loss decision rule is
$$u = \operatorname{sign}(\Phi) = \begin{cases} 1, & \Phi \ge g_s \\ 0, & \Phi < g_s \end{cases}$$
where $g_s$ is a manually set misjudgment threshold. When u = 1 the target is being stably tracked; when u = 0 the track is lost.
When the track is lost, an affine transformation model is defined:
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} s\cos(\mu_1\theta) & s\sin(\mu_1\theta) \\ -s\sin(\mu_1\theta) & s\cos(\mu_1\theta) \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \mu_2 \begin{bmatrix} e \\ f \end{bmatrix}$$
where $(x_t, y_t)$ and $(x_{t-1}, y_{t-1})$ are the known position coordinates of a SIFT feature point in the current-frame target and of its matching feature point in the previous-frame target; s is the scale coefficient, θ the rotation coefficient, and e and f the translation coefficients. The temperature rotation correction coefficient $\mu_1$ and the temperature translation correction coefficient $\mu_2$ share the form
$$\mu_1 = \mu_2 = \begin{cases} 1 - \frac{|T - T_0|}{1000\,T_0}, & T \ge T_0 \\ 1 + \frac{|T - T_0|}{1000\,T_0}, & T < T_0 \end{cases}$$
and correct the image rotation and translation errors caused by ambient-temperature deviation; $T_0$ is a manually set reference temperature, set to 20 degrees, and T is the temperature measured in real time by a temperature sensor. The RANSAC estimation algorithm is used to solve for the parameters of the affine transformation model; finally, positive and negative samples are collected under the new scale s and rotation coefficient θ, and the classifier is updated.
The update submodule 23 updates the visual dictionary:
After the target position is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 5 frames a new feature point set is obtained, $S_{t-F}$ denoting the total number of features gathered from the F frames. The old and new feature points are then re-clustered into K clusters, giving the new visual dictionary; the size of the visual dictionary remains unchanged. φ is a forgetting factor expressing the weight of the old dictionary: the smaller φ is, the more the new features contribute to the track-loss decision.
(3) The identification-and-output module 3 identifies the image and outputs the result: a tracking algorithm obtains the target region in the image sequence to be identified; the target region is mapped into the subspace formed by the known training data; the distances between the target region and the training data are computed in that subspace to obtain a similarity measure, from which the target class is determined and the recognition result is output.
Preferably, because the image still contains residual noise after the first-stage Wiener filtering, the following second-stage filter is applied:
$$J(x,y) = \sum_{i=-m/2}^{m/2} \sum_{j=-n/2}^{n/2} H(x,y)\, P_g(x+i, y+j)$$
where J(x,y) is the filtered image and $P_g(x+i, y+j)$ is a function of scale m × n with $P_g(x+i, y+j) = q \times \exp(-(x^2 + y^2)/\omega)$, q being a normalization coefficient such that $\iint q \times \exp(-(x^2 + y^2)/\omega)\, dx\, dy = 1$.
The bulldozer of this embodiment has the following effects. In the image preprocessing stage, the enhanced image adapts to the template size, improving the enhancement effect, and the decision condition is corrected automatically for different template sizes; visual habits and the nonlinear relation between the human eye's sensitivity to different colors and color intensity are taken into account. The local and global features of the image are fully exploited, the method is adaptive and suppresses over-enhancement, and the enhancement obtained under complex illumination is pronounced. The m × n power-exponent computations are reduced to 256, improving computational efficiency; with Z = 6 and F = 5, the average processing rate is 17 FPS and the computational load is lower than that of dictionary algorithms of the same type. In the target detection and tracking stage, image rotation and translation errors caused by temperature differences are eliminated, improving the recognition rate; the processed image details are clearer; the computational load is greatly reduced compared with traditional methods; the device adapts effectively to changes in target scale, accurately determines whether the target has been lost, and can re-detect and stably track the target once it returns to the field of view, the target still being stably tracked after 120 frames. In addition, the bulldozer offers good real-time performance, accurate localization, and strong robustness, performs well in detecting and tracking fast-moving targets with occlusion, and achieves unexpectedly good results.
Embodiment 4: As shown in Figs. 1-2, a bulldozing device with a target recognition function comprises a bulldozer 5 and a monitoring device 4 mounted on the bulldozer 5. The monitoring device 4 performs video surveillance of activity near the bulldozer 5 and comprises a preprocessing module 1, a detection-and-tracking module 2, and an identification-and-output module 3.
(1) The preprocessing module 1 preprocesses each received image and comprises an image conversion submodule 11, an image filtering submodule 12, and an image enhancement submodule 13:
The image conversion submodule 11 converts a color image into a grayscale image:
$$H(x,y) = \frac{\max(R(x,y),G(x,y),B(x,y)) + \min(R(x,y),G(x,y),B(x,y))}{2} + 2\,\bigl(\max(R(x,y),G(x,y),B(x,y)) - \min(R(x,y),G(x,y),B(x,y))\bigr)$$
where R(x,y), G(x,y), and B(x,y) are the red, green, and blue intensity values of the pixel at (x, y), and H(x,y) is the gray value of the pixel at coordinate (x, y); the image size is m × n.
The image filtering submodule 12 filters the grayscale image:
Wiener filtering is first applied as a first-stage denoising step; an SVLM image, denoted $M_{svlm}(x,y)$, is then defined as $M_{svlm}(x,y) = a_1 J_1(x,y) + a_2 J_2(x,y) + a_3 J_3(x,y) + a_4 J_4(x,y)$, where $a_1$, $a_2$, $a_3$, $a_4$ are variable weights and the $J_i(x,y)$ are the filtered images.
The image enhancement submodule 13:
When $|128 - m| > \frac{|\omega - 50|}{3}$: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}$, where L(x,y) is the enhanced gray value, ψ(x,y) is a gamma-correction coefficient incorporating local information, and α is a variable parameter ranging from 0 to 1.
When $|128 - m| \le \frac{|\omega - 50|}{3}$ and ω > 50: $L(x,y) = 255 \times \left(\frac{H(x,y)}{255}\right)^{\psi(x,y)} \times \left(1 - \frac{\omega - 50}{\omega^2}\right)$, where $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ and $\alpha = 1 - \left|\frac{128 - \min(m_L, m_H)}{128}\right|$; $m_H$ is the mean gray value of the pixels in the image above 128, $m_L$ is the mean gray value of the pixels below 128, and here $m = \min(m_H, m_L)$. Once α is known, 256 ψ correction coefficients are precomputed as a look-up table indexed by i; using the gray value of $M_{svlm}(x,y)$ as the index, the gamma-correction coefficient $\psi(x,y) = \psi_\alpha(M_{svlm}(x,y))$ of every pixel is obtained quickly. ω is the template correction factor.
(2) The detection-and-tracking module 2 comprises a construction submodule 21, a loss discrimination submodule 22, and an update submodule 23:
The construction submodule 21 builds the visual dictionary:
The position and scale of the tracked target are obtained in the initial frame, positive and negative samples are selected around it to train the tracker, and the tracking results are taken as the training set $X = \{x_1, x_2, \ldots, x_N\}^T$. From every target image in the training set, 128-dimensional SIFT features are extracted, $S_t$ denoting the number of SIFT features in the t-th target image. After N frames have been tracked, a clustering algorithm partitions these features into K clusters, the center of each cluster forming a feature word; the total number of extractable features is $F_N$, with $K \ll F_N$. Once the visual dictionary is built, each training image is represented as a bag of features recording the occurrence frequency of the dictionary's feature words, expressed as a histogram $h(x_t)$, obtained as follows: every feature $f_s^{(t)}$ of a training image $X_t$ is projected onto the visual dictionary and represented by the feature word with the shortest projection distance; after all features have been projected, the occurrence frequency of each feature word is counted and normalized, yielding the feature histogram $h(x_t)$ of training image $X_t$.
The loss discrimination submodule 22 determines whether the target has been lost:
When a new frame arrives, Z < K histograms are randomly selected from the K histograms, with Z = 7, to form new sub-histograms $h^{(z)}(x_t)$ of size Z; there are at most $N_s$ such sub-histograms. The sub-histogram similarity $\Phi_{t,z}$ between the candidate target region and each target region in the training set is computed for t = 1, 2, ..., N and z = 1, 2, ..., $N_s$, and the overall similarity is $\Phi_t = 1 - \prod_z (1 - \Phi_{t,z})$. The similarity between the candidate region and the target is $\Phi = \max_t \Phi_t$, and the track-loss decision rule is
$$u = \operatorname{sign}(\Phi) = \begin{cases} 1, & \Phi \ge g_s \\ 0, & \Phi < g_s \end{cases}$$
where $g_s$ is a manually set misjudgment threshold. When u = 1 the target is being stably tracked; when u = 0 the track is lost.
When the track is lost, an affine transformation model is defined:
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} s\cos(\mu_1\theta) & s\sin(\mu_1\theta) \\ -s\sin(\mu_1\theta) & s\cos(\mu_1\theta) \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + \mu_2 \begin{bmatrix} e \\ f \end{bmatrix}$$
where $(x_t, y_t)$ and $(x_{t-1}, y_{t-1})$ are the known position coordinates of a SIFT feature point in the current-frame target and of its matching feature point in the previous-frame target; s is the scale coefficient, θ the rotation coefficient, and e and f the translation coefficients. The temperature rotation correction coefficient $\mu_1$ and the temperature translation correction coefficient $\mu_2$ share the form
$$\mu_1 = \mu_2 = \begin{cases} 1 - \frac{|T - T_0|}{1000\,T_0}, & T \ge T_0 \\ 1 + \frac{|T - T_0|}{1000\,T_0}, & T < T_0 \end{cases}$$
and correct the image rotation and translation errors caused by ambient-temperature deviation; $T_0$ is a manually set reference temperature, set to 20 degrees, and T is the temperature measured in real time by a temperature sensor. The RANSAC estimation algorithm is used to solve for the parameters of the affine transformation model; finally, positive and negative samples are collected under the new scale s and rotation coefficient θ, and the classifier is updated.
The update submodule 23 updates the visual dictionary:
After the target position is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 6 frames a new feature point set is obtained, $S_{t-F}$ denoting the total number of features gathered from the F frames. The old and new feature points are then re-clustered into K clusters, giving the new visual dictionary; the size of the visual dictionary remains unchanged. φ is a forgetting factor expressing the weight of the old dictionary: the smaller φ is, the more the new features contribute to the track-loss decision.
(3) The identification-and-output module 3 identifies the image and outputs the result: a tracking algorithm obtains the target region in the image sequence to be identified; the target region is mapped into the subspace formed by the known training data; the distances between the target region and the training data are computed in that subspace to obtain a similarity measure, from which the target class is determined and the recognition result is output.
Preferably, because the image still contains residual noise after the first-stage Wiener filtering, the following second-stage filter is applied:
$$J(x,y) = \sum_{i=-m/2}^{m/2} \sum_{j=-n/2}^{n/2} H(x,y)\, P_g(x+i, y+j)$$
where J(x,y) is the filtered image and $P_g(x+i, y+j)$ is a function of scale m × n with $P_g(x+i, y+j) = q \times \exp(-(x^2 + y^2)/\omega)$, q being a normalization coefficient such that $\iint q \times \exp(-(x^2 + y^2)/\omega)\, dx\, dy = 1$.
The bulldozer of this embodiment has the following effects. In the image preprocessing stage, the enhanced image adapts to the template size, improving the enhancement effect, and the decision condition is corrected automatically for different template sizes; visual habits and the nonlinear relation between the human eye's sensitivity to different colors and color intensity are taken into account. The local and global features of the image are fully exploited, the method is adaptive and suppresses over-enhancement, and the enhancement obtained under complex illumination is pronounced. The m × n power-exponent computations are reduced to 256, improving computational efficiency; with Z = 7, F = 6, and φ = 0.18, the average processing rate is 18 FPS and the computational load is lower than that of dictionary algorithms of the same type. In the target detection and tracking stage, image rotation and translation errors caused by temperature differences are eliminated, improving the recognition rate; the processed image details are clearer; the computational load is greatly reduced compared with traditional methods; the device adapts effectively to changes in target scale, accurately determines whether the target has been lost, and can re-detect and stably track the target once it returns to the field of view, the target still being stably tracked after 125 frames. In addition, the bulldozer offers good real-time performance, accurate localization, and strong robustness, performs well in detecting and tracking fast-moving targets with occlusion, and achieves unexpectedly good results.
Embodiment 5: as shown in Figure 1-2, a kind of bulldozing device with target recognition function, including bull-dozer 5 and the monitoring device 4 being arranged on bull-dozer 5, monitoring device 4 for carrying out video image monitoring to the activity near bull-dozer 5, and monitoring device 4 includes pretreatment module 1, detecting and tracking module 2, identifies output module 3.
(1) pretreatment module 1, for the image received is carried out pretreatment, specifically includes image transformant module 11, image filtering submodule 12 and image enhaucament submodule 13:
Image transformant module 11, for coloured image is converted into gray level image:
H ( x , y ) = max ( R ( x , y ) , G ( x , y ) , B ( x , y ) ) + min ( R ( x , y ) , G ( x , y ) , B ( x , y ) ) 2 + 2 ( max ( R ( x , y ) , G ( x , y ) , B ( x , y ) ) - min ( R ( x , y ) , G ( x , y ) , B ( x , y ) ) )
Wherein, (x, y), (x, y), (x, (x, y) the intensity red green blue value at place, (x y) represents coordinate (x, y) grey scale pixel value at place to H to B to G to R y) to represent pixel respectively;Image is sized to m × n;
Image filtering submodule 12, for gray level image is filtered:
Adopt Wiener filtering to carry out after first-level filtering removes, define svlm image, be designated as Msvlm(x, y), being specifically defined formula is: Msvlm(x, y)=a1J1(x,y)+a2J2(x,y)+a3J3(x,y)+a4J4(x, y), wherein a1、a2、a3、a4For variable weight,(x, y) for the image after filtered for J;
Image enhaucament submodule 13:
When | 128 - m | > | &omega; - 50 | 3 Time, L ( x , y ) = 255 &times; ( H ( x , y ) 255 ) &psi; ( x , y ) , Wherein, (x, y) for enhanced gray value for L;(x y) is the gamma correction coefficient including local message, now to ψα be range for 0 to 1 variable element,
When | 128 - m | &le; | &omega; - 50 | 3 And during ω > 50, L ( x , y ) = 255 &times; ( H ( x , y ) 255 ) &Phi; ( x , y ) &times; ( 1 - &omega; - 50 &omega; 2 ) , Wherein &psi; ( x , y ) = &psi; &alpha; ( M s v l m ( x , y ) ) , &alpha; = 1 - | 128 - m i n ( m L , m H ) 128 | , mHIt is the average of the gray value all pixels higher than 128, m in imageLIt is the average of the gray value all pixels lower than 128, and now m=min (mH, mL), when α value is known, calculates 256 ψ correction coefficients as look-up table, be designated asWherein i is index value, utilizes Msvlm(x, gray value y) is as index, according to ψ (x, y)=ψα(Msvlm(x, y)) quickly obtain each pixel in image gamma correction coefficient ψ (x, y);For template correction factor;
(2) detecting and tracking module 2, specifically includes structure submodule 21, loses differentiation submodule 22 and update submodule 23:
Build submodule 21, for the structure of visual dictionary:
Obtain the position and yardstick of following the tracks of target at initial frame, choosing positive and negative sample training tracker about, result will be followed the tracks of as training set X={x1,x2,……xN}T;And the every width target image in training set is extracted the SIFT feature of 128 dimensionsWherein StThe number of SIFT feature in t width target image in expression training set;After following the tracks of N frame, by clustering algorithm, these features are divided into K bunch, the center constitutive characteristic word of each bunch, it is designated asThe feature total amount that can extractWherein K < < FN, andAfter visual dictionary builds, every width training image is expressed as the form of feature bag, for representing the frequency that in visual dictionary, feature word occurs, with rectangular histogram h (xt) represent, h (xt) obtain in the following manner: by a width training image XtIn each feature fs (t)Projecting to visual dictionary, the feature word the shortest with projector distance represents this feature, after all Projection Characters, adds up the frequency of occurrences of each feature word, and normalization obtains training image XtFeature histogram h (xt);
Loss discrimination submodule 22, for discriminating whether the target is lost:
When a new frame arrives, Z bins (Z < K, with Z = 8) are randomly selected from the K-bin histogram, forming a new sub-histogram $h^{(z)}(x_t)$ of size Z; the number of sub-histograms is at most $N_s=\binom{K}{Z}$. The similarity $\Phi_{t\_z}$ between a sub-histogram of the candidate target region and the corresponding sub-histogram of the t-th target region in the training set is calculated, where t = 1, 2, …, N and z = 1, 2, …, $N_s$; the overall similarity is then $\Phi_t=1-\prod_z(1-\Phi_{t\_z})$. The similarity between the candidate target region and the target is expressed as $\Phi=\max_t\{\Phi_t\}$, and the track-loss judgment formula is $u=\operatorname{sign}(\Phi)=\begin{cases}1, & \Phi\ge g_s\\0, & \Phi<g_s\end{cases}$, where $g_s$ is a manually set misjudgment threshold; when u = 1 the target is stably tracked, and when u = 0 the target is lost;
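A sketch of the loss test under stated assumptions: because C(K, Z) is enormous, only a few random Z-of-K draws are evaluated, and histogram intersection stands in for the per-sub-histogram similarity, which the text above leaves unspecified:

```python
import numpy as np

def is_tracked(h_cand, train_hists, K=64, Z=8, n_draws=50, gs=0.5, seed=0):
    """Return u = 1 (stable tracking) if max_t Phi_t >= gs, else 0 (lost)."""
    rng = np.random.default_rng(seed)
    phi = 0.0
    for h_t in train_hists:                  # histograms of training targets
        sub_sims = []
        for _ in range(n_draws):             # random Z-bin sub-histograms
            idx = rng.choice(K, size=Z, replace=False)
            # histogram intersection as the sub-similarity (assumption)
            sub_sims.append(np.minimum(h_cand[idx], h_t[idx]).sum())
        phi_t = 1.0 - np.prod(1.0 - np.asarray(sub_sims))
        phi = max(phi, phi_t)                # Phi = max_t { Phi_t }
    return 1 if phi >= gs else 0
```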
When the target is lost, an affine transform model is defined:

$$\begin{bmatrix}x_t\\y_t\end{bmatrix}=\begin{bmatrix}s\cos(\mu_1\theta) & s\sin(\mu_1\theta)\\-s\sin(\mu_1\theta) & s\cos(\mu_1\theta)\end{bmatrix}\begin{bmatrix}x_{t-1}\\y_{t-1}\end{bmatrix}+\mu_2\begin{bmatrix}e\\f\end{bmatrix}$$

where $(x_t,y_t)$ and $(x_{t-1},y_{t-1})$ are respectively the position coordinates of a SIFT feature point in the current-frame target and of the corresponding matched feature point in the previous-frame target, both known quantities; s is the scale coefficient, θ is the rotation coefficient, and e and f denote the translation coefficients;

$$\mu_1=\begin{cases}1-\frac{|T-T_0|}{1000\,T_0}, & T\ge T_0\\1+\frac{|T-T_0|}{1000\,T_0}, & T<T_0\end{cases}$$

is the temperature rotation correction coefficient, and $\mu_2$, of the same form, is the temperature translation correction coefficient; $\mu_1$ and $\mu_2$ correct the image rotation and translation errors caused by ambient temperature deviation; $T_0$ is the manually set standard temperature, taken as 20 degrees, and T is the temperature value monitored in real time by a temperature sensor. The RANSAC estimation algorithm is adopted to solve for the parameters of the affine transform model; finally, positive and negative samples are gathered under the new scale s and rotation coefficient θ, and the classifier is updated;
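A sketch of the temperature-compensated re-acquisition: μ1 and μ2 follow the piecewise definition above, and OpenCV's RANSAC-based estimateAffinePartial2D (a scale-rotation-translation model) is an assumed stand-in for the parameter estimator:

```python
import numpy as np
import cv2

def temp_coeff(T: float, T0: float = 20.0) -> float:
    """mu = 1 -/+ |T - T0| / (1000 * T0), sign depending on T >= T0."""
    d = abs(T - T0) / (1000.0 * T0)
    return 1.0 - d if T >= T0 else 1.0 + d

def estimate_motion(prev_pts, curr_pts, T):
    """Fit s, theta, e, f between matched SIFT points (N x 2 float32 arrays)
    with RANSAC, then apply the temperature corrections mu1 and mu2."""
    M, _ = cv2.estimateAffinePartial2D(prev_pts, curr_pts, method=cv2.RANSAC)
    s = np.hypot(M[0, 0], M[1, 0])           # scale coefficient
    theta = np.arctan2(M[1, 0], M[0, 0])     # rotation coefficient
    mu1 = mu2 = temp_coeff(T)                # same piecewise form for both
    e, f = M[0, 2], M[1, 2]                  # translation coefficients
    return s, mu1 * theta, mu2 * e, mu2 * f
```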
Update submodule 23, for updating the visual dictionary:
After the target location is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 7 frames, a new feature point set is obtained, where $S_{t-F}$ denotes the total number of feature points obtained from the F frames. The old and new feature points are then re-clustered into K clusters, yielding the new visual dictionary, whose size remains unchanged; a forgetting factor expresses the proportion occupied by the old dictionary: the smaller it is, the more the new features contribute to the track-loss judgment;
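A sketch of the update step: old feature words and newly pooled inlier features are re-clustered into K clusters, with a forgetting factor lam weighting the old dictionary; the symbol, its value, and the sample-weighting scheme are assumptions, as the exact mixing formula is not reproduced above:

```python
import numpy as np
from sklearn.cluster import KMeans

def update_dictionary(old_words, new_feats, K=64, lam=0.5):
    """Re-cluster old words and new SIFT features into K clusters; lam is
    the forgetting factor (weight of the old dictionary): smaller lam lets
    the new features drive the result more."""
    X = np.vstack([old_words, new_feats])
    w = np.concatenate([np.full(len(old_words), lam),
                        np.full(len(new_feats), 1.0 - lam)])
    km = KMeans(n_clusters=K, n_init=10).fit(X, sample_weight=w)
    return km.cluster_centers_               # new dictionary, size unchanged
```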
(3) Recognition output module 3, for image recognition and output: a tracking algorithm is used to obtain the target region in the image sequence to be recognized, the target region is mapped into the subspace formed by the known training data, the distance between the target region and the training data is calculated in that subspace to obtain a similarity measure, the target class is judged, and the recognition result is output.
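A sketch of the recognition step, assuming a PCA subspace (the text does not name the subspace method) and nearest-neighbor distance as the similarity measure:

```python
import numpy as np
from sklearn.decomposition import PCA

def recognize(target_feat, train_feats, train_labels, n_components=16):
    """Project target and training features into the training subspace and
    return the label of the closest training sample."""
    pca = PCA(n_components=n_components).fit(train_feats)
    train_sub = pca.transform(train_feats)
    target_sub = pca.transform(target_feat.reshape(1, -1))
    dists = np.linalg.norm(train_sub - target_sub, axis=1)
    return train_labels[int(np.argmin(dists))]
```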
Preferably, after the first-level Wiener filtering, the image still contains residual noise, and the following second-level filter is adopted for secondary filtering:
$$J(x,y)=\sum_{i=-m/2}^{m/2}\;\sum_{j=-n/2}^{n/2}H(x,y)\,P_g(x+i,\,y+j)$$
where J(x, y) is the filtered image; $P_g(x+i,y+j)$ denotes a function of scale m × n with $P_g(x+i,y+j)=q\times\exp\!\big(-(x^{2}+y^{2})/\omega\big)$, where q is the coefficient that normalizes the function, i.e. $\iint q\times\exp\!\big(-(x^{2}+y^{2})/\omega\big)\,dx\,dy=1$.
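A sketch of this second-level filter: the kernel exp(−(x² + y²)/ω) is normalized so that it integrates to 1 (the role of q), then swept over the image; reading the double sum as an ordinary convolution of H with P_g is an interpretation, and the kernel size is an assumption:

```python
import numpy as np
from scipy.ndimage import convolve

def second_level_filter(H: np.ndarray, ksize: int = 7,
                        omega: float = 4.0) -> np.ndarray:
    """Apply the normalized P_g = q * exp(-(x^2 + y^2) / omega) kernel."""
    r = ksize // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    Pg = np.exp(-(x**2 + y**2) / omega)
    Pg /= Pg.sum()                        # q: normalizes the kernel mass to 1
    return convolve(H, Pg, mode='nearest')
```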
In the bulldozer of this embodiment, at the image preprocessing stage, the enhanced image adapts to the size of the template, which improves the enhancement effect; the judgment condition is corrected automatically for different template sizes, and visual habits and the nonlinear relation between the human eye's sensitivity to different colors and color intensity are taken into account. The local and global features of the image are fully exploited, giving adaptivity, suppressing over-enhancement, and yielding a marked enhancement effect under complex illumination. The m × n power-exponent computations are reduced to 256, improving computational efficiency; with Z = 8 and F = 7, the average frame rate is 19 FPS, and the computational load is lower than that of dictionary algorithms of the same type. At the target detection and tracking stage, the image rotation and translation errors caused by different temperatures can be eliminated, improving the recognition rate; the processed image details are clearer, the computational load is greatly reduced relative to traditional methods, target scale changes are adapted to effectively, and whether the target is lost can be judged accurately; after the target returns to the field of view it can be detected again and stably tracked, with stable tracking still maintained after 130 frames. In addition, this bulldozer has the advantages of good real-time performance, accurate positioning and strong robustness, performs well in fast target detection and tracking with occlusion, and achieves unexpected results.

Claims (2)

1. A bulldozing device with a target recognition function, including a bulldozer and a monitoring device arranged on the bulldozer, the monitoring device being used for video image monitoring of activity near the bulldozer, characterized in that the monitoring device includes a pretreatment module, a detecting and tracking module, and a recognition output module;
(1) the pretreatment module, for pretreating the received image, specifically including an image transform submodule, an image filtering submodule and an image enhancement submodule:
the image transform submodule, for converting the color image into a gray-level image:
$$H(x,y)=\frac{\max\big(R(x,y),G(x,y),B(x,y)\big)+\min\big(R(x,y),G(x,y),B(x,y)\big)}{2}+2\Big(\max\big(R(x,y),G(x,y),B(x,y)\big)-\min\big(R(x,y),G(x,y),B(x,y)\big)\Big)$$
where R(x, y), G(x, y) and B(x, y) respectively denote the red, green and blue intensity values of the pixel at coordinate (x, y), and H(x, y) denotes the gray value of the pixel at (x, y); the image size is m × n;
the image filtering submodule, for filtering the gray-level image:
Wiener filtering is first adopted for first-level denoising; an svlm image, denoted $M_{svlm}(x,y)$, is then defined by $M_{svlm}(x,y)=a_1J_1(x,y)+a_2J_2(x,y)+a_3J_3(x,y)+a_4J_4(x,y)$, where $a_1$, $a_2$, $a_3$, $a_4$ are variable weights and $J_i(x,y)$ (i = 1, 2, 3, 4) is the filtered image obtained with the template of scale $\omega_i$;
the image enhancement submodule:
When $|128-m|>\frac{|\omega-50|}{3}$, $L(x,y)=255\times\left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}$, where L(x, y) is the enhanced gray value, ψ(x, y) is the gamma correction coefficient incorporating local information, α is a variable parameter ranging from 0 to 1, and ω is the template scale size parameter: the larger the scale, the more neighborhood pixel information the template contains, and passing the input image through templates of different scales $\omega_i$ yields images $J_i$ containing neighborhood information of different ranges;
When $|128-m|\le\frac{|\omega-50|}{3}$ and ω > 50, $L(x,y)=255\times\left(\frac{H(x,y)}{255}\right)^{\psi(x,y)}\times\left(1-\frac{\omega-50}{\omega^{2}}\right)$, where $\psi(x,y)=\psi_{\alpha}(M_{svlm}(x,y))$ and $\alpha=1-\left|\frac{128-\min(m_L,m_H)}{128}\right|$; $m_H$ is the mean gray value of all pixels above 128, $m_L$ is the mean gray value of all pixels below 128, and here $m=\min(m_H,m_L)$; once α is known, the 256 ψ correction coefficients are computed as a look-up table $\psi_{\alpha}(i)$, where i is the index value; using the gray value of $M_{svlm}(x,y)$ as the index, the gamma correction coefficient ψ(x, y) of each pixel is obtained quickly from $\psi(x,y)=\psi_{\alpha}(M_{svlm}(x,y))$; the factor $\left(1-\frac{\omega-50}{\omega^{2}}\right)$ is the template correction factor;
(2) the detecting and tracking module, specifically including a construction submodule, a loss discrimination submodule and an update submodule:
the construction submodule, for building the visual dictionary:
The position and scale of the tracked target are obtained in the initial frame, positive and negative samples are chosen around it to train the tracker, and the tracking results are taken as the training set $X=\{x_1,x_2,\ldots,x_N\}^T$; 128-dimensional SIFT features are extracted from every target image in the training set, where $S_t$ denotes the number of SIFT features in the t-th target image of the training set. After N frames have been tracked, these features are divided into K clusters by a clustering algorithm, and the center of each cluster constitutes a feature word; the total number of extractable features is $F_N=\sum_{t=1}^{N}S_t$, where $K\ll F_N$. After the visual dictionary is built, every training image is expressed in bag-of-features form, representing the frequency with which the feature words of the visual dictionary occur, as a histogram $h(x_t)$; $h(x_t)$ is obtained as follows: each feature of a training image $X_t$ is projected onto the visual dictionary, and the feature word with the shortest projection distance represents that feature; after all features are projected, the occurrence frequency of each feature word is counted and normalized to obtain the feature histogram $h(x_t)$ of training image $X_t$;
the loss discrimination submodule, for discriminating whether the target is lost:
when a new frame arrives, Z bins (Z < K, with Z = 4) are randomly selected from the K-bin histogram, forming a new sub-histogram $h^{(z)}(x_t)$ of size Z; the number of sub-histograms is at most $N_s=\binom{K}{Z}$; the similarity $\Phi_{t\_z}$ between a sub-histogram of the candidate target region and the corresponding sub-histogram of the t-th target region in the training set is calculated, where t = 1, 2, …, N and z = 1, 2, …, $N_s$; the overall similarity is then $\Phi_t=1-\prod_z(1-\Phi_{t\_z})$; the similarity between the candidate target region and the target is expressed as $\Phi=\max_t\{\Phi_t\}$, and the track-loss judgment formula is $u=\operatorname{sign}(\Phi)=\begin{cases}1, & \Phi\ge g_s\\0, & \Phi<g_s\end{cases}$, where $g_s$ is a manually set misjudgment threshold; when u = 1 the target is stably tracked, and when u = 0 the target is lost; when the target is lost, an affine transform model is defined: $\begin{bmatrix}x_t\\y_t\end{bmatrix}=\begin{bmatrix}s\cos(\mu_1\theta) & s\sin(\mu_1\theta)\\-s\sin(\mu_1\theta) & s\cos(\mu_1\theta)\end{bmatrix}\begin{bmatrix}x_{t-1}\\y_{t-1}\end{bmatrix}+\mu_2\begin{bmatrix}e\\f\end{bmatrix}$, where $(x_t,y_t)$ and $(x_{t-1},y_{t-1})$ are respectively the position coordinates of a SIFT feature point in the current-frame target and of the corresponding matched feature point in the previous-frame target, both known quantities; s is the scale coefficient, θ is the rotation coefficient, and e and f denote the translation coefficients; $\mu_1=\begin{cases}1-\frac{|T-T_0|}{1000\,T_0}, & T\ge T_0\\1+\frac{|T-T_0|}{1000\,T_0}, & T<T_0\end{cases}$ is the temperature rotation correction coefficient and $\mu_2$, of the same form, is the temperature translation correction coefficient; $\mu_1$ and $\mu_2$ correct the image rotation and translation errors caused by ambient temperature deviation; $T_0$ is the manually set standard temperature, taken as 20 degrees, and T is the temperature value monitored in real time by a temperature sensor; the RANSAC estimation algorithm is adopted to solve for the parameters of the affine transform model, and finally positive and negative samples are gathered under the new scale s and rotation coefficient θ, and the classifier is updated;
the update submodule, for updating the visual dictionary:
after the target location is obtained in each frame, all SIFT feature points consistent with the computed affine transformation parameters are collected; after F = 3 frames, a new feature point set is obtained, where $S_{t-F}$ denotes the total number of feature points obtained from the F frames; the old and new feature points are then re-clustered into K clusters, yielding the new visual dictionary, whose size remains unchanged; a forgetting factor expresses the proportion occupied by the old dictionary: the smaller it is, the more the new features contribute to the track-loss judgment;
(3) the recognition output module, for image recognition and output: a tracking algorithm is used to obtain the target region in the image sequence to be recognized, the target region is mapped into the subspace formed by the known training data, the distance between the target region and the training data is calculated in that subspace to obtain a similarity measure, the target class is judged, and the recognition result is output.
2. The bulldozing device with a target recognition function according to claim 1, characterized in that, after the first-level Wiener filtering, the image still contains residual noise, and the following second-level filter is adopted for secondary filtering:
$$J(x,y)=\sum_{i=-m/2}^{m/2}\;\sum_{j=-n/2}^{n/2}H(x,y)\,P_g(x+i,\,y+j)$$
where J(x, y) is the filtered image; $P_g(x+i,y+j)$ denotes a function of scale m × n with $P_g(x+i,y+j)=q\times\exp\!\big(-(x^{2}+y^{2})/\omega\big)$, where q is the coefficient that normalizes the function, i.e. $\iint q\times\exp\!\big(-(x^{2}+y^{2})/\omega\big)\,dx\,dy=1$.
CN201610046348.3A 2016-01-22 2016-01-22 Bulldozing device with target identification function Pending CN105740771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610046348.3A CN105740771A (en) 2016-01-22 2016-01-22 Bulldozing device with target identification function

Publications (1)

Publication Number Publication Date
CN105740771A true CN105740771A (en) 2016-07-06

Family

ID=56246437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610046348.3A Pending CN105740771A (en) 2016-01-22 2016-01-22 Bulldozing device with target identification function

Country Status (1)

Country Link
CN (1) CN105740771A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810723A (en) * 2014-02-27 2014-05-21 西安电子科技大学 Target tracking method based on inter-frame constraint super-pixel encoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴京辉: "Research on Tracking and Recognition of Targets in Video Surveillance", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008987A (en) * 2019-02-20 2019-07-12 深圳大学 Test method, device, terminal and the storage medium of classifier robustness
CN110008987B (en) * 2019-02-20 2022-02-22 深圳大学 Method and device for testing robustness of classifier, terminal and storage medium

Similar Documents

Publication Publication Date Title
US10217229B2 (en) Method and system for tracking moving objects based on optical flow method
CN101739686B (en) Moving object tracking method and system thereof
CN100474337C (en) Noise-possessing movement fuzzy image restoration method based on radial basis nerve network
CN109147368A (en) Intelligent driving control method device and electronic equipment based on lane line
CN103364410B (en) Crack detection method of hydraulic concrete structure underwater surface based on template search
CN104318258A (en) Time domain fuzzy and kalman filter-based lane detection method
CN101968886B (en) Centroid tracking framework based particle filter and mean shift cell tracking method
CN104134222A (en) Traffic flow monitoring image detecting and tracking system and method based on multi-feature fusion
KR101067437B1 (en) Lane detection method and Detecting system using the same
CN103902985B (en) High-robustness real-time lane detection algorithm based on ROI
CN105809715B (en) A kind of visual movement object detection method adding up transformation matrices based on interframe
CN104616006B (en) A kind of beard method for detecting human face towards monitor video
CN105678213A (en) Dual-mode masked man event automatic detection method based on video characteristic statistics
CN105718896A (en) Intelligent robot with target recognition function
CN105718895A (en) Unmanned aerial vehicle based on visual characteristics
Tang et al. Leaf extraction from complicated background
CN105740768A (en) Unmanned forklift device based on combination of global and local features
CN105740771A (en) Bulldozing device with target identification function
CN106128105A (en) A kind of traffic intersection pedestrian behavior monitoring system
CN104240268A (en) Pedestrian tracking method based on manifold learning and sparse representation
CN105718897A (en) Numerical control lathe based on visual characteristics
CN105574517A (en) Electric vehicle charging pile with stable tracking function
CN112801056B (en) Method and system for determining state of muck truck roof based on local image classification
CN105718894A (en) Sewage treatment device with stable tracking function
CN105718899A (en) Solar water heater based on visual characteristics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160706)