CN114387308A - Machine vision characteristic tracking system - Google Patents

Machine vision characteristic tracking system

Info

Publication number
CN114387308A
Authority
CN
China
Prior art keywords
feature
module
target
image
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210031232.8A
Other languages
Chinese (zh)
Inventor
谢益强
尹勇
谢一智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Kelaishi Technology Co ltd
Original Assignee
Zhejiang Kelaishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Kelaishi Technology Co ltd filed Critical Zhejiang Kelaishi Technology Co ltd
Priority to CN202210031232.8A
Publication of CN114387308A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/215 - Motion-based segmentation
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G06F18/23 - Clustering techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/08 - Learning methods

Abstract

The invention discloses a machine vision feature tracking system, which comprises a reference target feature extraction unit, an undetermined target feature extraction unit, a similarity analysis unit, a target feature confirmation unit and a feature updating unit. The reference target feature extraction unit comprises an image acquisition module, a feature extraction module, a feature association module and a reference target feature output module; the feature extraction module comprises a feature identification submodule, a feature cropping submodule, a first super-resolution reconstruction submodule and a perspective change processing submodule; the undetermined target feature extraction unit comprises an image acquisition processing module and an automatic feature labeling module. The invention improves the accuracy of target feature identification and uses reinforcement learning and deep neural network techniques to automatically label the features in the candidate target image to be determined, so that the model labels the target features quickly and accurately in a learning mode closer to the human brain, effectively improving the accuracy of moving target tracking.

Description

Machine vision characteristic tracking system
Technical Field
The invention relates to the technical field of tracking systems, in particular to a machine vision characteristic tracking system.
Background
With the rapid development of moving-object detection technology, a variety of detection methods have emerged; in the prior art, detection methods are built on the color features, motion information, motion models and other properties of the moving object. Feature detection and tracking of moving objects is an important basic and key technology: for example, the hands and face of a person in motion can be detected and tracked across an image sequence, which then enables recognition of gestures, faces and the like.
When tracking a moving target, the moving-target tracking algorithm is the comparative basis and a relatively important part; it refers to the technology of automatically tracking and detecting the position of the moving target in a video frame sequence by defining a corresponding mathematical model and detection algorithm. Tracking a moving target requires modeling it, mainly with point-based, contour-based or density-kernel-based models. These methods first construct a model of the moving target to be tracked and then perform model matching online to track it. As the requirements of intelligent video analysis keep rising, so does the required target tracking accuracy; the invention therefore provides a high-precision machine vision feature tracking system.
Disclosure of Invention
The present invention is directed to a machine vision feature tracking system, which overcomes the above-mentioned problems of the related art.
Therefore, the invention adopts the following specific technical scheme:
a machine vision characteristic tracking system comprises a reference target characteristic extraction unit, an undetermined target characteristic extraction unit, a similarity analysis unit, a target characteristic confirmation unit and a characteristic updating unit;
the reference target feature extraction unit is used for extracting features in a first frame of a plurality of frames of the detection video to obtain reference target features;
the undetermined target feature extraction unit is used for extracting the features of the tracked undetermined candidate target in the detection video to obtain the characteristics of the undetermined target;
the similarity analysis unit is used for comparing the similarity of the reference target characteristic and the undetermined target characteristic;
the target characteristic confirming unit is used for confirming whether the characteristic of the undetermined target is a reference target characteristic or not according to the result of the similarity comparison, and when the similarity result is larger than or equal to the matching threshold value, confirming that the characteristic of the undetermined target is a target tracking characteristic;
the characteristic updating unit is used for updating the target tracking characteristic into a new reference target characteristic.
Further, the reference target feature extraction unit comprises an image acquisition module, a feature extraction module, a feature association module and a reference target feature output module;
the image acquisition module is used for acquiring a first frame image in a plurality of frames of the detection video;
the feature extraction module is used for extracting features of the first frame image;
the characteristic correlation module is used for describing a random linear relation between the reference target image data and the characteristic data by adopting a multiple linear regression model;
the reference target feature output module is used for outputting the features of the reference target.
Further, the feature extraction module comprises a feature identification sub-module, a feature cutting sub-module, a first super-resolution reconstruction sub-module and a perspective change processing sub-module;
the characteristic identification submodule is used for identifying characteristics in the first frame of image;
the characteristic cutting submodule is used for cutting the characteristic part in the first frame image;
the first super-resolution reconstruction submodule is used for carrying out super-resolution reconstruction processing on the cut low-resolution characteristic image;
and the perspective change processing submodule is used for processing the image with the angle deviation in the cut characteristic image according to the perspective transformation principle.
Further, the feature correlation module, when describing the random linear relationship between the reference target image data and the feature data by using the multiple linear regression model, comprises the following steps:
and describing a random linear relation between the reference target image data and the characteristic data by adopting a multiple linear regression model, wherein the relation is as follows:
y_i = β_0 + β_1·x_i1 + β_2·x_i2 + β_3·x_i3 + ε_i,  i = 1, 2, …, n;
wherein y represents the reference target image feature evaluation data index, x_1 represents the number of features, x_2 the feature type, and x_3 the feature data packet convergence protocol; β_0, β_1, β_2, β_3 denote the regression coefficients and ε_i a random error term, where the ε_i are mutually independent and obey an N(0, σ²) distribution; n represents the sample size, and the n sample observations are:
(y_i, x_i1, x_i2, x_i3),  i = 1, 2, …, n.
furthermore, the undetermined target feature extraction unit comprises an image acquisition processing module and a feature automatic labeling module;
the image acquisition processing module is used for acquiring a tracked candidate target image to be determined in the detection video and processing the image;
the characteristic automatic labeling module is used for automatically labeling the characteristics in the candidate target image to be determined by utilizing the reinforcement learning and deep neural network technology.
Furthermore, the image acquisition processing module comprises an image acquisition submodule, a resolution verification submodule and a second super-resolution reconstruction submodule;
the image acquisition submodule is used for acquiring a tracked undetermined candidate target image in a detection video;
the resolution verification sub-module is used for verifying the resolution of the candidate target image to be determined;
and the second hyper-resolution reconstruction submodule is used for carrying out hyper-resolution reconstruction processing on the candidate target image to be determined, the resolution of which is lower than a preset threshold.
Further, the automatic feature labeling module, when automatically labeling the features in the candidate target image to be determined by using the reinforcement learning and deep neural network technology, comprises the following steps:
carrying out multi-scale superpixel division on a reference target image by using the SLIC algorithm, and marking a feature probability threshold for the superpixels;
constructing a superpixel classification training set, and training the marked superpixels with a machine learning-based method to obtain a learning model;
constructing a training set for labeling a model, and automatically labeling and segmenting a characteristic region based on end-to-end learning of a deep neural network;
constructing an annotation model, and testing the constructed annotation model by using pre-prepared characteristic image data;
automatically labeling the characteristics in the candidate target image to be determined by using the tested labeling model;
the learning model is used for classifying the superpixels in the classification training set; the classification results are manually rewarded or penalized, the reward-and-punishment results are fed back to the learning model, and the learning model is readjusted through this reward-and-punishment mechanism; the cycle repeats until the learning model reaches its optimal state, yielding the labeling information of the feature regions in the candidate target image to be determined;
the deep neural network is a ResNet network, and the ResNet network comprises an Identity Block and a Conv Block; the input and output dimensions of the Identity Block are consistent, the input and output dimensions of the Conv Block are inconsistent, and the Identity Block can be connected in series.
Further, the multi-scale superpixel division of the reference target image by using the SLIC algorithm comprises the following steps:
initializing a seed point: according to the set number of the super pixels, uniformly distributing initial seed points in the candidate target image to be determined;
reselecting secondary seed points in the n x n neighborhood of the initial seed points;
distributing a class label to each pixel point in the neighborhood around each secondary seed point;
distance measurement: for each searched pixel point, respectively calculating the distance between the pixel point and the secondary seed point;
performing iterative optimization;
enhancing connectivity;
the distance measurement comprises a color distance and a space distance, and the distance calculation method comprises the following steps:
d_c = sqrt[(l_j − l_i)² + (a_j − a_i)² + (b_j − b_i)²]
d_s = sqrt[(x_j − x_i)² + (y_j − y_i)²]
D′ = sqrt[(d_c/N_c)² + (d_s/N_s)²]
where d_c denotes the color distance, d_s the spatial distance, and N_s the maximum spatial distance within a class, defined as N_s = S = sqrt(N/K); color similarity is measured on the l, a, b components of the Lab color space and spatial proximity on the two-dimensional image coordinates x, y, so the overall metric is the five-dimensional space [l, a, b, x, y];
the maximum color distance N_c differs not only from picture to picture but also from cluster to cluster, so a fixed constant m is substituted for it, and the final distance measure D′ is:
D′ = sqrt[(d_c/m)² + (d_s/S)²]
further, training the marked superpixels by using a machine learning-based method to obtain a learning model comprises the following steps:
Convolution process: the input image is convolved with a trainable filter f_x (at the first stage the input is the raw image; at later stages it is the convolution feature map produced by the previous stage), a bias b_x is added, and the convolution layer C_x is obtained.
Sub-sampling process: the four pixels of each neighborhood are summed into one pixel, weighted by a scalar W, the bias b is added, and a sigmoid activation function generates the feature map S_{x+1}, reduced in size by a factor of four.
The convolution layer C_x is calculated as:
C_x = f_x(W, input) + b_x
and the feature map S_{x+1} as:
S_{x+1} = sigmoid[W · (x_{i,j} + x_{i+1,j} + x_{i,j+1} + x_{i+1,j+1})].
further, the constructing of the annotation model comprises the following steps:
determining mean-IOU as the objective function;
solving the objective function to obtain the annotation model that minimizes the objective function value;
the objective function is calculated as:
IOU = area(C ∩ G) / area(C ∪ G)
where IOU is the overlap ratio between a generated candidate box and the original mark box, area(C) is the area of the candidate box, and area(G) is the area of the original mark box.
The invention has the following beneficial effects: by providing the reference target feature extraction unit, the undetermined target feature extraction unit, the similarity analysis unit, the target feature confirmation unit and the feature updating unit, the system can extract the reference features of the moving target through the reference target feature extraction unit to obtain the reference target features, and can automatically label the features in the undetermined candidate target image through the undetermined target feature extraction unit using reinforcement learning and deep neural network techniques to obtain the undetermined target features; by comparing the similarity of the reference target features and the features to be determined, it can then decide whether the undetermined target features are the target tracking features, thereby achieving fast tracking of moving-target features. Compared with traditional feature tracking methods, the invention combines super-resolution reconstruction and perspective transformation with target feature extraction, which improves the quality of the input image and therefore the accuracy of target feature identification; at the same time, using reinforcement learning and deep neural network techniques to automatically label the features in the candidate target image to be determined lets the model label the target features quickly and accurately in a learning mode closer to the human brain, effectively improving the accuracy of moving-target tracking.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a block diagram of a machine vision feature tracking system according to an embodiment of the present invention;
FIG. 2 is a block diagram of a feature extraction module in a machine vision feature tracking system according to an embodiment of the present invention;
FIG. 3 is a block diagram of the image acquisition processing module in a machine vision feature tracking system according to an embodiment of the present invention.
In the figure:
1. a reference target feature extraction unit; 11. an image acquisition module; 12. a feature extraction module; 121. a feature identification submodule; 122. a feature clipping submodule; 123. a first hyper-resolution reconstruction sub-module; 124. a perspective change processing submodule; 13. a feature association module; 14. a reference target feature output module; 2. an undetermined target feature extraction unit; 21. an image acquisition processing module; 211. an image acquisition submodule; 212. a resolution verification sub-module; 213. a second super-resolution reconstruction submodule; 22. a characteristic automatic labeling module; 3. a similarity analysis unit; 4. a target feature confirmation unit; 5. and a feature updating unit.
Detailed Description
For further explanation of the various embodiments, the drawings, which form a part of the disclosure and are incorporated in and constitute a part of this specification, illustrate the embodiments and, together with the description, serve to explain their principles of operation and to enable others of ordinary skill in the art to understand the various embodiments and advantages of the invention. The figures are not to scale, and like reference numerals generally refer to like elements.
According to an embodiment of the present invention, a machine vision feature tracking system is provided.
Referring to the drawings and the detailed description, as shown in fig. 1-3, a machine vision feature tracking system according to an embodiment of the present invention includes a reference target feature extraction unit 1, an undetermined target feature extraction unit 2, a similarity analysis unit 3, a target feature confirmation unit 4, and a feature update unit 5;
the reference target feature extraction unit 1 is configured to extract features in a first frame of multiple frames of a detection video to obtain reference target features;
specifically, the reference target feature extraction unit 1 includes an image acquisition module 11, a feature extraction module 12, a feature association module 13, and a reference target feature output module 14;
the image acquisition module 11 is configured to acquire a first frame image of a plurality of frames of a detection video; the feature extraction module 12 is configured to perform feature extraction on the first frame image; the feature association module 13 is configured to describe a random linear relationship between the reference target image data and the feature data by using a multiple linear regression model; the reference target feature output module 14 is configured to output a feature of the reference target.
The feature extraction module 12 includes a feature identification submodule 121, a feature clipping submodule 122, a first super-resolution reconstruction submodule 123 and a perspective change processing submodule 124;
the feature identification submodule 121 is configured to identify features in the first frame image; the feature cropping submodule 122 is configured to crop a feature portion in the first frame image; the first hyper-resolution reconstruction submodule 123 is configured to perform hyper-resolution reconstruction processing on the clipped low-resolution feature image; the perspective change processing submodule 124 is configured to process an image with an angle deviation in the cropped feature image according to a perspective transformation principle.
The first super-resolution reconstruction sub-module 123, when performing the super-resolution reconstruction processing on the cropped low-resolution feature image, includes the following steps:
importing the feature image whose resolution is lower than a preset threshold into an SRGAN network;
setting the core operating parameters of the SRGAN network: the upscaling factor, the learning rate and the number of iterations;
processing step by step through three convolution layers in the SRGAN network to generate a high-resolution image;
specifically, the step-by-step processing through the three convolution layers comprises the following steps:
first, the first convolution layer extracts feature points from the feature image; then the second convolution layer maps the feature points non-linearly to predict the missing details of each feature point; finally, the third convolution layer recombines the mapped image to generate the high-resolution feature image;
the output high-resolution feature image is then obtained.
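For illustration only, the following sketch shows a three-convolution generator of the kind described above (feature extraction, non-linear mapping, reconstruction). It assumes PyTorch, and the layer widths, kernel sizes and bicubic pre-upsampling are illustrative choices, not details taken from the patent; the adversarial part of SRGAN is likewise omitted.

```python
import torch
import torch.nn as nn

class ThreeConvSRNet(nn.Module):
    """Illustrative three-convolution super-resolution generator (a sketch,
    not the patented network): extract feature points, map them non-linearly
    to predict missing detail, then recombine into a high-resolution image."""
    def __init__(self, upscale=2):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=upscale, mode="bicubic", align_corners=False)
        self.extract = nn.Conv2d(3, 64, kernel_size=9, padding=4)      # feature-point extraction
        self.map = nn.Conv2d(64, 32, kernel_size=1)                     # non-linear mapping of feature points
        self.reconstruct = nn.Conv2d(32, 3, kernel_size=5, padding=2)   # recombination into the HR image
        self.relu = nn.ReLU(inplace=True)

    def forward(self, low_res):
        x = self.upsample(low_res)            # bring the cropped low-resolution patch to target size
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)

# Usage sketch: hr = ThreeConvSRNet(upscale=2)(torch.rand(1, 3, 64, 64))
```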
The perspective change processing sub-module 124, when processing the images with angular deviation among the cropped feature images according to the perspective transformation principle, includes the following steps:
calling the getPerspectiveTransform function in OpenCV on the cropped feature image to obtain the perspective transformation matrix;
calling the warpPerspective function in OpenCV to execute the perspective transformation and obtain the perspective-corrected feature image.
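A minimal sketch of this step using the two OpenCV calls named above. The four source and destination corner points are assumed to be supplied by the feature identification step; the patent does not specify how they are obtained.

```python
import cv2
import numpy as np

def correct_perspective(feature_img, src_pts, dst_pts, out_size):
    """Warp an angle-skewed feature crop toward a frontal view.

    src_pts / dst_pts: four corresponding corner points (assumed inputs);
    out_size: (width, height) of the corrected patch.
    """
    matrix = cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(dst_pts))  # 3x3 transform
    return cv2.warpPerspective(feature_img, matrix, out_size)

# Example: map a tilted quadrilateral onto an upright 200 x 200 patch.
# corrected = correct_perspective(crop,
#                                 [(10, 5), (190, 20), (200, 180), (0, 170)],
#                                 [(0, 0), (200, 0), (200, 200), (0, 200)],
#                                 (200, 200))
```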
The feature association module 13, when describing the random linear relationship between the reference target image data and the feature data with the multiple linear regression model, uses the following relation:
y_i = β_0 + β_1·x_i1 + β_2·x_i2 + β_3·x_i3 + ε_i,  i = 1, 2, …, n;
wherein y represents the reference target image feature evaluation data index, x_1 represents the number of features, x_2 the feature type, and x_3 the feature data packet convergence protocol; β_0, β_1, β_2, β_3 denote the regression coefficients and ε_i a random error term, where the ε_i are mutually independent and obey an N(0, σ²) distribution; n represents the sample size, and the n sample observations are:
(y_i, x_i1, x_i2, x_i3),  i = 1, 2, …, n.
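For illustration, the regression coefficients β_0 to β_3 can be estimated by ordinary least squares. The sketch below assumes NumPy and synthetic observations; the patent does not state how the coefficients are fitted.

```python
import numpy as np

def fit_feature_regression(X, y):
    """Estimate [b0, b1, b2, b3] for y_i = b0 + b1*x_i1 + b2*x_i2 + b3*x_i3 + eps_i.

    X: (n, 3) array of [number of features, feature type, packet convergence protocol];
    y: (n,) array of feature evaluation indices.  A column of ones models the intercept b0.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])           # rows of [1, x_i1, x_i2, x_i3]
    beta, _, _, _ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta

# Example with n = 4 purely illustrative observations:
# beta = fit_feature_regression([[12, 1, 0], [8, 2, 1], [15, 1, 1], [9, 3, 0]],
#                               [0.82, 0.61, 0.90, 0.55])
```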
the undetermined target feature extraction unit 2 is used for extracting the features of the tracked undetermined candidate target in the detection video to obtain the characteristics of the undetermined target;
specifically, the undetermined target feature extraction unit 2 comprises an image acquisition processing module 21 and a feature automatic labeling module 22;
the image acquisition processing module 21 is configured to acquire a to-be-determined candidate target image tracked in the detection video and perform image processing; the automatic feature labeling module 22 is configured to perform automatic labeling on features in the candidate target image to be determined by using reinforcement learning and deep neural network technology.
The image acquisition processing module 21 comprises an image acquisition submodule 211, a resolution verification submodule 212 and a second super-resolution reconstruction submodule 213;
the image acquisition submodule 211 is used for acquiring a tracked pending candidate target image in the detection video; the resolution verification sub-module 212 is used for verifying the resolution of the candidate target image to be determined; the second hyper-resolution reconstruction submodule 213 is configured to perform hyper-resolution reconstruction processing on the candidate target image to be determined whose resolution is lower than the preset threshold.
The automatic feature labeling module 22, when using reinforcement learning and deep neural network technology to automatically label the features in the candidate target image to be determined, includes the following steps:
S1, carrying out multi-scale superpixel division on the reference target image by adopting the SLIC algorithm, and marking a feature probability threshold for the superpixels;
specifically, the multi-scale superpixel division of the reference target image by using the SILC algorithm comprises the following steps:
initializing a seed point: according to the set number of the super pixels, uniformly distributing initial seed points in the candidate target image to be determined;
reselecting secondary seed points in the n x n neighborhood of the initial seed points;
distributing a class label to each pixel point in the neighborhood around each secondary seed point;
distance measurement: for each searched pixel point, respectively calculating the distance between the pixel point and the secondary seed point;
performing iterative optimization; in theory the above steps are iterated until the error converges, but practice shows that about 10 iterations already give a satisfactory result on most pictures, so 10 iterations are generally used;
enhancing connectivity; the iterative optimization may leave defects such as multiple connected components, undersized superpixels, or a single superpixel cut into several discontinuous pieces, and these are corrected by enhancing connectivity. The main idea is as follows: a new label table is created with all elements set to −1; discontinuous and undersized superpixels are reassigned to adjacent superpixels following a Z-shaped scan (from left to right, from top to bottom), and each traversed pixel point is assigned to the corresponding label until all points have been traversed.
The distance measurement comprises a color distance and a space distance, and the distance calculation method comprises the following steps:
d_c = sqrt[(l_j − l_i)² + (a_j − a_i)² + (b_j − b_i)²]
d_s = sqrt[(x_j − x_i)² + (y_j − y_i)²]
D′ = sqrt[(d_c/N_c)² + (d_s/N_s)²]
where d_c denotes the color distance, d_s the spatial distance, and N_s the maximum spatial distance within a class, defined as N_s = S = sqrt(N/K); color similarity is measured on the l, a, b components of the Lab color space and spatial proximity on the two-dimensional image coordinates x, y, so the overall metric is the five-dimensional space [l, a, b, x, y];
the maximum color distance N_c differs not only from picture to picture but also from cluster to cluster, so a fixed constant m is substituted for it, and the final distance measure D′ is:
D′ = sqrt[(d_c/m)² + (d_s/S)²]
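A small sketch of the combined distance D′ evaluated for one pixel/seed pair, assuming NumPy; in practice the whole superpixel division can also be delegated to a library implementation such as skimage.segmentation.slic(image, n_segments=K, compactness=m). The Euclidean color and spatial distances follow the standard SLIC definitions reconstructed above.

```python
import numpy as np

def slic_distance(pixel_lab, pixel_xy, seed_lab, seed_xy, S, m=10.0):
    """Combined SLIC distance D' between a pixel and a secondary seed point.

    pixel_lab / seed_lab: (l, a, b) color values; pixel_xy / seed_xy: (x, y) coordinates;
    S = sqrt(N/K) is the seed grid interval; m is the fixed constant standing in for
    the maximum color distance N_c.  The default value of m is an assumption.
    """
    d_c = np.linalg.norm(np.asarray(pixel_lab, float) - np.asarray(seed_lab, float))  # color distance
    d_s = np.linalg.norm(np.asarray(pixel_xy, float) - np.asarray(seed_xy, float))    # spatial distance
    return float(np.sqrt((d_c / m) ** 2 + (d_s / S) ** 2))
```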
S2, constructing a superpixel classification training set, and training the marked superpixels with a machine learning-based method to obtain a learning model;
the learning model is used for classifying the superpixels in the classification training set; the classification results are manually rewarded or penalized, the reward-and-punishment results are fed back to the learning model, and the learning model is readjusted through this reward-and-punishment mechanism; the cycle repeats until the learning model reaches its optimal state, yielding the labeling information of the feature regions in the candidate target image to be determined;
specifically, superpixels obtained by superpixel division are irregular in shape, and the size of the superpixels is unified by the following method:
acquiring a circumscribed rectangle of the super-pixel; the method specifically comprises the following steps:
performing topology analysis on the binary image of the superpixel, determining the surrounding relation of the boundaries, and finding the outermost boundary which is the outline of the superpixel;
then, using the cv2.boundingRect(c) function with the contour obtained above as the parameter c, obtaining the top-left corner and the side lengths of the rectangle, and thereby determining the circumscribed rectangle of the superpixel.
Calculating the geometric center of the super-pixel circumscribed rectangle; the method specifically comprises the following steps:
intercepting a superpixel block with a specified size in a superpixel external rectangle, and calculating the coordinates (x _, y _) of the upper left corner of the needed superpixel block;
x_=x-round[(roi_size-x_len)/2];
y_=y-round[(roi_size-y_len)/2];
where, roi _ size is the super-pixel size predefined by us, and is 128 × 128, x and y are the coordinates of the upper left corner of the super-pixel bounding rectangle, and x _ len and y _ len are the side lengths of the super-pixel bounding rectangle, respectively.
Taking a square from the geometric center of the superpixel to the periphery, and taking the square in the opposite direction when a boundary is met; the method specifically comprises the following steps:
if x_ + roi_size and y_ + roi_size do not exceed the image boundary, the roi_size × roi_size block is taken directly, horizontally and vertically, centered on the superpixel;
if x_ + roi_size or y_ + roi_size exceeds the image boundary, roi_size pixels are taken back from that boundary;
if x_ or y_ itself lies beyond the image boundary, roi_size pixels are taken starting from the image's starting boundary in that direction.
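A sketch of this size-unification step, assuming OpenCV 4 (where findContours returns two values) and a frame at least roi_size pixels on each side; the clamping mirrors the three boundary cases listed above.

```python
import cv2

def crop_superpixel(binary_mask, image, roi_size=128):
    """Cut a fixed roi_size x roi_size patch around one superpixel.

    binary_mask: uint8 mask of the superpixel; image: the source frame.
    The patch is centered on the superpixel's bounding rectangle and pulled
    back inside the frame whenever it would cross a border.
    """
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, x_len, y_len = cv2.boundingRect(max(contours, key=cv2.contourArea))

    x_ = x - round((roi_size - x_len) / 2)     # top-left corner of the desired patch
    y_ = y - round((roi_size - y_len) / 2)

    h, w = image.shape[:2]
    x_ = min(max(x_, 0), w - roi_size)          # clamp the patch back inside the image
    y_ = min(max(y_, 0), h - roi_size)
    return image[y_:y_ + roi_size, x_:x_ + roi_size]
```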
The method of training the marked superpixels with a machine learning-based method to obtain the learning model comprises the following steps:
convolution process: the input image is convolved with a trainable filter f_x (at the first stage the input is the raw image; at later stages it is the convolution feature map produced by the previous stage), a bias b_x is added, and the convolution layer C_x is obtained;
sub-sampling process: the four pixels of each neighborhood are summed into one pixel, weighted by a scalar W, the bias b is added, and a sigmoid activation function generates the feature map S_{x+1}, reduced in size by a factor of four;
wherein the convolution layer C_x is calculated as:
C_x = f_x(W, input) + b_x
and the feature map S_{x+1} as:
S_{x+1} = sigmoid[W · (x_{i,j} + x_{i+1,j} + x_{i,j+1} + x_{i+1,j+1})].
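An illustrative NumPy/SciPy rendering of the two formulas; scipy.signal.correlate2d is an assumed choice, since the patent names no library. It produces one convolution map C_x and one four-fold-reduced feature map S_{x+1}.

```python
import numpy as np
from scipy.signal import correlate2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(image, f_x, b_x):
    """C_x = f_x(W, input) + b_x : filter the input and add a bias."""
    return correlate2d(image, f_x, mode="valid") + b_x

def subsample_layer(feature_map, W, b=0.0):
    """S_{x+1}: sum each 2x2 neighborhood, weight by the scalar W, add the
    bias b (mentioned in the prose above), and squash with a sigmoid."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    pooled = blocks.sum(axis=(1, 3))            # x_{i,j} + x_{i+1,j} + x_{i,j+1} + x_{i+1,j+1}
    return sigmoid(W * pooled + b)              # feature map reduced four-fold
```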
S3, constructing a training set for the annotation model, and automatically labeling and segmenting the feature regions based on end-to-end learning of the deep neural network;
specifically, the deep neural network is a ResNet network, and the ResNet network comprises an Identity Block and a Conv Block; the input and output dimensions of the Identity Block are consistent, the input and output dimensions of the Conv Block are inconsistent, and the Identity Block can be connected in series.
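A sketch of the two ResNet block types in PyTorch, reflecting the property stated above: the Identity Block keeps input and output dimensions equal, so several can be chained in series, while the Conv Block changes dimensions and therefore needs its own projection on the shortcut. Channel counts and strides are illustrative assumptions.

```python
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Input and output dimensions match, so blocks can be stacked in series."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))       # shortcut adds the unchanged input

class ConvBlock(nn.Module):
    """Input and output dimensions differ, so the shortcut needs a 1x1 projection."""
    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.BatchNorm2d(out_channels))
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=stride), nn.BatchNorm2d(out_channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.shortcut(x) + self.body(x))
```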
S4, constructing an annotation model, and testing the constructed annotation model by using the pre-prepared characteristic image data;
specifically, the construction of the annotation model comprises the following steps:
determining mean-IOU as the objective function;
solving the objective function to obtain the annotation model that minimizes the objective function value;
the objective function is calculated as:
IOU = area(C ∩ G) / area(C ∪ G)
where IOU is the overlap ratio between a generated candidate box and the original mark box, area(C) is the area of the candidate box, and area(G) is the area of the original mark box.
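A small sketch of IOU and mean-IOU for axis-aligned boxes given as (x1, y1, x2, y2); the box format and the pairing of candidate boxes with original mark boxes are assumptions, since the text only defines IOU through the two areas.

```python
def box_iou(candidate, ground_truth):
    """IOU = area(C ∩ G) / area(C ∪ G) for two boxes given as (x1, y1, x2, y2)."""
    cx1, cy1, cx2, cy2 = candidate
    gx1, gy1, gx2, gy2 = ground_truth
    inter_w = max(0.0, min(cx2, gx2) - max(cx1, gx1))
    inter_h = max(0.0, min(cy2, gy2) - max(cy1, gy1))
    inter = inter_w * inter_h
    union = (cx2 - cx1) * (cy2 - cy1) + (gx2 - gx1) * (gy2 - gy1) - inter
    return inter / union if union > 0 else 0.0

def mean_iou(candidates, ground_truths):
    """mean-IOU over matched candidate / original-mark box pairs."""
    pairs = list(zip(candidates, ground_truths))
    return sum(box_iou(c, g) for c, g in pairs) / len(pairs)
```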
S5, automatically labeling the features in the candidate target image to be determined by using the tested labeling model;
the similarity analysis unit 3 is used for comparing the similarity of the reference target feature and the undetermined target feature;
specifically, the similarity analysis unit 3 includes a reference target feature obtaining module for obtaining a reference target feature, an undetermined target feature obtaining module for obtaining an undetermined target feature, and a similarity comparison module for performing similarity comparison between the reference target feature and the undetermined target feature.
The target feature confirming unit 4 is configured to confirm whether the undetermined target feature is a reference target feature according to a result of the similarity comparison, and when the similarity result is greater than or equal to the matching threshold, confirm that the undetermined target feature is a target tracking feature;
the feature updating unit 5 is configured to update the target tracking feature to a new reference target feature.
In summary, by means of the above technical solution, through the arrangement of the reference target feature extraction unit 1, the undetermined target feature extraction unit 2, the similarity analysis unit 3, the target feature confirmation unit 4 and the feature updating unit 5, the system can extract the reference features of the moving target under the action of the reference target feature extraction unit 1 to obtain the reference target features, and can automatically label the features in the undetermined candidate target image under the action of the undetermined target feature extraction unit 2 using reinforcement learning and deep neural network techniques to obtain the undetermined target features; based on the comparison of the similarity between the reference target features and the features to be determined, it can then decide whether the undetermined target features are the target tracking features, thereby achieving fast tracking of moving-target features. Compared with conventional feature tracking methods, the invention combines super-resolution reconstruction and perspective transformation with target feature extraction, which improves the quality of the input image and therefore the accuracy of target feature identification; meanwhile, the features in the candidate target image to be determined are automatically labeled using reinforcement learning and deep neural network techniques, so that the model labels the target features quickly and accurately in a learning mode closer to the human brain, effectively improving the accuracy of moving-target tracking.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. The machine vision characteristic tracking system is characterized by comprising a reference target characteristic extraction unit (1), an undetermined target characteristic extraction unit (2), a similarity analysis unit (3), a target characteristic confirmation unit (4) and a characteristic updating unit (5);
the reference target feature extraction unit (1) is used for extracting features in a first frame of a plurality of frames of a detection video to obtain reference target features;
the undetermined target feature extraction unit (2) is used for extracting the features of the tracked undetermined candidate target in the detection video to obtain the characteristics of the undetermined target;
the similarity analysis unit (3) is used for comparing the similarity of the reference target feature and the undetermined target feature;
the target feature confirming unit (4) is used for confirming whether the undetermined target feature is a reference target feature according to a similarity comparison result, and when the similarity result is larger than or equal to a matching threshold value, confirming that the undetermined target feature is a target tracking feature;
the characteristic updating unit (5) is used for updating the target tracking characteristic into a new reference target characteristic.
2. The machine vision feature tracking system of claim 1, characterized in that the reference target feature extraction unit (1) comprises an image acquisition module (11), a feature extraction module (12), a feature association module (13) and a reference target feature output module (14);
the image acquisition module (11) is used for acquiring a first frame image in a plurality of frames of the detection video;
the feature extraction module (12) is used for extracting features of the first frame image;
the characteristic correlation module (13) is used for describing a random linear relation between the reference target image data and the characteristic data by adopting a multiple linear regression model;
the reference target feature output module (14) is used for outputting the feature of the reference target.
3. The machine-vision feature tracking system of claim 2, wherein the feature extraction module (12) comprises a feature recognition sub-module (121), a feature cropping sub-module (122), a first hyper-resolution reconstruction sub-module (123), and a perspective change processing sub-module (124);
wherein the feature identification submodule (121) is used for identifying features in the first frame image;
the feature cropping sub-module (122) is used for performing cropping processing on a feature part in the first frame image;
the first super-resolution reconstruction submodule (123) is used for carrying out super-resolution reconstruction processing on the cut low-resolution feature image;
the perspective change processing submodule (124) is used for processing the images with the angle deviation in the cut feature images according to the perspective transformation principle.
4. A machine vision feature tracking system according to claim 2, characterized in that the feature correlation module (13) when using a multiple linear regression model to describe a random linear relationship between the reference target image data and the feature data comprises the steps of:
and describing a random linear relation between the reference target image data and the characteristic data by adopting a multiple linear regression model, wherein the relation is as follows:
y_i = β_0 + β_1·x_i1 + β_2·x_i2 + β_3·x_i3 + ε_i,  i = 1, 2, …, n;
wherein y represents the reference target image feature evaluation data index, x_1 represents the number of features, x_2 the feature type, and x_3 the feature data packet convergence protocol; β_0, β_1, β_2, β_3 denote the regression coefficients and ε_i a random error term, where the ε_i are mutually independent and obey an N(0, σ²) distribution; n represents the sample size, and the n sample observations are:
(y_i, x_i1, x_i2, x_i3),  i = 1, 2, …, n.
5. the machine vision feature tracking system of claim 1, characterized in that the unit (2) for extracting the features to be targeted comprises an image acquisition processing module (21) and an automatic feature labeling module (22);
the image acquisition processing module (21) is used for acquiring a tracked candidate target image to be determined in a detection video and processing the image;
the automatic feature labeling module (22) is used for automatically labeling the features in the candidate target image to be determined by utilizing the reinforcement learning and deep neural network technology.
6. The machine vision feature tracking system of claim 5, wherein the image acquisition processing module (21) comprises an image acquisition sub-module (211), a resolution verification sub-module (212), and a second super-resolution reconstruction sub-module (213);
wherein the image acquisition sub-module (211) is used for acquiring a tracked pending candidate target image in the detection video;
the resolution verification sub-module (212) is used for verifying the resolution of the candidate target image to be determined;
the second hyper-resolution reconstruction submodule (213) is used for carrying out hyper-resolution reconstruction processing on the candidate target image to be determined, the resolution of which is lower than a preset threshold.
7. The machine-vision feature tracking system of claim 5, wherein the automatic feature labeling module (22) comprises the following steps when automatically labeling features in a candidate target image to be determined by using reinforcement learning and deep neural network technology:
carrying out multi-scale superpixel division on a reference target image by using the SLIC algorithm, and marking a feature probability threshold for the superpixels;
constructing a superpixel classification training set, and training the marked superpixels with a machine learning-based method to obtain a learning model;
constructing a training set for labeling a model, and automatically labeling and segmenting a characteristic region based on end-to-end learning of a deep neural network;
constructing an annotation model, and testing the constructed annotation model by using pre-prepared characteristic image data;
automatically labeling the characteristics in the candidate target image to be determined by using the tested labeling model;
the learning model is used for classifying the superpixels in the classification training set, artificially giving rewards and punishment to classification results, feeding the reward and punishment results back to the learning model, adjusting the learning model again through a reward and punishment mechanism, circulating until the learning model reaches the optimal value, and obtaining the labeling information of the characteristic region in the candidate target image to be determined;
the deep neural network is a ResNet network, and the ResNet network comprises an Identity Block and a Conv Block; the input and output dimensions of the Identity Block are consistent, the input and output dimensions of the Conv Block are inconsistent, and the Identity Block can be connected in series.
8. The machine vision feature tracking system of claim 7, wherein the multi-scale superpixel division of the reference target image using the SLIC algorithm comprises the steps of:
initializing a seed point: uniformly distributing initial seed points in the candidate target image to be determined according to the set number of the super pixels;
reselecting secondary seed points within an n x n neighborhood of the initial seed points;
distributing a class label to each pixel point in the neighborhood around each secondary seed point;
distance measurement: for each searched pixel point, respectively calculating the distance between the pixel point and the secondary seed point;
performing iterative optimization;
enhancing connectivity;
the distance measurement comprises a color distance and a space distance, and the distance calculation method comprises the following steps:
d_c = sqrt[(l_j − l_i)² + (a_j − a_i)² + (b_j − b_i)²]
d_s = sqrt[(x_j − x_i)² + (y_j − y_i)²]
D′ = sqrt[(d_c/N_c)² + (d_s/N_s)²]
where d_c denotes the color distance, d_s the spatial distance, and N_s the maximum spatial distance within a class, defined as N_s = S = sqrt(N/K); color similarity is measured on the l, a, b components of the Lab color space and spatial proximity on the two-dimensional image coordinates x, y, so the overall metric is the five-dimensional space [l, a, b, x, y];
the maximum color distance N_c differs not only from picture to picture but also from cluster to cluster, so a fixed constant m is substituted for it, and the final distance measure D′ is:
D′ = sqrt[(d_c/m)² + (d_s/S)²]
9. the machine vision feature tracking system of claim 7, wherein the training of the labeled superpixels to obtain a learning model using a machine learning-based method comprises the steps of:
and (3) convolution process: using a trainable filter fxDeconvoluting an input image, obtaining the input image in the first stage and the convolution characteristic map in the later stage, and then adding an offset bxObtaining a convolutional layer Cx
And (3) sub-sampling process: summing four pixels in the neighborhood to obtain a total pixel, weighting by a scalar W, increasing the bias b, and generating a feature mapping image S reduced by four times by a sigmoid activation functionx+1
Wherein, the convolution layer CxThe calculation formula is as follows:
Cx=fx(W,input)+bx
feature map Sx+1The calculation formula is as follows:
Sx+1=sigmoid[W·(xi,j+xi+1,j+xi,j+1+xi+1,j+1)]。
10. the machine-vision feature tracking system of claim 6, wherein the building an annotation model comprises the steps of:
determining mean-IOU as a target function;
solving the objective function to obtain a labeling model with the minimum objective function value;
wherein, the calculation formula of the objective function is as follows:
IOU = area(C ∩ G) / area(C ∪ G)
where IOU is the overlap ratio between a generated candidate box and the original mark box, area(C) is the area of the candidate box, and area(G) is the area of the original mark box.
CN202210031232.8A 2022-01-12 2022-01-12 Machine vision characteristic tracking system Pending CN114387308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210031232.8A CN114387308A (en) 2022-01-12 2022-01-12 Machine vision characteristic tracking system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210031232.8A CN114387308A (en) 2022-01-12 2022-01-12 Machine vision characteristic tracking system

Publications (1)

Publication Number Publication Date
CN114387308A true CN114387308A (en) 2022-04-22

Family

ID=81201902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210031232.8A Pending CN114387308A (en) 2022-01-12 2022-01-12 Machine vision characteristic tracking system

Country Status (1)

Country Link
CN (1) CN114387308A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612503A (en) * 2022-05-13 2022-06-10 深圳市巨力方视觉技术有限公司 Image processing type motion monitoring system based on machine vision

Similar Documents

Publication Publication Date Title
Sankaranarayanan et al. Learning from synthetic data: Addressing domain shift for semantic segmentation
Žbontar et al. Stereo matching by training a convolutional neural network to compare image patches
JP4234381B2 (en) Method and computer program product for locating facial features
Gavrila A bayesian, exemplar-based approach to hierarchical shape matching
US8175412B2 (en) Method and apparatus for matching portions of input images
Ruan et al. Multi-correlation filters with triangle-structure constraints for object tracking
CN110363116B (en) Irregular human face correction method, system and medium based on GLD-GAN
CN109410168B (en) Modeling method of convolutional neural network for determining sub-tile classes in an image
CN109685078B (en) Infrared image identification method based on automatic annotation
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN108269266A (en) Segmentation image is generated using Markov random field optimization
CN108230330B (en) Method for quickly segmenting highway pavement and positioning camera
CA3136674C (en) Methods and systems for crack detection using a fully convolutional network
CN113177592B (en) Image segmentation method and device, computer equipment and storage medium
CN112085534B (en) Attention analysis method, system and storage medium
CN110827304A (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolutional network and level set method
Bellavia et al. HarrisZ+: Harris corner selection for next-gen image matching pipelines
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN116152266A (en) Segmentation method, device and system for ultrasonic image of puncture needle
WO2022247126A1 (en) Visual localization method and apparatus, and device, medium and program
CN114387308A (en) Machine vision characteristic tracking system
CN107423771B (en) Two-time-phase remote sensing image change detection method
Zhang et al. TPMv2: An end-to-end tomato pose method based on 3D key points detection
CN113496148A (en) Multi-source data fusion method and system
CN115995017A (en) Fruit identification and positioning method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination