CN117437406A - Multi-target detection method and device - Google Patents

Multi-target detection method and device

Info

Publication number
CN117437406A
CN117437406A (application CN202311428780.5A)
Authority
CN
China
Prior art keywords
target, point, region, interest, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311428780.5A
Other languages
Chinese (zh)
Inventor
林守金
鲍克鹏
许涛
林鑫
程文发
王君毅
代阳
周昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Mltor Cnc Technology Co ltd
Original Assignee
Zhongshan Mltor Cnc Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongshan Mltor Cnc Technology Co ltd filed Critical Zhongshan Mltor Cnc Technology Co ltd
Priority to CN202311428780.5A
Publication of CN117437406A
Legal status: Pending

Classifications

    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners, strokes or intersections; connectivity analysis
    • G06V10/457 Local feature extraction by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • G06V10/50 Feature extraction within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; projection analysis
    • G06V10/762 Recognition using pattern recognition or machine learning: clustering, e.g. of similar faces in social networks
    • G06V10/764 Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V10/82 Recognition using pattern recognition or machine learning: neural networks
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V2201/07 Indexing scheme: target detection
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods (neural networks)

Abstract

The application relates to a multi-target detection method and device. The method comprises the following steps: preprocessing video data acquired by a video acquisition device in a region to be detected; screening a target region of interest from the processed video data; judging the target condition through a classifier according to a first preset algorithm and the target region of interest, and determining the class probability of the target region; screening the target region with the highest class probability, inferring the displacement and speed of the target by key point extraction and optical flow estimation, and updating the state information of the target region of interest; and recording the tracking result of the state information and outputting it to a user-specified storage location. The device uses this method: it can complete the target detection task in a short time, is suitable for real-time application scenarios, is robust to changes in the shape, size and pose of the target object, and can rapidly process a large amount of data while ensuring detection accuracy.

Description

Multi-target detection method and device
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a method and apparatus for multi-target detection.
Background
With the progress of science and technology, artificial intelligence has developed rapidly and China's industry is advancing in a more intelligent direction. In this process, vision technology is a key technology for guaranteeing production safety, and visual detection of the key parts of important equipment is an important component of that guarantee. In a video monitoring scenario, the motion track and abnormal jitter of important equipment can be monitored, and abnormal activity of mechanical equipment during production can be monitored and analyzed in real time. By analyzing the motion track and abnormal jitter of important equipment during production with artificial intelligence, deep learning and related technologies, various production safety risks can be effectively reduced.
However, such real-time analysis generally requires complex algorithms and large-scale data processing, and existing visual monitoring devices often suffer from insufficient computing power during operation: they cannot rapidly process a large amount of data while guaranteeing detection accuracy. As a result, existing visual detection cannot achieve both detection accuracy and real-time performance.
Disclosure of Invention
The invention aims to solve the problem that, when existing important equipment adopts a visual detection scheme, insufficient computing power prevents a large amount of data from being processed quickly while guaranteeing detection accuracy, so that the displacement change of an object in video can be detected accurately and video monitoring and analysis can be realized better.
The invention provides the following scheme:
a method of multi-target detection, comprising:
preprocessing video data acquired by a video acquisition device in a region to be detected;
screening a target region of interest according to the processed video data;
judging the target condition through a classifier according to a first preset algorithm and a target region of interest, and determining the class probability of the target region;
screening the target region with the highest class probability, inferring the displacement and speed of the target by key point extraction and optical flow estimation, and updating the state information of the target region of interest;
recording the tracking result of the state information, and outputting the tracking result to a storage position designated by a user.
The method for multi-target detection as described above, the step of preprocessing the video data collected by the video collecting device in the area to be detected includes:
acquiring video data of an object to be detected, which is acquired in a region to be detected by a video acquisition device;
decoding the video data to generate an image sequence;
determining a histogram of the corresponding dimension based on a second preset algorithm and the image sequence;
normalizing the histogram, and generating a feature vector corresponding to the normalized histogram.
The method for multi-target detection as described above, the step of screening the target region of interest according to the processed video data includes:
Extracting feature vector data corresponding to the histogram, and generating a feature vector data set;
calculating the distance between each data point in the feature vector data set and the nearest neighbor point, and determining the neighborhood radius and the minimum neighborhood number;
acquiring the number of data points in the neighborhood of each data point according to the neighborhood radius and the minimum neighborhood number;
marking each data point as a core point or a noise point according to the number of the data points in the neighborhood;
if the data point is a core point, clustering the neighbor points in the neighborhood of the data point to generate a corresponding data cluster;
and generating a corresponding region of interest according to the core points, the neighbor points and the data clusters.
The method for multi-target detection as described above, after the step of generating the corresponding region of interest according to the core point, the neighbor point and the data cluster, further includes:
determining a target region of interest according to the actual area of the region of interest and the area of the preset region;
dividing the target region of interest based on a watershed algorithm, traversing boundary pixels of the divided target region of interest, and determining pixel points of the cavity;
and filling the pixel points of the cavity based on a filling algorithm, and repairing connectivity of the target region of interest.
The method for multi-target detection as described above, wherein the step of determining the class probability of the target region by determining the target condition through the classifier according to the first preset algorithm and the target region of interest includes:
Generating candidate areas of the video data according to a first preset algorithm and the target area of interest;
predicting boundary frame coordinates and class probabilities of candidate areas;
screening out repeated bounding boxes based on the bounding box coordinates and a third preset algorithm;
and determining the category probability of the target area according to the category probability of the candidate area and the filtered bounding box.
In the above method for multi-target detection, the step of screening the target region with the highest class probability, deducing the displacement and speed of the target by extracting the key points and the optical flow estimation, and updating the state information of the target region of interest comprises the following steps:
screening a target area with highest category probability;
determining a target strength key point based on a key point extraction algorithm and a target area with highest category probability;
predicting the movement amount of the pixel point based on an optical flow estimation algorithm, the target intensity key point and the key point of the previous frame;
calculating the movement amount of the target strength key point and the pixel point, and determining the displacement amount and the running speed of the target strength key point;
and updating the state information of the target region of interest according to the displacement and the running speed of the target strength key points.
An apparatus for multi-target detection, comprising:
The processing module is used for preprocessing video data acquired by the video acquisition device in the region to be detected;
the screening module is used for screening the target region of interest according to the processed video data;
the determining module is used for determining the category probability of the target region according to a first preset algorithm and the target region of interest and judging the target condition through the classifier;
the updating module is used for screening the target region with the highest class probability, inferring the displacement and speed of the target by key point extraction and optical flow estimation, and updating the state information of the target region of interest;
and the output module is used for recording the tracking result of the state information and outputting the tracking result to a storage position designated by a user.
An apparatus for multi-target detection as described above, the processing module comprising:
the first acquisition unit is used for acquiring video data of the object to be detected, which is acquired by the video acquisition device in the area to be detected;
a first generation unit configured to decode the video data to generate an image sequence;
the first determining unit is used for determining a histogram of the corresponding dimension based on a second preset algorithm and the image sequence;
the second generating unit is used for normalizing the histogram and generating a feature vector corresponding to the normalized histogram;
The screening module comprises:
the third generating unit is used for extracting the feature vector data corresponding to the histogram and generating a feature vector data set;
the second determining unit is used for calculating the distance between each data point in the feature vector data set and the nearest neighbor point and determining the neighborhood radius and the minimum neighborhood number;
the second acquisition unit is used for acquiring the number of data points in the neighborhood of each data point according to the neighborhood radius and the minimum neighborhood number;
the marking unit is used for marking each data point as a core point or a noise point according to the number of the data points in the neighborhood;
a fourth generating unit, configured to cluster the neighbor points in the neighborhood of the data point if the data point is a core point, and generate a corresponding data cluster;
a fifth generating unit, configured to generate a corresponding region of interest according to the core point, the neighbor point and the data cluster;
the third determining unit is used for determining a target region of interest according to the actual area of the region of interest and the area of the preset region;
a fourth determining unit, configured to segment the target region of interest based on a watershed algorithm, traverse boundary pixels of the segmented target region of interest, and determine pixel points of the hole;
the repairing unit is used for filling the pixel points of the cavity based on a filling algorithm and repairing the connectivity of the target region of interest;
The determining module includes:
a sixth generation unit, configured to generate a candidate region of video data according to a first preset algorithm and a target region of interest;
the first prediction unit is used for predicting the boundary frame coordinates and the category probability of the candidate region;
the screening unit is used for screening out repeated boundary frames based on the boundary frame coordinates and a third preset algorithm;
a fifth determining unit, configured to determine a category probability of the target area according to the category probability of the candidate area and the filtered bounding box;
the updating module comprises:
the screening unit is used for screening the target area with the highest category probability;
the sixth determining unit is used for determining the target strength key points based on the key point extraction algorithm and the target area with the highest category probability;
the second prediction unit is used for predicting the movement amount of the pixel point based on the optical flow estimation algorithm, the target intensity key point and the key point of the previous frame;
a seventh determining unit, configured to calculate a target intensity key point and a movement amount of the pixel point, and determine a displacement amount and an operation speed of the target intensity key point;
and the updating unit is used for updating the state information of the target region of interest according to the displacement and the running speed of the target strength key point.
A computer readable storage medium having stored thereon a computer program which, when executed by a multi-object detection apparatus, implements a multi-object detection method as described above.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements a method of multi-object detection as described above when executing the computer program.
According to the embodiment of the invention, the video data acquired by the video acquisition device is preprocessed, the target region of interest is screened, the target condition is judged through the classifier and the class probability of the target region is determined, the target region with the highest class probability is screened, the displacement and speed of the target are inferred by key point extraction and optical flow estimation, the state information of the target region of interest is updated, the tracking result of the state information is recorded, and the tracking result is output to a user-specified storage location. In this way, important equipment can be accelerated during visual detection by a parallel computing structure and a high-performance GPU, so that the target detection task can be completed in a shorter time, which suits real-time application scenarios; end-to-end training can be realized through deep learning and features can be learned automatically; the method is robust to changes in the shape, size, pose and other characteristics of the target object; and a large amount of data can be processed rapidly while detection accuracy is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of multi-target detection according to a first embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S11 in FIG. 1;
FIG. 3 is a detailed flowchart of step S12 in FIG. 1;
FIG. 4 is a detailed flowchart of step S13 in FIG. 1;
FIG. 5 is a detailed flowchart of step S14 in FIG. 1;
FIG. 6 is a block diagram of a multi-object detection apparatus according to a second embodiment of the present invention;
FIG. 7 is a detailed block diagram of the processing module of FIG. 6;
FIG. 8 is a detailed block diagram of the screening module of FIG. 6;
FIG. 9 is a detailed block diagram of the determination module of FIG. 6;
FIG. 10 is a detailed block diagram of the update module of FIG. 6;
FIG. 11 is a block diagram of a computer device according to yet another embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention, and that well-known modules, units and their connections, links, communications or operations with each other are not shown or described in detail. Also, the described features, architectures, or functions may be combined in any manner in one or more implementations. It will be appreciated by those skilled in the art that the various embodiments described below are for illustration only and are not intended to limit the scope of the invention. It will be further appreciated that the modules or units or processes of the embodiments described herein and illustrated in the drawings may be combined and designed in a wide variety of different configurations. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless logically impossible, the definitions of terms and methods given in the following embodiments are based on the general concepts that can be practiced with the disclosure in the examples. Each more specific definition of a term or method should be regarded as part of the inventive subject matter, and the absence of such a specific definition in the specification should not be interpreted narrowly or held against it. Similarly, the order of the method steps is flexible provided the steps can be logically implemented, and specific instances of the terms or of the generalized concepts of the method fall within the scope of the invention.
First embodiment:
referring to fig. 1 to 5, the present embodiment provides a multi-target detection method, which includes S11-S15, wherein:
s11, preprocessing video data acquired by the video acquisition device in the region to be detected.
In this embodiment, a marker is designed and affixed to the region to be detected of the object to be detected, and the video data is acquired with a video acquisition device such as a camera and then processed, so that the target detection task can be performed better.
As a preferred embodiment, but not particularly limited thereto, step S11 includes S111-S114, wherein:
s111, acquiring video data of the object to be detected, which are acquired by the video acquisition device in the area to be detected.
In this embodiment, a marker is designed and affixed to the region to be detected of the object to be detected, and the video data is acquired with a video acquisition device such as a camera; collecting the video information in the region to be detected allows the data to be analyzed better.
S112, decoding the video data to generate an image sequence.
In this embodiment, the acquired video file is decoded into an image sequence for subsequent processing; to reduce the computational load and improve real-time performance, the video is sampled frame by frame and the images are processed accordingly.
S113, determining a histogram of the corresponding dimension based on a second preset algorithm and the image sequence.
The image sequence is processed based on the second preset algorithm. In a factory environment, common noise can be classified as mechanical, aerodynamic, pressure, electrical or human noise; these noise types may be present at the same time and may interfere with video monitoring. Therefore, during video preprocessing, appropriate filtering and denoising must be applied to reduce the influence of noise on image quality and improve the accuracy of monitoring and analysis. Because different noise types may have different frequency characteristics and spatial distributions, a combination of several algorithms can be considered when selecting the filtering algorithm, so as to obtain a better denoising effect.
As a preferred scheme, but not particularly limited thereto, in practical applications median filtering and Gaussian filtering can be combined according to the characteristics and intensity of the noise to achieve a good filtering effect. The image or signal to be filtered is first Gaussian filtered: through a convolution operation, the Gaussian filter performs a weighted average of the pixels around each pixel according to a Gaussian distribution, reducing the influence of noise. The Gaussian-filtered image is then processed a second time with median filtering: the pixels surrounding each pixel are sorted and the median is taken as the new value of the pixel, effectively removing isolated noise points, and the combined filtered image is output.
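As an illustrative sketch only (not part of the original disclosure), the combined Gaussian and median filtering described above could be written with OpenCV roughly as follows; the 5x5 kernel sizes are assumed values:

import cv2

def denoise_frame(frame):
    # Gaussian filtering: convolution that takes a Gaussian-weighted average of neighboring pixels
    blurred = cv2.GaussianBlur(frame, (5, 5), sigmaX=0)
    # Median filtering on the Gaussian-filtered image to remove isolated noise points
    return cv2.medianBlur(blurred, 5)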
After image filtering, importing a filtered image to be scaled, and determining the size of the scaled image, wherein the scaling method specifically comprises the following steps:
calculating a width scale between the original image and the target image:
scale_x=target image width/original image width;
height scaling:
scale_y=target image height/original image height;
A new target image is created according to the target size to store the scaled image data. For each pixel in the target image, the corresponding pixel value in the original image is calculated by bilinear interpolation: for each pixel position (x', y') in the target image, its floating-point coordinate in the original image is calculated as:
x = x'/scale_x, y = y'/scale_y;
From the position (x, y), the positions of the four neighboring pixels in the original image are found: (x1, y1), (x2, y1), (x1, y2), (x2, y2), where (x1, y1) = (floor(x), floor(y)) and (x2, y2) = (ceil(x), ceil(y)).
Calculating an interpolation coefficient of each pixel according to (x, y) and the distances between four adjacent pixels, and calculating the value of each pixel in the target image by using a bilinear interpolation formula to obtain:
target_pixel_value=(1-fractional_x)*(1-fractional_y)*original_pixel_value(x1,y1)
+fractional_x*(1-fractional_y)*original_pixel_value(x2,y1)
+(1-fractional_x)*fractional_y*original_pixel_value(x1,y2)
+fractional_x*fractional_y*original_pixel_value(x2,y2);
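A minimal pure-NumPy sketch of the bilinear scaling formulas above, with hypothetical helper names (in practice cv2.resize with INTER_LINEAR performs an equivalent computation):

import numpy as np

def bilinear_resize(img, target_w, target_h):
    h, w = img.shape[:2]
    scale_x, scale_y = target_w / w, target_h / h
    out = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    for y_t in range(target_h):
        for x_t in range(target_w):
            x, y = x_t / scale_x, y_t / scale_y              # floating-point source coordinate
            x1, y1 = int(np.floor(x)), int(np.floor(y))
            x2, y2 = min(int(np.ceil(x)), w - 1), min(int(np.ceil(y)), h - 1)
            fx, fy = x - x1, y - y1                          # fractional parts
            out[y_t, x_t] = ((1 - fx) * (1 - fy) * img[y1, x1]
                             + fx * (1 - fy) * img[y1, x2]
                             + (1 - fx) * fy * img[y2, x1]
                             + fx * fy * img[y2, x2])
    return out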
As a preferred solution, but not particularly limited thereto, the YCbCr brightness enhancement algorithm is used to adjust the brightness and contrast of the image; each pixel value of the RGB image is converted into the corresponding Y, Cb, Cr component values by the following conversion formulas:
Y=0.299*R+0.587*G+0.114*B
Cb=0.564*(B-Y)
Cr=0.713*(R-Y)
An image with the three channels Y, Cb and Cr can thus be obtained.
All pixels of the color image are traversed, and the minimum and maximum pixel values of the red, green and blue channels are recorded. The desired minimum and maximum of the output range are determined from these values, and a stretching function is calculated for each channel, i.e. a linear transformation that maps the pixel values of the current channel into the desired range. Mapping the minimum value to 0 and the maximum value to 255, the stretching function can be expressed as:
stretching function = (255/(max - min)) * (original pixel value - min);
Mapping the pixel value of each channel through a corresponding stretching function for each pixel, and determining the stretched pixel value; after the above steps are performed, the contrast of the color image will be enhanced. Contrast stretching expands the brightness range and improves the visual effect of the image by linearly remapping the image pixel values to within a desired range so that smaller pixel values are closer to 0 and larger pixel values are closer to 255.
And recombining the Y channel subjected to brightness enhancement with the original Cb and Cr channels to obtain an enhanced YCbCr image, and converting the enhanced YCbCr image into an RGB image through the following reverse conversion formula.
R=Y+1.403*Cr;
G=Y-0.344*Cb-0.714*Cr;
B=Y+1.773*Cb;
The image processed by the YCbCr brightness enhancement algorithm will have an enhanced brightness effect while maintaining the color information of the original image.
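A hedged sketch of the luminance contrast stretching described above; it uses OpenCV's built-in YCrCb conversion (note that OpenCV orders the channels Y, Cr, Cb and its coefficients differ slightly from the formulas given above):

import cv2
import numpy as np

def enhance_brightness(bgr):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)
    # Contrast stretching: linearly map [min, max] of the luminance channel to [0, 255]
    y_min, y_max = int(y.min()), int(y.max())
    if y_max > y_min:
        y = ((y.astype(np.float32) - y_min) * (255.0 / (y_max - y_min))).astype(np.uint8)
    # Recombine the stretched Y channel with the original chroma channels and convert back
    return cv2.cvtColor(cv2.merge([y, cr, cb]), cv2.COLOR_YCrCb2BGR)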
As a preferable embodiment, not particularly limited, the compensation process is performed according to the illumination condition of the image, ensuring the visibility of the object. The original color image is decomposed into different RGB color channels. Dividing each color channel into a plurality of small areas, respectively carrying out self-adaptive histogram correction on the color channels in each small area, carrying out histogram adjustment according to the statistical information of pixels in the areas, merging each corrected small area, and reconstructing the corrected small areas into a final color image.
The method comprises the steps of loading a color image to be converted into a computer memory, and converting the color image from an RGB color space into an HSV color space, wherein the HSV color space comprises three components of hue (H), saturation (S) and brightness (V). The required components are selected for the separation operation according to specific requirements.
The converted image is separated into corresponding components according to the desired color channels, the separated color channels are displayed as images, and the separated color channels can be individually displayed as images for observing and analyzing the characteristics of each color channel.
The dimensions (number of bins) and pixel value ranges of the histogram are reasonably determined according to specific requirements. For each color channel, a blank histogram of the corresponding dimension is created, for each pixel in the image, the value of the corresponding color channel is obtained, and the count in the corresponding histogram is incremented by 1 according to the color channel value of the current pixel.
S114, normalizing the histogram, and generating a feature vector corresponding to the normalized histogram.
In this embodiment, after the computation is finished, the values in the histogram are scaled so that they range between 0 and 1, and the histogram is visualized as a bar chart or curve so that the result can be shown intuitively.
And carrying out normalization processing on the calculated histogram to enable the histogram to have consistent expression under different image sizes or brightness changes, combining the normalized histogram into a feature vector which is used as a color feature representation of the image, wherein the dimension of the feature vector depends on the selected color channel and the dimension of the histogram.
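As a sketch of the histogram feature described above (the choice of 16 bins per channel is an assumption), the per-channel histograms can be computed, normalized to the 0-1 range and concatenated into a feature vector as follows:

import cv2
import numpy as np

def color_histogram_feature(img, bins=16):
    feature = []
    for ch in range(img.shape[2]):                     # one histogram per color channel
        hist = cv2.calcHist([img], [ch], None, [bins], [0, 256]).ravel()
        feature.append(hist / (hist.sum() + 1e-8))     # normalize so the values lie between 0 and 1
    return np.concatenate(feature)                     # dimension = number of channels * bins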
S12, screening the target region of interest according to the processed video data.
In this embodiment, the preprocessed video data is screened again: the target region of interest is determined, the pixel points of holes are identified, and the holes are then filled based on a filling algorithm to restore the connectivity of the target region of interest, so that the data meets the detection requirements.
As a preferred embodiment, but not particularly limited thereto, step S12 includes S121-S129, wherein:
s121, extracting feature vector data corresponding to the histogram, and generating a feature vector data set.
According to the embodiment, the region of interest is generated based on DBSCAN cluster analysis, the data after the color histogram feature extraction is organized into a data set, wherein each data point represents a color histogram feature vector, and the data content can be determined more clearly and reliably.
S122, calculating the distance between each data point in the feature vector data set and the nearest neighbor point, and determining the neighborhood radius and the minimum neighborhood number.
In this embodiment, the distance between each data point and its K-th nearest neighbor point, i.e. the K-distance, is calculated, a distance-point diagram is drawn from these distances, and the inflection point in the diagram is selected as the reference value of the neighborhood radius.
First, a relation curve between the distance and the number of data points is drawn according to the calculated distance-point diagram, wherein the horizontal axis represents the distance, and the vertical axis represents the number of data points corresponding to the distance.
The curve is observed to find an inflection point, i.e. the location on the curve where a sharp turn occurs, indicating a transition from a lower-density region to a higher-density region. This inflection point is used to estimate the minimum neighborhood number preliminarily: if the number of data points corresponding to the inflection point is low, it can be used directly as the minimum neighborhood number; if it is high, a value slightly larger than the minimum neighborhood number can be selected.
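A minimal sketch of the K-distance estimation just described, using scikit-learn's NearestNeighbors; the value k=4 and the use of the largest jump in the sorted curve as the inflection point are assumptions:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_eps(features, k=4):
    # Distance from each point to its k-th nearest neighbor (the "K-distance")
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(features).kneighbors(features)
    k_dist = np.sort(dists[:, -1])                     # sorted K-distance curve
    knee = int(np.argmax(np.diff(k_dist)))             # position of the sharpest turn
    return float(k_dist[knee])                         # reference value for the neighborhood radius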
S123, acquiring the number of data points in the neighborhood of each data point according to the neighborhood radius and the minimum neighborhood number.
In this embodiment, the number of data points within the neighborhood of each data point is calculated, so that subsequent processing and classification can be performed better and the corresponding region of interest can be generated more reliably.
S124, marking each data point as a core point or a noise point according to the number of the data points in the neighborhood.
In this embodiment, if the number of data points in the neighborhood is greater than or equal to the point threshold MinPts, the data point is marked as a core point; if the number of data points in the neighborhood is less than MinPts, it is marked as a noise point.
And S125, if the data point is a core point, clustering neighbor points in the neighbor regions of the data point to generate a corresponding data cluster.
In this embodiment, for each core point, a cluster is constructed by continuously expanding its neighborhood: starting from a core point, all neighbor points within its neighborhood radius are found; if a neighbor point is itself a core point, it is added to the current cluster and expansion continues from it; if a point in the neighborhood is neither a core point nor a noise point, it is marked as a boundary point and added to the current cluster as needed.
When no more neighbor points can be added, the construction of the current cluster is complete. The above steps are repeated until all core points have been traversed; points not assigned to any cluster when clustering is complete are marked as noise points. The final clustering result is a set of clusters, each containing a group of mutually similar data points.
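Steps S124-S125 follow the standard DBSCAN expansion procedure; a sketch using scikit-learn, with eps and min_samples taken from the estimates above:

from sklearn.cluster import DBSCAN

def cluster_features(features, eps, min_pts):
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(features)
    # label -1 marks noise points; every other label identifies one data cluster
    clusters = {lab: (labels == lab).nonzero()[0] for lab in set(labels) if lab != -1}
    return labels, clusters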
And S126, generating a corresponding region of interest according to the core points, the neighbor points and the data clusters.
According to the embodiment, a plurality of corresponding interested areas are selected in the area according to the core points, the neighbor points and the data clusters, so that screening and judging can be better carried out, and errors are avoided.
S127, determining a target region of interest according to the actual area of the region of interest and the area of the preset region.
The clustering result is screened according to region area: regions that are too small or too large are removed and only regions within a suitable range are retained; the regions are then judged and filtered according to their boundary characteristics, and the target region of interest is finally determined, so that the detection process is accurate, reliable and stable.
S128, segmenting the target region of interest based on a watershed algorithm, traversing boundary pixels of the segmented target region of interest, and determining pixel points of the holes.
In this embodiment, for the region to be processed, a watershed algorithm is used to segment the region according to the color information of the image. Based on the gradient information, the watershed algorithm treats the image as a terrain in which high gradient values represent peaks and low gradient values represent valleys. A marker image is generated from the image using the watershed transformation; it defines an initial segmentation of the region. By running the watershed algorithm and simulating water flowing from the marked locations to the low points, the regions are filled and the different connected regions are separated. From the result of the watershed algorithm, the position of the hole is determined and extracted by examining features such as the area and shape of the connected regions; this approach is accurate and reliable.
The pixel points on the boundary are traversed and the connectivity of each pixel point is judged. Whether a pixel belongs to a hole can be determined from its neighborhood: if a foreground pixel exists in the neighborhood of the pixel, the pixel does not belong to a hole; if all neighboring pixels are background pixels, the point belongs to a hole.
S129, filling the pixel points of the cavity based on a filling algorithm, and repairing connectivity of the target region of interest.
In this embodiment, when a hole or an incomplete area exists, a hole filling algorithm may be used to repair the connectivity of the region and ensure that it is complete and continuous. For the pixels determined to be holes, different filling strategies can be adopted; one common strategy is the seed filling (Flood Fill) algorithm. Specifically, a pixel in the hole is selected as a seed point, its neighborhood pixels are visited, and it is judged whether each can be filled as a foreground pixel. If a pixel can be filled, it is marked as a foreground pixel and its neighborhood pixels are added to the queue to be filled. This process is repeated until the queue contains no more pixels to be filled; after filling is complete, the filled hole area is connected to the original region, restoring connectivity.
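A hedged sketch of the hole repair in steps S128-S129, assuming the segmented target region of interest is available as a binary mask; it uses the common flood-fill-from-the-border trick in OpenCV:

import cv2
import numpy as np

def fill_holes(region_mask):
    # region_mask: uint8 binary mask, 255 = foreground region, 0 = background and holes
    h, w = region_mask.shape
    flood = region_mask.copy()
    ff_mask = np.zeros((h + 2, w + 2), np.uint8)       # floodFill requires a mask with a 2-pixel border
    cv2.floodFill(flood, ff_mask, (0, 0), 255)         # fill the outer background from a corner seed
    holes = cv2.bitwise_not(flood)                     # pixels never reached are interior holes
    return cv2.bitwise_or(region_mask, holes)          # repaired region with connectivity restored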
S13, judging the condition of the target through a classifier according to a first preset algorithm and the target region of interest, and determining the class probability of the target region.
According to the method, the target region of interest is detected through the first preset algorithm, the class probability of the target region is predicted, the displacement and the speed of the target are estimated through extracting key points and optical flow estimation, and the state information of the target region of interest is updated, so that the information of the detection region can be determined more accurately and reliably.
According to the marked noise points in the DBSCAN algorithm, the embodiment can select to keep or remove the noise points, and if the clustering result relates to video data of a time sequence or continuous frames, spatial consistency check can be performed by using methods such as optical flow estimation and the like. Tracking and consistency checking are carried out on the region of interest according to the displacement change rule between the front frame and the rear frame, and the region is further screened and optimized.
As a preferred embodiment, but not particularly limited thereto, step S13 includes S131 to S134, wherein:
s131, generating candidate areas of the video data according to the first preset algorithm and the target region of interest.
The present embodiment adjusts the video frame to a size suitable for the YOLO algorithm model input, scales the pixel values in the video frame to between 0 and 1 by dividing each pixel value by 255, and then handles the channels: according to the actual operating conditions, additional channels are added to the image to meet the requirements of the YOLO algorithm. For color images the usual channels are the RGB channels, where the red, green and blue channels of each pixel are taken as input respectively; for grayscale images, the image can be duplicated three times to generate a three-channel image that conforms to the input requirements of the YOLO algorithm.
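As an illustrative sketch of this preprocessing (the 416x416 input resolution is an assumed YOLO input size):

import cv2
import numpy as np

def prepare_yolo_input(frame, size=416):
    img = cv2.resize(frame, (size, size))              # adjust to the model input size
    if img.ndim == 2:                                  # grayscale: replicate to three channels
        img = np.stack([img] * 3, axis=-1)
    img = img.astype(np.float32) / 255.0               # scale pixel values to the 0-1 range
    return img[np.newaxis, ...]                        # add a batch dimension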
The feature extractor in the YOLO algorithm is used to perform a convolution operation, map the video frames into feature space, extract the feature representation, and first define the convolution kernel, also known as the filter, of the convolution operation.
And applying the defined convolution kernel to the input image, and traversing the image by utilizing a sliding window scanning mode. At each location, a local region of the input image is element-wise multiplied and summed with the convolution kernel.
The result of the convolution operation is mapped into a feature space. After the convolution operation has been scanned through the sliding window, a new feature map is obtained which may be different in size from the input image, but which retains the local visual features in the input image at different locations. After the result of the convolution operation, a LeakyReLU activation function is applied to introduce nonlinear transformations to increase the expressive power of the network, and after the convolution operation, an averaging pooling operation is performed to reduce the size of the feature map and extract more salient features.
The method comprises the steps of generating candidate areas on a feature map by using anchor boxes, detecting targets with different scales and length-width ratios through convolution operation, classifying the generated candidate areas, judging whether target objects exist in each area, extracting feature representations from each candidate area, and preprocessing the extracted features, such as normalization, scale adjustment and the like, to ensure that input features are suitable for classification tasks.
The preprocessed features are classified by a lightweight convolutional neural network (CNN) classifier: the classifier is trained with a labeled data set to learn the parameters of the classification model, the trained classifier then classifies and predicts new candidate areas, and the prediction result is applied to each candidate area to judge whether a target object is present in it.
S132, predicting the boundary frame coordinates and the category probability of the candidate region.
The present embodiment predicts the class and the bounding box for each candidate region, and predicts the bounding box coordinates of each candidate box, including the position and the size, and the probability of each class by applying the corresponding convolution layer and full connection layer.
And performing further feature extraction and combination on the extracted feature images through corresponding convolution layers, wherein each convolution layer consists of a plurality of convolution kernels, each convolution kernel performs convolution operation on the input feature images to obtain new feature images, and after the convolution operation, the obtained feature images are usually a multidimensional array, and each position corresponds to a feature value.
In order to use these features for classification and bounding box prediction, it is necessary to flatten the feature map into one-dimensional vectors and to perform subsequent classification and bounding box prediction through a series of fully connected layers, each consisting of neurons, each connected to all neurons of the previous layer, and the fully connected operation performs nonlinear transformation on the flattened features by learning weights and biases to achieve classification of the object and prediction of the bounding box.
And processing the flattened features by applying the full connection layer, and predicting the boundary frame coordinates of each candidate frame. These predicted values may be real numbers or suitably transformed values to ensure a reasonable range and shape of the bounding box.
The fully connected layer of the last layer is usually divided into two branches, one for classification prediction and the other for position and size prediction of the bounding box, the output is converted into a class probability distribution by adopting a softmax function to perform classification prediction, and the position and size information of the target is obtained by linear transformation to realize the bounding box prediction.
S133, screening out repeated bounding boxes based on the bounding box coordinates and a third preset algorithm.
According to the embodiment, candidate frames are screened according to the predicted category probability and the boundary frame coordinates through an NMS algorithm, overlapped candidate frames are removed, and only the frame with the highest probability is reserved as a final detection result.
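A minimal NumPy sketch of non-maximum suppression as the third preset algorithm could look like the following; the IoU threshold of 0.5 is an assumed value:

import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) class probabilities
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                            # keep the highest-probability box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter + 1e-8)
        order = order[1:][iou < iou_thr]               # drop overlapping (repeated) boxes
    return keep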
In consecutive video frames, a Kalman filter is used to continuously track the target according to the detection result of the previous frame, improving detection stability and accuracy. The region around the target is delimited using the target's features, edge features are extracted from the selected region with the Canny edge detection method, the occluded part of the target is analyzed and inferred from the extracted contextual features and completed by edge propagation, and the generated content is fused with the original image by image processing techniques such as weighted fusion and edge smoothing to ensure consistency between the completed part and its surroundings. The detected target is then further analyzed for attributes such as size, moving speed and direction.
S134, determining the category probability of the target area according to the category probability of the candidate area and the filtered bounding box.
The embodiment accurately determines the class probability of the target area through the class probability of the candidate area and the filtered bounding box, wherein the class probability is more approximate to a true value.
S14, screening the target region with the highest class probability, inferring the displacement and speed of the target by key point extraction and optical flow estimation, and updating the state information of the target region of interest.
In this embodiment, the target region with the highest class probability is screened; key points are then extracted from it based on a key point extraction algorithm, the movement of the pixel points is predicted based on an optical flow estimation algorithm, the target intensity key points and the key points of the previous frame, and finally the displacement and running speed of the target intensity key points are determined.
As a preferred embodiment, but not particularly limited thereto, step S14 includes S141-S145, wherein:
s141, screening a target area with highest category probability.
In this embodiment, the target region with the highest class probability is screened so that key points can be extracted from the most reliable region, ensuring the accuracy of the result.
S142, determining the target strength key points based on the key point extraction algorithm and the target area with the highest category probability.
In this embodiment, the position information of the object areas is obtained from the result of the target detection algorithm, and each object area is expressed as a bounding box or polygon to determine the position of the object. The FAST corner detection algorithm is then applied to each object area to detect corners in the local area: a circular window of fixed size is selected for each object area, and the brightness change of the pixel values is calculated for the pixels in the window. If n consecutive pixel values in the window are brighter or darker than the central pixel by more than a threshold, the central pixel is determined to be a corner. This calculation is repeated for each pixel in the window until all corners are detected, and the detected corners are taken as key points.
From the result of the FAST algorithm, key points with higher corner intensity are selected as the key point set: for each key point the corner intensity is calculated, which can be evaluated from the brightness change of the pixels within the window, and the key points with higher corner intensity are added to the key point set. From the key point set of the previous frame, key points with higher stability are then selected as target points for optical flow estimation: the displacement vector between each key point in the previous frame and the corresponding point in the current frame is calculated, the stability of the key point is evaluated by examining the magnitude or continuity of the displacement vector, and the key points with higher stability are selected as the target points of optical flow estimation.
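A sketch of the FAST corner detection within one object area, using OpenCV; the intensity threshold is an assumed parameter, and keypoint.response is used here as the corner intensity:

import cv2

def detect_keypoints(gray_roi, threshold=20):
    fast = cv2.FastFeatureDetector_create(threshold=threshold, nonmaxSuppression=True)
    keypoints = fast.detect(gray_roi, None)
    # sort by corner intensity so the stronger corners can be kept as the key point set
    keypoints = sorted(keypoints, key=lambda kp: kp.response, reverse=True)
    return [kp.pt for kp in keypoints]                 # (x, y) coordinates within the region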
S143, predicting the movement amount of the pixel point based on the optical flow estimation algorithm, the target intensity key point and the key point of the previous frame.
In this embodiment, for each target point the Lucas-Kanade algorithm is used to calculate a displacement vector, i.e. to infer the displacement between the position of the point in the current frame and the position of the corresponding point in the previous frame; by applying the algorithm to each point in the key point set, position estimates of the key points are obtained. For each target point, a key point with higher stability is selected from the key point set of the previous frame as the target point of optical flow estimation, as follows:
A1. A displacement vector is calculated between each keypoint in the previous frame and the corresponding point in the current frame.
B1. The stability of the key points is evaluated by examining the magnitude or continuity of the displacement vector.
C1. And selecting a key point with higher stability as a target point of optical flow estimation.
For each target point, calculating a displacement vector by using a Lucas-Kanade algorithm, wherein the specific method is as follows:
A2. in the previous frame, a local window area is defined for the target point.
B2. In the current frame, searching for a point corresponding to the target point in the vicinity of the corresponding position according to the position of the target point in the previous frame, and obtaining a window area of the current frame.
C2. And calculating a displacement vector between the window area of the previous frame and the window area of the current frame through a Lucas-Kanade algorithm, namely deducing the displacement between the position of the target point in the current frame and the position of the corresponding point in the previous frame.
Repeating the steps A1-C1 and A2-C2, and applying the Lucas-Kanade algorithm to each point in the key point set to obtain the position estimation of the key point.
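A hedged sketch of steps A2-C2 using the pyramidal Lucas-Kanade implementation in OpenCV; the window size and pyramid depth are assumed parameters:

import cv2
import numpy as np

def track_keypoints(prev_gray, curr_gray, prev_pts):
    # prev_pts: (N, 1, 2) float32 array of key points detected in the previous frame
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None, winSize=(21, 21), maxLevel=2)
    good = status.ravel() == 1                         # points that were tracked successfully
    displacement = curr_pts[good] - prev_pts[good]     # per-point displacement vectors
    return curr_pts[good], displacement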
S144, calculating the movement amount of the target intensity key point and the pixel point, and determining the displacement amount and the running speed of the target intensity key point.
In this embodiment, the displacement vectors of all key points are averaged and integrated to obtain the overall displacement vector of the target, representing its overall motion direction and speed: for each target point in the key point set, the displacement vector is calculated with the Lucas-Kanade algorithm, the displacement vectors of all target points are averaged, and the direction and magnitude of the average displacement vector represent the overall motion direction and speed of the target.
S145, updating the state information of the target region of interest according to the displacement and the running speed of the target strength key points.
In this embodiment, in the initial frame, the ID of each target is initialized according to the YOLO result, the position information of each target is saved, and key points with higher stability are selected as target points for optical flow estimation. For subsequent frames, the YOLO target detection algorithm is used to detect the object areas in the video frame and obtain their position information and class labels.
In the current frame, the targets in the current frame are associated with the targets in the previous frame by calculating the position estimates of the key points and using the Hungarian algorithm. In the association process, the target positions and category information are used for matching, and the result of the optical flow estimation is used to take the motion consistency of the targets into account.
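By way of illustration and not limitation, the association step may be sketched with the Hungarian algorithm as implemented in scipy.optimize.linear_sum_assignment; the cost weights and gating threshold below are assumptions rather than values from this disclosure.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_targets, curr_detections, max_cost=80.0, class_penalty=50.0):
    # prev_targets / curr_detections: lists of dicts with 'center' (x, y),
    # where the previous-frame centers have been advanced by the optical flow
    # estimate, and 'label' holding the class id.
    cost = np.zeros((len(prev_targets), len(curr_detections)))
    for i, t in enumerate(prev_targets):
        for j, d in enumerate(curr_detections):
            cost[i, j] = np.linalg.norm(np.subtract(t['center'], d['center']))
            if t['label'] != d['label']:
                cost[i, j] += class_penalty          # penalise class mismatches
    rows, cols = linear_sum_assignment(cost)
    # Reject pairs whose cost exceeds the gating threshold.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]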
For an associated target, the state of the target, such as position, speed and acceleration, is updated by calculating the displacement vector and movement speed in step S143. For a newly detected target, a new identifier may be assigned according to its position and category information, and its key points may be extracted as in step S142.
If no object region matching a previously associated target is detected in the current frame, the target may have been lost due to occlusion or tracking failure. In that case, the target can be retrieved by using information such as its historical trajectory and appearance characteristics together with an appearance model matching method.
The position and velocity state vector of the target and a state transition model are defined, describing how the state of the target evolves over time; the state transition matrix A is used to predict the next state of the target when the state is updated in each frame.
An observation model is defined, describing how the state of the target is mapped to the observation space; the predicted state of the target is compared with the actually observed target position by means of the observation matrix H, so that state updating and association are carried out.
The estimate of the previous state and the control vector are used to predict the next state of the target according to the state transition model.
The state covariance matrix P of the target is predicted, which quantifies the uncertainty of the state estimate.
The predicted target state is mapped to the observation space according to the observation model to obtain a predicted observation.
The observation error is calculated by comparing the predicted observation with the actually observed target position.
The Kalman gain is calculated from the covariance matrix of the observation error and the observation noise.
The predicted target state is corrected with the Kalman gain to obtain the latest target state estimate.
The target state covariance matrix P is updated to reflect the uncertainty of the updated state estimate.
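By way of illustration and not limitation, the prediction and update steps described above may be sketched as a constant-velocity Kalman filter over the state [x, y, vx, vy]; the noise covariances below are assumed values.

import numpy as np

class ConstantVelocityKalman:
    def __init__(self, x, y, dt=1.0):
        self.x = np.array([x, y, 0.0, 0.0], dtype=float)     # state [x, y, vx, vy]
        self.P = np.eye(4) * 10.0                             # state covariance P
        self.A = np.array([[1, 0, dt, 0],                     # state transition matrix A
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                      # observation matrix H
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                             # process noise
        self.R = np.eye(2) * 1.0                              # observation noise

    def predict(self):
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.H @ self.x                                # predicted observation

    def update(self, z):
        innovation = np.asarray(z, dtype=float) - self.H @ self.x   # observation error
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                    # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x

No control vector is modelled in this sketch; the control term mentioned above would simply be added to the prediction if one were available.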
When tracking starts, the Kalman filter is initialized: the initial target position and speed are used as the initial state of the Kalman filter, and suitable initial state covariance and observation noise covariance matrices are set. The target detection result in the current video frame is combined with the prediction result of the tracker, and tracking accuracy and stability are improved by updating the target state and adjusting the parameters.
According to the target position predicted by the tracker and the target detection result in the current video frame, target matching is performed using a greedy matching algorithm.
Each tracker is associated with a detected target according to the target matching result, so that continuous tracking of multiple targets is realized.
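A minimal sketch of such a greedy matching step is given below; the distance threshold is an assumed parameter.

import numpy as np

def greedy_match(predicted_centers, detected_centers, max_dist=60.0):
    # Sort all tracker-detection pairs by distance and greedily accept the
    # closest pairs whose tracker and detection are both still unassigned.
    pairs = [(np.linalg.norm(np.subtract(p, d)), i, j)
             for i, p in enumerate(predicted_centers)
             for j, d in enumerate(detected_centers)]
    matches, used_trackers, used_detections = [], set(), set()
    for dist, i, j in sorted(pairs):
        if dist > max_dist:
            break
        if i not in used_trackers and j not in used_detections:
            matches.append((i, j))
            used_trackers.add(i)
            used_detections.add(j)
    return matches

Unmatched trackers can then be treated as lost or occluded targets, and unmatched detections as newly appearing targets.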
In each frame, the prediction and update operations of the Kalman filter are performed according to the above steps to obtain the latest target position and speed estimates; multi-target association is performed according to the target position estimates, and the target states are continuously tracked and updated in the next frame.
If a new target appears or an existing target disappears, the set of trackers is updated promptly.
The position tracking result of each target over the consecutive video frames is output.
In the initial frame, the ID of each target is initialized according to the YOLO result and its position information is saved; for each target, the nearest key point is selected as the target point according to the detected position information and descriptor, and its position is saved.
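A sketch of this initialisation, under assumptions about how the YOLO detections are represented, follows.

import numpy as np

def init_target_points(detections, keypoints):
    # detections: list of (target_id, box_center) pairs from YOLO;
    # keypoints: (N, 2) array of key point coordinates in the initial frame.
    target_points = {}
    for target_id, center in detections:
        dists = np.linalg.norm(keypoints - np.asarray(center, dtype=float), axis=1)
        # Keep, for each target, the key point closest to its box centre.
        target_points[target_id] = keypoints[int(np.argmin(dists))]
    return target_points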
In subsequent frames, the displacement of the key points between successive frames is estimated by the Lucas-Kanade optical flow algorithm. The specific operation is as follows:
a. Stable key points in the current frame are extracted and their descriptors are calculated.
b. Optical flow vectors between the initial key points and the key points in the current frame are estimated using the Lucas-Kanade algorithm, which estimates displacement based on pixel value variations near the key points.
c. Unstable key points are screened out by comparing the distances between the key points in the current frame and the initial key points.
d. Key points are matched using the consistency of the optical flow vectors and the similarity of the descriptors, so that the initial key points are matched with the key points in the current frame; the position information of the targets is updated using the optical flow matching result, and for each target a new position in the current frame is calculated from the matched key points and the optical flow vectors.
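By way of illustration, one possible realisation of the stability screening in step c is a forward-backward consistency check on the Lucas-Kanade flow (used here in place of an explicit descriptor comparison); the error threshold is an assumption.

import cv2
import numpy as np

def stable_matches(prev_gray, curr_gray, prev_pts, fb_thresh=1.0):
    # prev_pts: float32 array of shape (N, 1, 2).
    # Track forward and then backward; a key point is kept only if tracking it
    # there and back returns close to its original position.
    curr_pts, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    back_pts, st2, _ = cv2.calcOpticalFlowPyrLK(curr_gray, prev_gray, curr_pts, None)
    fb_err = np.linalg.norm(prev_pts.reshape(-1, 2) - back_pts.reshape(-1, 2), axis=1)
    good = (st1.reshape(-1) == 1) & (st2.reshape(-1) == 1) & (fb_err < fb_thresh)
    return prev_pts.reshape(-1, 2)[good], curr_pts.reshape(-1, 2)[good]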
For subsequent frames, the YOLO target detection algorithm is used to detect the object regions in the video frame and obtain their position information and category labels in the current frame. The optical flow algorithm of step S141 of the optical flow estimation module is used to calculate the position estimates of the key points, which allows the motion of the target to be estimated from the displacement of the key points between the previous frame and the current frame. The Hungarian algorithm is used to associate the targets in the current frame with the targets in the previous frame; in the association process, the position and category information of the targets are used for matching, and the result of the optical flow estimation is used to take the motion consistency of the targets into account.
If a target is associated, its state, including position, speed and acceleration, can be updated by calculating the displacement vector and motion speed from the optical flow estimation; the optical flow displacement calculation provides a target displacement estimate based on pixel matching. For a newly detected target, a new identifier may be assigned based on its position and class information, key points are extracted as in step S141, and an initial displacement estimate for the new target is created.
If no object region matching a previously associated target is detected in the current frame, the target may have been lost due to occlusion or tracking failure. In that case, the target can be retrieved by using information such as its historical trajectory and appearance characteristics in combination with an appearance model matching method.
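By way of illustration and not limitation, the appearance model matching used to retrieve a lost target may be sketched as a colour-histogram comparison; the histogram binning and similarity threshold are assumed.

import cv2

def appearance_descriptor(bgr_patch):
    # Hue-saturation histogram of the target region, used as a simple appearance model.
    hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist)

def reidentify(lost_descriptor, candidate_patches, min_similarity=0.7):
    # Compare the stored appearance of the lost target with each candidate
    # region and return the index of the best match above the threshold.
    best_idx, best_sim = None, min_similarity
    for idx, patch in enumerate(candidate_patches):
        sim = cv2.compareHist(lost_descriptor, appearance_descriptor(patch),
                              cv2.HISTCMP_CORREL)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx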
S15, recording a tracking result of the state information, and outputting the tracking result to a storage position designated by a user.
In this embodiment, the positions of the targets are tracked between consecutive frames according to their color feature descriptors, the displacement data are recorded, and the displacement data of each target are integrated by an averaging method.
The displacement data of the plurality of targets are integrated separately, and the overall displacement result of each target is determined.
According to actual requirements, the integrated data are further processed and analyzed: related information such as the speed, acceleration and displacement change rate of each target can be calculated, or the data can be visualized and displayed. The integrated, processed and analyzed data are output and stored in a file, or displayed and monitored in real time.
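A minimal sketch of deriving speed, acceleration and displacement change rate from the recorded per-frame positions by finite differences follows; the frame rate is an assumed parameter.

import numpy as np

def motion_statistics(positions, fps=30.0):
    # positions: (N, 2) array of one target's centre over N consecutive frames.
    positions = np.asarray(positions, dtype=float)
    displacement = np.diff(positions, axis=0)             # per-frame displacement
    speed = np.linalg.norm(displacement, axis=1) * fps     # pixels per second
    acceleration = np.diff(speed) * fps                    # pixels per second squared
    total_displacement = positions[-1] - positions[0]      # overall displacement
    return displacement, speed, acceleration, total_displacement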
The video frame is divided into different areas, with the division mode and area size determined according to the requirements of the actual application; the mapping position or shape of each screened area on the target image or video is calculated, and the mapped area is marked, drawn or otherwise displayed on the target image or video.
The mapping parameters are adjusted according to the accuracy and definition of the mapped region to obtain a better mapping effect, and the output of the mapped region is stored or transmitted to a subsequent processing step.
According to requirements, suitable data formats and output modes, such as images or annotation files, can be selected; the obtained target areas, displacement information, integral histogram information and the like are sorted, the results are analyzed, and key information and characteristics, such as the number, positions and motion patterns of the targets, are extracted.
According to requirements, the results are displayed in a visual form, such as images or charts, for convenient observation and analysis by the user; the results are stored in designated locations such as cloud storage or a database for subsequent query and use; finally, the results are transmitted to wherever they are needed, which may be other systems or devices, for further data refinement and application.
The results are fed back to the corresponding user or system according to the user's requirements and the established standards, so as to meet those requirements and to support decision making and application.
In this embodiment, the deep learning Faster R-CNN algorithm is used for image preprocessing and for identifying and extracting the target objects to be detected, determining the positions and number of marking points, and locating and identifying the marking points to obtain accurate position information of the marks; the motion trajectory of the marks in the image sequence is tracked using the optical flow method and the Canny operator, and information such as the displacement and speed of the marking points over the time sequence is calculated; the analysis results of the Faster R-CNN technique and the optical flow method are fused, and information such as the motion state and morphological changes of the marking points is classified and analyzed by a machine learning algorithm to obtain a multi-point detection result. Finally, multi-point displacement detection of the target object in the image is realized by matching and tracking the motion trajectories and states among the plurality of marking points.
The method can improve the accuracy and practicality of video monitoring and further improve the accuracy and robustness of multi-point image detection; it can be applied to various fields in which multiple objects or object parts need to be detected, simplifies the detection process to a certain extent, and improves the accuracy of target detection.
According to the method, the device and the system, the video data acquired by the video acquisition device are preprocessed, the target region of interest is screened, the target condition is judged by the classifier to determine the class probability of the target region, the target region with the highest class probability is screened, the displacement and speed of the target are inferred by extracting key points and by optical flow estimation, the state information of the target region of interest is updated, the tracking result of the state information is recorded, and the tracking result is output to a storage location designated by the user. During visual detection, the relevant equipment can be accelerated by means of a highly parallel computing structure and a high-performance GPU, so that the target detection task can be completed in a short time, which suits real-time application scenarios. End-to-end training can be achieved through deep learning and features can be learned automatically, so the method is highly robust to changes in the shape, size and posture of the target object, and a large amount of data can be processed rapidly while detection accuracy is guaranteed.
Second embodiment:
referring to fig. 6 to 10, the present embodiment provides a multi-target detection apparatus 100, which includes a processing module 110, a screening module 120, a determining module 130, an updating module 140, and an output module 150, wherein:
the processing module 110 is connected to the screening module 120, and is used for preprocessing video data acquired by the video acquisition device in the region to be detected.
As a preferred solution, but not particularly limited, the processing module 110 includes a first obtaining unit 111, a first generating unit 112, a first determining unit 113, and a second generating unit 114, where:
the first obtaining unit 111 is connected to the first generating unit 112, and is configured to obtain video data of the object to be tested, which is collected by the video collecting device in the area to be tested.
The first generating unit 112 is connected to the first determining unit 113 for decoding the video data to generate a sequence of images.
The first determining unit 113 is connected to the second generating unit 114 and is configured to determine a histogram of the corresponding dimension based on the second preset algorithm and the image sequence.
The second generating unit 114 is configured to normalize the histogram and generate a feature vector corresponding to the normalized histogram.
And the screening module 120 is connected with the determining module 130 and is used for screening the target region of interest according to the processed video data.
As a preferred solution, but not particularly limited, the screening module 120 includes a third generating unit 121, a second determining unit 122, a second obtaining unit 123, a marking unit 124, a fourth generating unit 125, a fifth generating unit 126, a third determining unit 127, a fourth determining unit 128, and a repairing unit 129, wherein:
the third generating unit 121 is connected to the second determining unit 122, and is configured to extract feature vector data corresponding to the histogram, and generate a feature vector data set.
The second determining unit 122 is connected to the second obtaining unit 123, and is configured to calculate a distance between each data point in the feature vector data set and a nearest neighbor point, and determine a neighborhood radius and a minimum neighborhood number.
The second obtaining unit 123 is connected to the marking unit 124, and is configured to obtain the number of data points in the neighborhood of each data point according to the neighborhood radius and the minimum neighborhood number.
The marking unit 124 is connected to the fourth generating unit 125, and is configured to mark each data point as a core point or a noise point according to the number of data points in the neighborhood.
The fourth generating unit 125 is connected to the fifth generating unit 126, and is configured to cluster neighboring points in the neighboring region of the data point if the data point is a core point, and generate a corresponding data cluster.
The fifth generating unit 126 is connected to the third determining unit 127, and is configured to generate a corresponding region of interest according to the core point, the neighboring point, and the data cluster.
The third determining unit 127 is connected to the fourth determining unit 128, and is configured to determine the target region of interest according to the actual area of the region of interest and the preset area of the region.
And a fourth determining unit 128, connected to the repairing unit 129, configured to segment the target region of interest based on a watershed algorithm, traverse boundary pixels of the segmented target region of interest, and determine pixel points of the hole.
And the repairing unit 129 is used for filling the pixel points of the cavity based on a filling algorithm and repairing the connectivity of the target region of interest.
The determining module 130 is connected to the updating module 140, and is configured to judge the target condition through the classifier according to a first preset algorithm and the target region of interest, and to determine the class probability of the target region.
As a preferred solution, but not particularly limited, the determining module 130 includes a sixth generating unit 131, a first predicting unit 132, a screening unit 133, and a fifth determining unit 134, wherein:
the sixth generating unit 131 is connected to the first predicting unit 132, and is configured to generate a candidate region of the video data according to the first preset algorithm and the target region of interest.
The first prediction unit 132 is connected to the screening unit 133, and predicts the boundary frame coordinates and the class probability of the candidate region.
And a screening unit 133 connected to the fifth determining unit 134 for screening out the repeated bounding boxes based on the bounding box coordinates and a third preset algorithm.
And a fifth determining unit 134, configured to determine the class probability of the target area according to the class probability of the candidate area and the filtered bounding box.
The updating module 140 is connected to the output module 150, and is configured to screen the target area with the highest category probability, infer the displacement and speed of the target by extracting the key points and the optical flow estimation, and update the state information of the target area of interest.
As a preferred solution, but not particularly limited, the updating module 140 includes a filtering unit 141, a sixth determining unit 142, a second predicting unit 143, a seventh determining unit 144, and an updating unit 145, wherein:
the screening unit 141 is connected to the sixth determining unit 142, and is configured to screen the target area with the highest category probability.
And a sixth determining unit 142, coupled to the second predicting unit 143, for determining a target intensity key point based on the key point extraction algorithm and the target region with the highest category probability.
The second prediction unit 143 is connected to the seventh determination unit 144, and is configured to predict the amount of movement of the pixel point based on the optical flow estimation algorithm, the target intensity key point, and the key point of the previous frame.
The seventh determining unit 144 is connected to the updating unit 145, and is configured to calculate the movement amount of the target intensity key point and the pixel point, and determine the displacement amount and the running speed of the target intensity key point.
The updating unit 145 is configured to update the state information of the target region of interest according to the displacement amount and the running speed of the target intensity key point.
And the output module 150 is used for recording the tracking result of the state information and outputting the tracking result to a storage position designated by the user.
According to the method, the device and the system, the video data acquired by the video acquisition device are preprocessed, the target region of interest is screened, the target condition is judged by the classifier to determine the class probability of the target region, the target region with the highest class probability is screened, the displacement and speed of the target are inferred by extracting key points and by optical flow estimation, the state information of the target region of interest is updated, the tracking result of the state information is recorded, and the tracking result is output to a storage location designated by the user. During visual detection, the relevant equipment can be accelerated by means of a highly parallel computing structure and a high-performance GPU, so that the target detection task can be completed in a short time, which suits real-time application scenarios. End-to-end training can be achieved through deep learning and features can be learned automatically, so the method is highly robust to changes in the shape, size and posture of the target object, and a large amount of data can be processed rapidly while detection accuracy is guaranteed.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The embodiments of the present invention also provide a computer storage medium having a computer program stored thereon, which when executed by a processor, implements a method of multi-object detection as in the embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiment methods may be accomplished by way of a computer program, which may be stored on a non-volatile computer readable storage medium, which when executed may include the steps of embodiments of a multi-object detection method as described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
Alternatively, the above-described integrated units of the present invention may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the embodiments of the present invention may be essentially or part contributing to the related art, and the computer software product may be stored in a storage medium, and include several instructions to cause a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the methods of the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program code, such as a removable storage device, RAM, ROM, magnetic or optical disk. Corresponding to the above-mentioned computer storage medium, in one embodiment there is also provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements a method of multi-object detection as in the above-mentioned embodiments when the processor executes the program.
The computer device may be a terminal, and its internal structure may be as shown in fig. 11. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of multi-object detection. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
According to the method, the device and the system, the video data acquired by the video acquisition device are preprocessed, the target region of interest is screened, the target condition is judged by the classifier to determine the class probability of the target region, the target region with the highest class probability is screened, the displacement and speed of the target are inferred by extracting key points and by optical flow estimation, the state information of the target region of interest is updated, the tracking result of the state information is recorded, and the tracking result is output to a storage location designated by the user. During visual detection, the relevant equipment can be accelerated by means of a highly parallel computing structure and a high-performance GPU, so that the target detection task can be completed in a short time, which suits real-time application scenarios. End-to-end training can be achieved through deep learning and features can be learned automatically, so the method is highly robust to changes in the shape, size and posture of the target object, and a large amount of data can be processed rapidly while detection accuracy is guaranteed.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. A method of multi-target detection, comprising:
preprocessing video data acquired by a video acquisition device in a region to be detected;
screening a target region of interest according to the processed video data;
judging the target condition through a classifier according to a first preset algorithm and a target region of interest, and determining the class probability of the target region;
screening a target area with highest category probability, deducing the displacement and speed of the target by extracting key points and optical flow estimation, and updating the state information of the target interested area;
recording the tracking result of the state information, and outputting the tracking result to a storage position designated by a user.
2. The method of multi-target detection according to claim 1, wherein the step of preprocessing video data acquired by the video acquisition device in the region to be detected comprises:
Acquiring video data of an object to be detected, which is acquired in a region to be detected by a video acquisition device;
decoding the video data to generate an image sequence;
determining a histogram of the corresponding dimension based on a second preset algorithm and the image sequence;
normalizing the histogram, and generating a feature vector corresponding to the normalized histogram.
3. The method of multi-object detection according to claim 2, wherein the step of screening the object region of interest based on the processed video data comprises:
extracting feature vector data corresponding to the histogram, and generating a feature vector data set;
calculating the distance between each data point in the feature vector data set and the nearest neighbor point, and determining the neighborhood radius and the minimum neighborhood number;
acquiring the number of data points in the neighborhood of each data point according to the neighborhood radius and the minimum neighborhood number;
marking each data point as a core point or a noise point according to the number of the data points in the neighborhood;
if the data point is a core point, clustering neighbor points in the neighbor regions of the data point to generate a corresponding data cluster;
and generating a corresponding region of interest according to the core points, the neighbor points and the data clusters.
4. The method of multi-target detection according to claim 3, wherein after the step of generating the corresponding region of interest from the core point, the neighbor point and the data cluster, further comprising:
Determining a target region of interest according to the actual area of the region of interest and the area of the preset region;
dividing the target region of interest based on a watershed algorithm, traversing boundary pixels of the divided target region of interest, and determining pixel points of the cavity;
and filling the pixel points of the cavity based on a filling algorithm, and repairing connectivity of the target region of interest.
5. The method of multi-target detection according to claim 1, wherein the step of determining the class probability of the target region by determining the target condition through the classifier according to the first preset algorithm and the target region of interest includes:
generating candidate areas of the video data according to a first preset algorithm and the target area of interest;
predicting boundary frame coordinates and class probabilities of candidate areas;
screening out repeated bounding boxes based on the bounding box coordinates and a third preset algorithm;
and determining the category probability of the target area according to the category probability of the candidate area and the filtered bounding box.
6. The method according to claim 1, wherein the step of screening the target region with the highest class probability, deducing the displacement and speed of the target by extracting key points and optical flow estimation, and updating the state information of the target region of interest comprises:
Screening a target area with highest category probability;
determining a target strength key point based on a key point extraction algorithm and a target area with highest category probability;
predicting the movement amount of the pixel point based on an optical flow estimation algorithm, the target intensity key point and the key point of the previous frame;
calculating the movement amount of the target strength key point and the pixel point, and determining the displacement amount and the running speed of the target strength key point;
and updating the state information of the target region of interest according to the displacement and the running speed of the target strength key points.
7. An apparatus for multi-target detection, comprising:
the processing module is used for preprocessing video data acquired by the video acquisition device in the region to be detected;
the screening module is used for screening the target region of interest according to the processed video data;
the determining module is used for determining the category probability of the target region according to a first preset algorithm and the target region of interest and judging the target condition through the classifier;
the updating module is used for screening a target area with highest category probability, deducing the displacement and speed of the target by extracting key points and optical flow estimation, and updating the state information of the target interested area;
And the output module is used for recording the tracking result of the state information and outputting the tracking result to a storage position designated by a user.
8. The apparatus for multi-target detection according to claim 7, wherein the processing module comprises:
the first acquisition unit is used for acquiring video data of the object to be detected, which is acquired by the video acquisition device in the area to be detected;
a first generation unit configured to decode the video data to generate an image sequence;
the first determining unit is used for determining a histogram of the corresponding dimension based on a second preset algorithm and the image sequence;
the second generating unit is used for normalizing the histogram and generating a feature vector corresponding to the normalized histogram;
the screening module comprises:
the third generating unit is used for extracting the feature vector data corresponding to the histogram and generating a feature vector data set;
the second determining unit is used for calculating the distance between each data point in the feature vector data set and the nearest neighbor point and determining the neighborhood radius and the minimum neighborhood number;
the second acquisition unit is used for acquiring the number of data points in the neighborhood of each data point according to the neighborhood radius and the minimum neighborhood number;
the marking unit is used for marking each data point as a core point or a noise point according to the number of the data points in the neighborhood;
A fourth generating unit, configured to cluster neighboring points in the neighboring domain of the data point if the data point is a core point, and generate a corresponding data cluster;
a fifth generating unit, configured to generate a corresponding region of interest according to the core point, the neighbor point and the data cluster;
the third determining unit is used for determining a target region of interest according to the actual area of the region of interest and the area of the preset region;
a fourth determining unit, configured to segment the target region of interest based on a watershed algorithm, traverse boundary pixels of the segmented target region of interest, and determine pixel points of the hole;
the repairing unit is used for filling the pixel points of the cavity based on a filling algorithm and repairing the connectivity of the target region of interest;
the determining module includes:
a sixth generation unit, configured to generate a candidate region of video data according to a first preset algorithm and a target region of interest;
the first prediction unit is used for predicting the boundary frame coordinates and the category probability of the candidate region;
the screening unit is used for screening out repeated boundary frames based on the boundary frame coordinates and a third preset algorithm;
a fifth determining unit, configured to determine a category probability of the target area according to the category probability of the candidate area and the filtered bounding box;
The updating module comprises:
the screening unit is used for screening the target area with the highest category probability;
the sixth determining unit is used for determining the target strength key points based on the key point extraction algorithm and the target area with the highest category probability;
the second prediction unit is used for predicting the movement amount of the pixel point based on the optical flow estimation algorithm, the target intensity key point and the key point of the previous frame;
a seventh determining unit, configured to calculate the movement amount of the target intensity key point and the pixel point, and determine a displacement amount and an operation speed of the target intensity key point;
and the updating unit is used for updating the state information of the target region of interest according to the displacement and the running speed of the target strength key point.
9. A computer readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a multi-object detection apparatus, implements the multi-object detection method according to any of claims 1-6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of multi-object detection according to any of claims 1-6 when executing the computer program.
CN202311428780.5A 2023-10-30 2023-10-30 Multi-target detection method and device Pending CN117437406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311428780.5A CN117437406A (en) 2023-10-30 2023-10-30 Multi-target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311428780.5A CN117437406A (en) 2023-10-30 2023-10-30 Multi-target detection method and device

Publications (1)

Publication Number Publication Date
CN117437406A true CN117437406A (en) 2024-01-23

Family

ID=89553030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311428780.5A Pending CN117437406A (en) 2023-10-30 2023-10-30 Multi-target detection method and device

Country Status (1)

Country Link
CN (1) CN117437406A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118134923A (en) * 2024-05-07 2024-06-04 青岛众屹科锐工程技术有限公司 High-speed article visual detection method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination