CN116258741A - Target tracking method and device, electronic equipment and medium

Info

Publication number
CN116258741A
CN116258741A (application CN202111506010.9A)
Authority
CN
China
Prior art keywords
target
image
gray
module
tracking
Prior art date
Legal status: Pending
Application number
CN202111506010.9A
Other languages
Chinese (zh)
Inventor
欧卓樾
孙文超
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Chengdu ICT Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202111506010.9A
Publication of CN116258741A

Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248: Analysis of motion using feature-based methods involving reference images or patches
    • G06T 5/30: Image enhancement or restoration using local operators; erosion or dilatation, e.g. thinning
    • G06T 2207/10016: Image acquisition modality: video; image sequence
    • G06T 2207/20021: Special algorithmic details: dividing image into blocks, subimages or windows
    • G06T 2207/20036: Special algorithmic details: morphological image processing
    • G06T 2207/20081: Special algorithmic details: training; learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target tracking method and apparatus, an electronic device, and a storage medium. The target tracking method comprises the following steps: performing intensity layering based on the gray values of pixels in a target image to obtain one or more gray maps; performing multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features; and tracking a target according to the multi-scale structural features.

Description

Target tracking method and device, electronic equipment and medium
Technical Field
The disclosure relates to the field of information technology, and in particular, to a target tracking method and device, an electronic device and a storage medium.
Background
Target tracking is widely applied in fields such as security and autonomous driving.
Target tracking may be performed on acquired images.
In the related art, tracking algorithms based on partial image regions often cannot handle complex image scenes and fail to detect targets in videos with severe partial occlusion; feature-based tracking methods often have difficulty distinguishing the features of multiple similar targets when many targets appear in the video image, which leads to tracking failure.
Model-based tracking methods require a large amount of training in advance to build a model, and when the target changes the original model no longer applies, so their applicability is narrow; detection-based tracking algorithms must update their detectors constantly, so they occupy more resources and run less efficiently.
Disclosure of Invention
The embodiments of the disclosure provide a target tracking method and apparatus, an electronic device and a storage medium, so as to reduce the target-loss rate of tracking without requiring large-scale training of a data model.
A first aspect of an embodiment of the present disclosure provides a target tracking method, including:
performing intensity layering based on the gray values of pixels in a target image to obtain one or more gray maps;
performing multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features;
and tracking a target according to the multi-scale structural features.
Based on the above scheme, the tracking a target according to the multi-scale structural features includes:
determining a candidate region in the (m+1)-th frame image based on the n-th target of the m-th frame image, where m and n are positive integers;
determining a similarity between the multi-scale structural feature of the n-th target of the m-th frame image and the multi-scale structural feature extracted from the candidate region of the (m+1)-th frame image;
and determining the position of the n-th target in the (m+1)-th frame image according to the similarity.
Based on the above scheme, the determining a candidate region in the (m+1)-th frame image based on the n-th target of the m-th frame image includes at least one of the following:
determining a candidate region of the n-th target in the (m+1)-th frame image based on optical flow estimation of the n-th target of the m-th frame image;
dividing the (m+1)-th frame image into a plurality of image regions based on an s-th sliding window, and determining the candidate region based on the pixel similarity between the image regions and the region where the n-th target of the m-th frame image is located; where s is a positive integer less than or equal to S, S is a positive integer not less than 2, and different sliding windows have different sizes.
Based on the above scheme, the performing intensity layering based on the gray values of pixels in the target image to obtain one or more gray maps includes:
performing intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps,
where P_m is the number of local minima of the gray values of the pixels of the target image, the gray values of the pixels of different gray maps lie between different local-minimum groups, and one local-minimum group includes two adjacent local minima.
Based on the above scheme, the performing intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps includes:
dividing the target image into a plurality of layered images according to the gray values of the pixels, where different layered images contain different gray values;
determining the number of local minima contained in the pixels of two adjacent layered images;
and when the number of local minima is equal to a preset value, determining each of the two adjacent layered images as a gray map.
Based on the above scheme, the performing intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps further includes:
inserting a layered image between the two adjacent layered images when the number of local minima is smaller than the preset value;
and/or,
merging the two adjacent layered images when the number of local minima is larger than the preset value.
Based on the above scheme, the method further comprises:
performing jump convolution processing on an image to be processed to obtain the target image, where the number of pixels of the target image is less than the number of pixels of the image to be processed.
Based on the above scheme, the method further comprises:
dividing the image area of an original image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
and performing target enhancement processing on the second-type region of the original image to obtain the image to be processed.
Based on the above scheme, the performing multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features includes:
processing each gray map based on gray values to obtain one or more of: multi-dimensional gray features, multi-dimensional gray variation features, multi-dimensional effective patch ratios, patch edge variation features, and target scale distribution features.
A second aspect of an embodiment of the present disclosure provides a target tracking method, including:
dividing a target image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
performing target enhancement processing on the second-type region of the target image to obtain an image to be processed;
and tracking a target based on the image to be processed.
Based on the above scheme, the performing target enhancement processing on the second-type region of the target image to obtain the image to be processed includes:
performing a top-hat transformation on the second-type region of the target image to obtain the target-enhanced image to be processed.
A third aspect of an embodiment of the present disclosure provides a target tracking apparatus, including:
a layering module, configured to perform intensity layering based on the gray values of pixels in a target image to obtain one or more gray maps;
an extraction module, configured to perform multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features;
and a tracking module, configured to track a target according to the multi-scale structural features.
Based on the above scheme, the tracking module is specifically configured to: determine a candidate region in the (m+1)-th frame image based on the n-th target of the m-th frame image, where m and n are positive integers; determine a similarity between the multi-scale structural feature of the n-th target of the m-th frame image and the multi-scale structural feature extracted from the candidate region of the (m+1)-th frame image; and determine the position of the n-th target in the (m+1)-th frame image according to the similarity.
Based on the above scheme, the tracking module is specifically configured to perform at least one of the following:
determining a candidate region of the n-th target in the (m+1)-th frame image based on optical flow estimation of the n-th target of the m-th frame image;
dividing the (m+1)-th frame image into a plurality of image regions based on an s-th sliding window, and determining the candidate region based on the pixel similarity between the image regions and the region where the n-th target of the m-th frame image is located; where s is a positive integer less than or equal to S, S is a positive integer not less than 2, and different sliding windows have different sizes.
Based on the above scheme, the layering module is specifically configured to perform intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps, where P_m is the number of local minima of the gray values of the pixels of the target image, the gray values of the pixels of different gray maps lie between different local-minimum groups, and one local-minimum group includes two adjacent local minima.
Based on the above scheme, the layering module is further configured to divide the target image into a plurality of layered images according to the gray values of the pixels, where different layered images contain different gray values; determine the number of local minima contained in the pixels of two adjacent layered images; and, when the number of local minima is equal to a preset value, determine each of the two adjacent layered images as a gray map.
Based on the above scheme, the layering module is specifically configured to insert a layered image between the two adjacent layered images when the number of local minima is smaller than the preset value, and/or merge the two adjacent layered images when the number of local minima is larger than the preset value.
Based on the above scheme, the device further comprises:
a jump convolution module, configured to perform jump convolution processing on the image to be processed to obtain the target image, where the number of pixels of the target image is less than the number of pixels of the image to be processed.
Based on the above scheme, the device further comprises:
a division module, configured to divide the image area of an original image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
and an enhancement module, configured to perform target enhancement processing on the second-type region of the original image to obtain the image to be processed.
Based on the above scheme, the extraction module is specifically configured to process each gray map based on gray values to obtain one or more of: multi-dimensional gray features, multi-dimensional gray variation features, multi-dimensional effective patch ratios, patch edge variation features, and target scale distribution features.
A fourth aspect of the disclosed embodiments provides a target tracking apparatus, including:
a region module, configured to divide the image area of a target image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
an image-to-be-processed module, configured to perform target enhancement processing on the second-type region of the target image to obtain the image to be processed;
and a target module, configured to track a target based on the image to be processed.
Based on the above scheme, the image-to-be-processed module is specifically configured to perform a top-hat transformation on the second-type region of the target image to obtain the target-enhanced image to be processed.
A fifth aspect of an embodiment of the present disclosure provides an electronic device, including:
a memory;
and a processor, connected to the memory, configured to implement the target tracking method provided by any of the foregoing first-aspect and/or second-aspect solutions by executing computer-executable instructions stored in the memory.
A sixth aspect of the disclosed embodiments provides a computer storage medium, wherein the computer storage medium stores computer-executable instructions; when executed by a processor, the computer-executable instructions implement the target tracking method provided by any of the foregoing first-aspect and/or second-aspect solutions.
According to the technical solution provided by the embodiments of the disclosure, intensity layering based on the gray values of pixels yields a plurality of gray maps, from which multi-scale structural features of more dimensions can be extracted simply and conveniently. Accurate target tracking is thus achieved without training a model on large amounts of sample data, and the target is lost less often during tracking.
According to the technical solution provided by the embodiments of the disclosure, the target image is divided into regions, the regions where the difference between target and background is small are enhanced to obtain an image to be processed, and target tracking is then performed on the image to be processed. This reduces target loss caused by background interference without adding training data for a tracking model.
Drawings
Fig. 1 is a schematic flow chart of a target tracking method according to an embodiment of the disclosure;
fig. 2 is a schematic flow chart of a target tracking method according to an embodiment of the disclosure;
fig. 3 is a schematic flow chart of a target tracking method according to an embodiment of the disclosure;
fig. 4 is a schematic flow chart of a target tracking method according to an embodiment of the disclosure;
fig. 5 is a schematic structural diagram of a video acquisition encoding system for target tracking according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the disclosure;
fig. 7 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
So that the features and aspects of the present disclosure can be understood in more detail, the disclosure briefly summarized above is described more particularly below with reference to the appended drawings, which are not intended to limit the present disclosure.
As shown in fig. 1, an embodiment of the present disclosure provides a target tracking method, including:
S110: performing intensity layering based on the gray values of pixels in a target image to obtain one or more gray maps;
S120: performing multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features;
S130: tracking a target according to the multi-scale structural features.
The target tracking method provided by the embodiments of the disclosure can be applied to various electronic devices, including but not limited to a terminal and/or a server.
The terminal includes, but is not limited to, imaging devices for capturing images, such as roadside monitoring devices.
The server includes, but is not limited to, a monitoring server and the like.
In the embodiments of the disclosure, after the electronic device obtains a target image to be processed, it performs intensity layering according to the gray values of the pixels of the target image. Different gray values correspond to different intensities, and one gray map is obtained for each intensity layer, containing the gray values that fall within that layer.
When features are extracted, multi-scale structural features are extracted from each gray map separately. In this way one target image is converted into a plurality of gray maps; compared with directly binarizing the target image into a single image, this feature-extraction process preserves more features of the tracked target, so that multi-scale structural features can be obtained.
Target tracking based on the multi-scale structural features can then refer to more features, which reduces target loss without increasing the training data of a model.
For example, in some embodiments, various types of machine learning models may be used to generate the gray maps, to extract features from the gray maps to obtain the multi-scale structural features, and finally to determine from the multi-scale structural features which region of the image, if any, contains the target.
In the embodiments of the disclosure, the target tracking may be single-target tracking or multi-target tracking.
If the target image contains a plurality of tracked targets, multi-target tracking is performed; if it contains only one target, single-target tracking is performed.
As shown in fig. 2, S130 may include:
S131: determining a candidate region in the (m+1)-th frame image based on the n-th target of the m-th frame image, where m and n are positive integers;
S132: determining a similarity between the multi-scale structural feature of the n-th target of the m-th frame image and the multi-scale structural feature extracted from the candidate region of the (m+1)-th frame image;
S133: determining the position of the n-th target in the (m+1)-th frame image according to the similarity.
A candidate region is a region in which the n-th target may potentially appear in the (m+1)-th frame image.
In the embodiments of the disclosure, the multi-scale structural features of each candidate region are extracted in the manner of steps S110 to S120, and their similarity with the multi-scale structural features of the n-th target of the m-th frame image is computed, so as to determine from the similarity which candidate region is the region where the n-th target is located in the (m+1)-th frame image.
For example, the candidate region whose similarity is greater than a preset threshold and is the maximum among all candidates is determined as the position of the n-th target in the (m+1)-th frame image.
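As an illustration only (the patent does not fix a concrete similarity measure), this matching step might look like the following Python sketch; the function name, the cosine-similarity measure and the threshold value are all assumptions:

import numpy as np

def match_target(target_feat, candidate_feats, threshold=0.8):
    # target_feat: multi-scale structural feature vector of the n-th target
    # in frame m; candidate_feats: (region, feature vector) pairs from frame m+1.
    best_region, best_sim = None, threshold
    for region, feat in candidate_feats:
        sim = np.dot(target_feat, feat) / (
            np.linalg.norm(target_feat) * np.linalg.norm(feat) + 1e-12)
        if sim > best_sim:   # keep the candidate with the highest similarity
            best_region, best_sim = region, sim
    return best_region       # None if no candidate clears the threshold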
S131 may include at least one of the following:
determining a candidate region of the n-th target in the (m+1)-th frame image based on optical flow estimation of the n-th target of the m-th frame image;
dividing the (m+1)-th frame image into a plurality of image regions based on an s-th sliding window, and determining the candidate region based on the pixel similarity between the image regions and the region where the n-th target of the m-th frame image is located; where s is a positive integer less than or equal to S, S is a positive integer not less than 2, and different sliding windows have different sizes.
A candidate region obtained from the optical flow features produced by optical flow estimation, i.e. a region where the n-th target may be, may be called a first-type candidate region.
Meanwhile, a sliding window is set and moved to traverse the target image. Whether a windowed region is a candidate region is then determined from the number of overlapping pixel values between the pixels contained in the sliding window and the image region where the n-th target of the m-th frame image is located: if the overlap count is greater than a preset value the region is taken as a candidate region, and otherwise it is not.
In addition, since the tracked target may move closer to or farther from the image capturing apparatus, the numbers of pixels it occupies in the m-th and (m+1)-th frame images differ.
In view of this, the window size is varied while traversing the target image so that candidate regions are located comprehensively.
For example, the size of an initial sliding window is determined from the image size of the n-th target in the m-th frame image, and the initial window is then scaled by certain ratios to obtain sliding windows of different sizes for scanning the image.
A candidate region obtained from the sliding windows may be called a second-type candidate region.
Steps S110 to S120 are performed on both the first-type and second-type candidate regions, so target loss caused by missed candidate regions is reduced and the tracking success rate is improved, without training the model on additional sample data.
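The following sketch illustrates the second-type candidate search described above; the scale set, step sizes, agreement tolerance and overlap threshold are assumptions, not values from the patent:

import cv2
import numpy as np

def sliding_window_candidates(frame, ref_patch, scales=(0.8, 1.0, 1.25),
                              overlap_thresh=0.6, tol=10):
    # frame: gray (m+1)-th frame; ref_patch: gray patch of the n-th target
    # in the m-th frame. Returns candidate boxes (x, y, w, h).
    h0, w0 = ref_patch.shape
    candidates = []
    for s in scales:   # one sliding window per scale, derived from the target size
        w, h = max(4, int(w0 * s)), max(4, int(h0 * s))
        ref = cv2.resize(ref_patch, (w, h))
        for y in range(0, frame.shape[0] - h + 1, max(1, h // 2)):
            for x in range(0, frame.shape[1] - w + 1, max(1, w // 2)):
                win = frame[y:y + h, x:x + w]
                # Fraction of pixels whose gray values agree within a tolerance,
                # standing in for the "overlap count" of the text.
                agree = np.mean(np.abs(win.astype(int) - ref.astype(int)) <= tol)
                if agree > overlap_thresh:
                    candidates.append((x, y, w, h))
    return candidates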
In an embodiment of the present disclosure, S110 may include:
performing intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps,
where P_m is the number of local minima of the gray values of the pixels of the target image, the gray values of the pixels of different gray maps lie between different local-minimum groups, and one local-minimum group includes two adjacent local minima.
For example, the local minima of the gray values of the pixels in the target image are found by computing a histogram over all pixels of the target image. Taking the local minima as boundary points, the pixels whose gray values lie between two adjacent local minima are assigned to one gray map; pixels whose gray values do not lie between those two local minima are set to 0 in that gray map.
In this way the target image is divided into gray maps according to its own gray-value distribution.
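A minimal sketch of this histogram-based layering, assuming simple smoothing and a neighbour test for the local minima (details the patent leaves open):

import numpy as np

def intensity_layering(gray_img):
    # Histogram over all pixels, lightly smoothed (smoothing is an assumption).
    hist = np.bincount(gray_img.ravel(), minlength=256).astype(float)
    hist = np.convolve(hist, np.ones(5) / 5, mode="same")
    # Local minima: interior gray levels no higher than both neighbours.
    minima = [g for g in range(1, 255)
              if hist[g] <= hist[g - 1] and hist[g] <= hist[g + 1]]
    layers = []
    for lo, hi in zip(minima[:-1], minima[1:]):   # adjacent local-minimum pairs
        layer = np.where((gray_img >= lo) & (gray_img < hi), gray_img, 0)
        layers.append(layer)   # pixels outside [lo, hi) are set to 0
    return layers              # P_m local minima give P_m - 1 gray maps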
In one embodiment, when layering the gray image, the layering may be performed directly after computing the distribution statistics of the gray values of the target image.
In other embodiments, the layering may be performed by a machine learning model such as a neural network; intensity layering of the gray values of the target image by a machine learning model may proceed as follows.
The performing intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps includes the following steps:
dividing the target image into a plurality of layered images according to the gray values of the pixels, where different layered images contain different gray values;
determining the number of local minima contained in the pixels of two adjacent layered images;
and when the number of local minima is equal to a preset value, determining each of the two adjacent layered images as a gray map.
First, the target image may be preliminarily layered into a plurality of layered images according to the gray values of the pixels, for example divided randomly, with different layered images containing pixels of different gray values.
The gray values of the pixels contained in each pair of adjacent layered images are then traversed, and the number of local minima contained in the two adjacent layered images is determined. If the number of local minima equals the preset value, the gray-value separation of the two adjacent layered images is reasonable, and the two layered images can be determined as gray maps.
The preset value may be 3.
Illustratively, the performing intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps includes: inserting a layered image between the two adjacent layered images when the number of local minima is smaller than the preset value; and/or merging the two adjacent layered images when the number of local minima is larger than the preset value.
In some embodiments, the method further comprises:
performing jump convolution processing on an image to be processed to obtain the target image, where the number of pixels of the target image is less than the number of pixels of the image to be processed.
In some embodiments, the target image may be an original captured image, or a preprocessed image obtained by noise reduction or other preprocessing of the original image. In some embodiments, it may also be an image whose pixel count has been reduced by the jump convolution processing without reducing its features.
The jump convolution processing reduces the number of pixels of the image to be processed while preserving the features of the target in it as far as possible.
Because the size (number of pixels) of the target image is reduced relative to that of the image to be processed, the amount of computation in the subsequent gray-value intensity layering, feature extraction, and target tracking based on multi-scale structural features is reduced, which increases the target tracking speed.
In some embodiments, the method further comprises:
dividing the image area of an original image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
and performing target enhancement processing on the second-type region of the original image to obtain the image to be processed.
In the embodiments of the disclosure, the image area is divided into two types: in one type the pixel difference between target and background is large, and in the other it is small.
This difference may be embodied as a difference between pixel values, for example the difference between the mean pixel values of the target and of the background.
Target enhancement is performed on the second-type region, where the difference between target and background is small; after the enhancement, the difference between target and background in the second-type region is increased. The enhanced second-type region and the first-type region together form the image to be processed. Enhancing the second-type region before extracting the multi-scale structural features reduces tracking loss caused by parts of the target differing little from the background, and thus improves the tracking success rate.
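For illustration, the region classification by target/background difference could be sketched as follows; the mean-difference measure and the threshold are assumptions:

import numpy as np

def classify_region(region, target_mask, diff_thresh=30.0):
    # region: gray image patch; target_mask: boolean mask of target pixels.
    target = region[target_mask]
    background = region[~target_mask]
    if target.size == 0 or background.size == 0:
        return "second"   # nothing to compare; treat as a low-contrast region
    diff = abs(float(target.mean()) - float(background.mean()))
    return "first" if diff > diff_thresh else "second"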
In some embodiments, the performing multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features includes:
processing each gray map based on gray values to obtain one or more of: multi-dimensional gray features, multi-dimensional gray variation features, multi-dimensional effective patch ratios, patch edge variation features, and target scale distribution features.
As shown in fig. 3, an embodiment of the present disclosure provides a target tracking method, including:
S210: dividing a target image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
S220: performing target enhancement processing on the second-type region of the target image to obtain an image to be processed;
S230: tracking a target based on the image to be processed.
In the embodiments of the disclosure, before the image is processed with a machine learning model such as a neural network, the image area of the target image is divided into a first-type region and a second-type region. The specific division is determined by the difference between background and target in the corresponding image region.
This difference may be embodied as a difference between pixel values, for example the difference between the mean pixel values of the target and of the background.
Target enhancement processing is performed on the second-type region, where the difference between target and background is small; after the processing the target stands out more against the background, which improves the success rate when a machine learning model subsequently performs target tracking.
In some embodiments, the performing target enhancement processing on the second-type region of the target image to obtain the image to be processed, i.e. S220, may include:
performing a top-hat transformation on the second-type region of the target image to obtain the target-enhanced image to be processed.
Through the top-hat transformation the brightness of the target is increased, which enhances the target.
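A minimal OpenCV sketch of this enhancement step, assuming a white top-hat with an elliptical kernel whose extracted detail is added back; the kernel size and file path are placeholders:

import cv2

# Placeholder input: the second-type (low-contrast) region as a gray image.
region = cv2.imread("second_type_region.png", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
tophat = cv2.morphologyEx(region, cv2.MORPH_TOPHAT, kernel)
enhanced = cv2.add(region, tophat)   # add the extracted bright detail back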
For video captured by a camera, existing target tracking and detection algorithms have the following problems: region-based tracking algorithms often cannot handle complex scenes and fail to detect targets in videos with severe partial occlusion; feature-based tracking methods often have difficulty distinguishing the features of multiple similar targets when many targets appear in the video image, which leads to failure; model-based tracking methods require a large amount of training in advance to build a model, and when the target changes the original model no longer applies, so their applicability is narrow; and detection-based tracking algorithms must update their detectors constantly, so they occupy more resources and run less efficiently.
In the embodiments of the disclosure, a deep learning method is first used to classify images into region-class and non-region-class images, and the non-region-class images are preprocessed so that targets stand out more. A training subunit then obtains a multi-size full-feature minimum image using an adaptive jump convolution method, extracts image features using an intensity layering method, and extracts the multi-scale structural features of the image from the intensity-layered image features. Finally, a support vector machine method separates one target or several targets.
The target tracking module comprises a tracking unit, a detection unit and a learning unit, and determines whether a newly detected target corresponds to a target from the previous moment.
The result storage and display module receives the path sequence returned by the target tracking module and stores it in a cache or a database with a designated storage structure, for convenient subsequent searching and display.
In this technical solution, a neural network classification method is first used to quickly judge whether an object to be detected exists; adaptive jump convolution then quickly obtains a full-feature minimum image, and intensity layering with feature extraction yields multi-size structural features, providing an extremely small feature input to machine learning and quickly accomplishing the target detection task.
The embodiments of the invention provide a target tracking and detection method with fast feature extraction, mainly comprising the following components, as shown in the overall framework diagram of fig. 4: a video receiving and encoding module, which receives the video stream captured by a camera or other data source and encodes and decodes it; a target detection module, which detects targets in the images of the video; a target tracking module, which tracks the detected targets and corrects the next detection pass; and a result storage and display module, which receives the detection and tracking results and stores and displays them.
Referring to fig. 5, the video receiving and encoding module receives and encodes the video stream captured by the camera. It connects to a POE (power over Ethernet) port through an IPCAM (a network camera based on a network protocol) and pulls an RTSP (real time streaming protocol) video stream from the IPCAM. After the RTSP video stream is pulled, it is encoded to achieve a high compression ratio, high image quality and good network adaptability. After encoding, the video stream is decoded before being passed to the subsequent video parsing and reasoning module. The structure of this module is shown in fig. 5.
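For illustration, pulling and decoding an RTSP stream with OpenCV might look like the following sketch; the URL and credentials are placeholders, not from the patent:

import cv2

# Placeholder URL and credentials; a real deployment would use the IPCAM's
# actual RTSP address reachable over the POE-powered connection.
cap = cv2.VideoCapture("rtsp://user:pass@192.168.1.64:554/stream1")
while cap.isOpened():
    ok, frame = cap.read()   # one decoded frame of the pulled stream
    if not ok:
        break
    # ... hand the frame to the target detection and tracking modules ...
cap.release()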
The image classification unit classifies the video images into two main classes: the first is region-class images and the second is non-region-class images. Each class has two subclasses: single-target images and multi-target images. A region-class image is defined as one in which the occluded area is large (above a set threshold), the target is not prominent, and the target blends strongly with the background (above a set threshold). A non-region-class image is defined as one in which the occluded area is small (below a set threshold), the target is prominent, and the contrast between target and background is high (above a set threshold). A single-target image is one in which the number of targets is below a set threshold; a multi-target image is one in which the number of targets is at or above a set threshold.
The image preprocessing subunit preprocesses the images classified as non-region-class. The preprocessing mainly comprises image noise reduction, image compensation and image enhancement.
The training subunit generates full-feature small-size images from the images preprocessed by the image preprocessing subunit to obtain a training set, executing the following steps:
Full-feature small-size images are obtained by adaptive jump convolution (AJN) of the image with a window function W(m, n). AJN is computed as follows.

Compute the row jump coefficient Rw_(i,j):

[formula not reproduced in the available text]

where Tr is a set fluctuation threshold and s_(i,j) is the row fluctuation quantization function of row i, column j, computed as:

[formula not reproduced in the available text]

Compute the column jump coefficient Cw_(i,j):

[formula not reproduced in the available text]

where Tc is a set threshold and s_(i,j) is the column fluctuation quantization function of row i, column j, computed as:

[formula not reproduced in the available text]

Compute the multiscale image as:

[formula not reproduced in the available text]

Illustratively, q_(0,0) = (Rw_(0,0) & Rw_(1,0) & Rw_(2,0) & ... & Rw_(R,0)) & Cw_(0,1) & Cw_(0,2) & ... & Cw_(0,C), where & is the AND operator.

r = min(r_0, r_1, ..., r_j, ..., r_m), where r_j satisfies a condition whose formula is not reproduced in the available text; c = min(c_0, c_1, ..., c_i, ..., c_n), where c_i satisfies a condition whose formula is likewise not reproduced.

Equalize the normalized image:

[formula not reproduced in the available text]

where 0 <= i <= C and 0 <= j <= R, C denotes the number of columns of the image, R denotes the number of rows, and W(m, n) is an m x n window function.
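Since the Rw/Cw formulas are not reproduced in the available text, the following is only a rough sketch of the idea behind adaptive jump convolution: rows and columns whose gray-level fluctuation stays below the thresholds are skipped, shrinking the image while keeping its structure. The fluctuation measure used here is purely an assumption:

import numpy as np

def jump_downsample(img, tr=8.0, tc=8.0):
    # Mean absolute neighbour difference stands in for the patent's
    # fluctuation quantization function s(i,j), which is not reproduced.
    img = img.astype(float)
    row_fluct = np.abs(np.diff(img, axis=1)).mean(axis=1)  # per-row variation
    col_fluct = np.abs(np.diff(img, axis=0)).mean(axis=0)  # per-column variation
    keep_rows = row_fluct >= tr   # rows below the threshold Tr are skipped
    keep_cols = col_fluct >= tc   # columns below the threshold Tc are skipped
    if not keep_rows.any() or not keep_cols.any():
        return img                # nothing fluctuates enough: keep the image
    return img[keep_rows][:, keep_cols]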
The image extraction subunit is in signal connection with the training subunit and comprises an intensity layering subunit and an extraction subunit. The intensity layering subunit layers the image by intensity; the extraction subunit extracts the color-patch map obtained by intensity layering. The method for intensity layering the image is as follows:

Represent the image as a three-dimensional function (x, y, f(x, y)), where x and y determine the position of a pixel in the gray image corresponding to the image and f(x, y) is the gray value at that position. Let the gray levels of the gray image be [0, L_max - 1], where L_max is the maximum gray value, and take the axis formed by the gray levels as the gray axis. Define P planes perpendicular to the gray axis as {l_1, l_2, l_3, ..., l_P}, 0 < p < L_max. The P planes divide the gray axis into P + 1 intensities {V_1, V_2, V_3, ..., V_k, ..., V_(P+1)}, 1 <= k <= P + 1, and f(x, y) is expressed as f(x, y) = c * V_k, where c is the gray coefficient associated with the k-th intensity level V_k.

Set the number of interval layers to s * P, where s is an interval coefficient with s in [0, 1]. Sort the local minimum points of the gray axis between the intensity planes from small to large gray value; let their total number be P_m and the number of intensity layers be P_l. Then:

when P_l = P_m - 1, the features of the image are consistent with the intensity layering, and the image is intensity-layered directly;

when P_l < P_m - 1, the feature refinement of the image is greater than the number of intensity layers, and (P_m - P_l + 1) local minimum points with the larger gray-scale intervals are selected from the minimum-point sequence for intensity layering;

when P_l > P_m - 1, the feature refinement of the image is smaller than the number of intensity layers, and (P_l - P_m + 1) points are inserted between the two extrema with the larger gray-scale interval in the local-minimum sequence.
The feature extraction subunit extracts the multi-scale structural features of the image, which include the multi-dimensional gray variation features of the image, the multi-dimensional effective patch ratios of the image, the patch edge variation features of the image, and the target-body scale distribution features of the image.
Before feature extraction, threshold segmentation is performed on each image to obtain a binarized image img_B, and a normalized multi-dimensional feature operation is performed on img_B to obtain a segmented image X. Specifically:

Multi-dimensional gray feature DE:

[formula not reproduced in the available text]

Multi-dimensional gray variation feature DEB:

[formula not reproduced in the available text]

where g(x, y) is the gray value of pixel (x, y) in img_B, N denotes the total number of pixels equal to 1 in the segmented image X (i.e. the number of target points), n denotes the total number of segmented images, and f denotes the slope function of the fitted line.
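Because the DE and DEB formulas are likewise not reproduced, the following sketch is only one hypothetical reading, treating DE as a normalized per-layer target-point count and DEB as the slope of a line fitted across layers; none of this is confirmed by the text:

import numpy as np

def de_deb_features(binary_layers):
    # binary_layers: list of binarized gray maps img_B (values 0/1).
    counts = np.array([float((layer == 1).sum()) for layer in binary_layers])
    n = len(counts)
    de = counts / max(counts.max(), 1.0)   # assumed: normalized target-point counts
    # assumed: DEB as the slope f of a line fitted to the counts across n layers
    deb = np.polyfit(np.arange(n), counts, 1)[0] if n > 1 else 0.0
    return de, deb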
The detection subunit detects the training set with a support vector machine; detecting the structural features of the training set separates one target or several targets, yielding the final detection result.
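A minimal sketch of the SVM-based separation step using scikit-learn; the kernel choice and the stand-in data are assumptions:

import numpy as np
from sklearn import svm

# Stand-in data: in the patent these would be the multi-scale structural
# feature vectors of the training set, labelled target / non-target.
rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 16))
train_labels = (train_features[:, 0] > 0).astype(int)

clf = svm.SVC(kernel="rbf", gamma="scale")   # kernel choice is an assumption
clf.fit(train_features, train_labels)
is_target = clf.predict(train_features[:5])  # 1 = target, 0 = background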
Target tracking mainly involves a tracking unit, a detection unit and a learning unit. The target tracking method mainly comprises the following steps:
On the basis of the target detected by the target detection part, the tracking unit uses a median optical flow method to predict a target from the target region information in the previous frame image, and takes the predicted target as the target tracking result.
It is judged whether the number of pixels occupied in the current frame image by the target region of the previous frame image is greater than a pixel threshold T_pix.
If so, the target regions in the previous and current frame images are reduced at a fixed aspect ratio, and the current frame image is traversed globally with a sliding window to obtain the candidate targets determined by the detection unit.
If not, the current frame image is divided into local areas and traversed locally with a sliding window to obtain the candidate targets determined by the detection unit.
At least one candidate target is processed with the variance classifier, random classifier and nearest-neighbour classifier of the detection unit to form a target detection result. A target tracking area is determined from the target tracking result and the target detection result, finally completing target tracking for the current frame.
The target region is selected by initialization from the first frame image as follows: in the starting frame of the video or picture sequence, a rectangle containing the target is determined manually, and the initial coordinates and the width and height of the rectangle are taken as the initialized target region information.
The sliding window mode comprises: setting a scanning strategy for the sliding window in the current frame image. The scanning strategy is: the displacement of the sliding window in the horizontal and vertical directions is a fixed fraction (ratio) of the sliding-window width and height; after each scan the sliding window is enlarged or reduced by a factor of t as required; and the minimum sliding-window size is n.
The local area division of the current frame image comprises: taking the target centre position of the previous frame as the centre of the local traversal area, and taking the image area formed by t times the width and height of the previous frame's target as the local area of the current frame.
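Putting the two traversal branches together, the choice of search area could be sketched as follows; the function name, the scale factor t and the box arithmetic are assumptions:

def choose_search_area(prev_box, frame_shape, t_pix, t=2):
    # prev_box: (x, y, w, h) of the target in the previous frame.
    # frame_shape: (height, width) of the current frame.
    x, y, w, h = prev_box
    if w * h > t_pix:
        # Large target: traverse the whole current frame globally.
        return (0, 0, frame_shape[1], frame_shape[0])
    # Small target: local area of t times the previous target's width and
    # height, centred on the previous target centre.
    cx, cy = x + w // 2, y + h // 2
    lw, lh = min(t * w, frame_shape[1]), min(t * h, frame_shape[0])
    lx = max(0, min(cx - lw // 2, frame_shape[1] - lw))
    ly = max(0, min(cy - lh // 2, frame_shape[0] - lh))
    return (lx, ly, lw, lh)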
The result storage and display module receives the path sequence returned by the target tracking module and stores it in a cache or a database with a designated storage structure, for convenient subsequent searching and display.
When target detection is performed, image classification is carried out first; this classification differs from classification in the traditional sense, since it classifies by image quality and preprocesses images according to the classification result.
This reduces the drop in subsequent detection and tracking accuracy caused by original images with low background contrast, heavy occlusion or many targets. The final classification result is more accurate, while the images with higher background contrast are not processed but sent directly to target detection, which ensures that detection efficiency is not reduced.
A scheme of target detection based on intensity layering and multi-scale image features is adopted, and a minimum-value method based on intensity planes is proposed for feature classification. The extraction subunit, built to extract the color-patch map obtained by intensity layering, uses an algorithm completely different from existing ones, which greatly improves the accuracy of the final detection result.
An adaptive jump convolution method is proposed: adaptive jump convolution is performed on the preprocessed image to obtain a minimum-size feature image containing the full features, and the resulting smaller image improves feature-extraction efficiency.
As a whole system, the embodiment of the invention acquires video images from a camera. After the video is received by the video receiving and encoding module, a video parsing and reasoning module detects and tracks targets in the images of the video, realizing automated and intelligent video processing.
The target detection process: when the embodiment of the invention performs target detection, image classification is carried out first; this classification differs from classification in the traditional sense, since it classifies by image quality and preprocesses images according to the classification result. This reduces the drop in subsequent detection and tracking accuracy caused by original images with low background contrast, heavy occlusion or many targets. The final classification result is more accurate, while the images with higher background contrast are not processed but sent directly to target detection, so detection efficiency is not reduced.
The target detection algorithm used in the embodiment of the invention detects targets based on the intensity layering and multi-scale features of images, which is completely different from existing algorithms, and it greatly improves the accuracy of the final detection result.
As shown in fig. 6, an embodiment of the present disclosure provides a target tracking apparatus, including:
a layering module 110, configured to perform intensity layering based on the gray values of pixels in a target image to obtain one or more gray maps;
an extraction module 120, configured to perform multi-scale structural feature extraction on the gray maps to obtain multi-scale structural features;
and a tracking module 130, configured to track a target according to the multi-scale structural features.
The target tracking apparatus can be applied to various electronic devices, including but not limited to a terminal and/or a server.
In one embodiment, the layering module 110, the extracting module 120, and the tracking module 130 may be program modules; the program modules may implement the functions of any of the modules described above when executed by a processor.
In another embodiment, the layering module 110, the extraction module 120 and the tracking module 130 may be combined hardware-software modules, including but not limited to various programmable arrays; the programmable arrays include, but are not limited to, field programmable gate arrays and/or complex programmable logic devices.
In still other embodiments, the layering module 110, the extraction module 120 and the tracking module 130 may be pure hardware modules, including but not limited to application-specific integrated circuits.
In some embodiments, the tracking module 130 is specifically configured to: determine a candidate region in the (m+1)-th frame image based on the n-th target of the m-th frame image, where m and n are positive integers; determine a similarity between the multi-scale structural feature of the n-th target of the m-th frame image and the multi-scale structural feature extracted from the candidate region of the (m+1)-th frame image; and determine the position of the n-th target in the (m+1)-th frame image according to the similarity.
In some embodiments, the tracking module 130 is specifically configured to perform at least one of the following:
determining a candidate region of the n-th target in the (m+1)-th frame image based on optical flow estimation of the n-th target of the m-th frame image;
dividing the (m+1)-th frame image into a plurality of image regions based on an s-th sliding window, and determining the candidate region based on the pixel similarity between the image regions and the region where the n-th target of the m-th frame image is located; where s is a positive integer less than or equal to S, S is a positive integer not less than 2, and different sliding windows have different sizes.
In some embodiments, the layering module 110 is specifically configured to perform intensity layering according to the gray values of the pixels of the target image to obtain P_m - 1 gray maps, where P_m is the number of local minima of the gray values of the pixels of the target image, the gray values of the pixels of different gray maps lie between different local-minimum groups, and one local-minimum group includes two adjacent local minima.
In some embodiments, the layering module 110 is further configured to divide the target image into a plurality of layered images according to the gray values of the pixels, where different layered images contain different gray values; determine the number of local minima contained in the pixels of two adjacent layered images; and, when the number of local minima is equal to a preset value, determine each of the two adjacent layered images as a gray map.
In some embodiments, the layering module 110 is specifically configured to insert a layered image between the two adjacent layered images when the number of local minima is smaller than the preset value, and/or merge the two adjacent layered images when the number of local minima is larger than the preset value.
In some embodiments, the apparatus further comprises:
a jump convolution module, configured to perform jump convolution processing on the image to be processed to obtain the target image, where the number of pixels of the target image is less than the number of pixels of the image to be processed.
In some embodiments, the apparatus further comprises:
a division module, configured to divide the image area of an original image into a first-type region and a second-type region, where the difference between target and background in the first-type region is larger than the difference between target and background in the second-type region;
and an enhancement module, configured to perform target enhancement processing on the second-type region of the original image to obtain the image to be processed.
In some embodiments, the extraction module 120 is specifically configured to process each gray map based on gray values to obtain one or more of: multi-dimensional gray features, multi-dimensional gray variation features, multi-dimensional effective patch ratios, patch edge variation features, and target scale distribution features.
As shown in fig. 7, an embodiment of the present disclosure provides a target tracking apparatus, including:
the region module 210 is configured to divide an image region of the target image into a first type region and a second type region; wherein, the difference value between the target and the background in the first type area is larger than the difference value between the target and the background in the second type area;
The image to be processed module 220 is configured to perform target enhancement processing on the second type region of the target image, so as to obtain the image to be processed;
and the target module 230 is used for tracking a target based on the image to be processed.
In some embodiments, the region module 210, the image to be processed module 220, and the object module 230 may be program modules; the program modules are capable of implementing the functions of the respective modules described above when executed by a processor.
In other embodiments, the region module 210, the image to be processed module 220, and the target module 230 may be modules combining software and hardware; the combined software and hardware modules include, but are not limited to, various programmable arrays; the programmable arrays include, but are not limited to, a field programmable gate array and/or a complex programmable logic device.
In some embodiments, the image to be processed module 220 is specifically configured to perform a top-hat transformation on the second type region of the target image to obtain the image to be processed of the target enhancement processing.
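A minimal sketch of this top-hat enhancement using OpenCV's morphology API; restricting it to a second-type-region mask and the 15x15 rectangular structuring element are assumptions of this sketch, to be tuned to the expected target size.

```python
# Sketch of the top-hat enhancement with OpenCV; the mask restriction
# and 15x15 structuring element are assumptions of this sketch.
import cv2
import numpy as np

def enhance_second_type(img: np.ndarray, second_type_mask: np.ndarray) -> np.ndarray:
    """Apply a white top-hat transform only inside the second type region."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
    out = img.copy()
    out[second_type_mask] = tophat[second_type_mask]
    return out
```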
As shown in fig. 8, an embodiment of the present disclosure provides an electronic device including:
a memory;
and a processor, coupled to the memory, configured to implement the target tracking method provided in any of the foregoing embodiments by executing computer-executable instructions stored on the memory, for example, to perform the target tracking method shown in any of figs. 1 to 4.
The electronic device may be a terminal device and/or a server in a service platform.
As shown in fig. 8, the electronic device may also include a network interface that may be used to interact with a peer device over a network.
Embodiments of the present disclosure provide a computer storage medium having stored thereon computer-executable instructions; the computer-executable instructions, when executed by a processor, implement the target tracking method provided by any of the foregoing embodiments, for example, the target tracking method shown in any of figs. 1 to 4.
The computer storage medium is a non-transitory storage medium.
The technical schemes described in the embodiments of the present disclosure may be arbitrarily combined without any conflict.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed method and intelligent device may be implemented in other manners. The device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, each unit may be used separately as one unit, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The foregoing is merely specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (24)

1. A target tracking method, comprising:
carrying out intensity layering on gray values based on the gray values of pixels in a target image to obtain one or more gray maps;
extracting multi-scale structural features from the gray map to obtain the multi-scale structural features;
and tracking the target according to the multi-scale structural features.
2. The method of claim 1, wherein the performing target tracking according to the multi-scale structural features comprises:
determining a candidate region in the (m+1)th frame image based on the nth target of the mth frame image; wherein m and n are positive integers;
determining a similarity between the multi-scale structural feature of the nth target of the mth frame image and the multi-scale structural feature extracted from the candidate region of the (m+1)th frame image;
and determining the position of the nth target in the (m+1) th frame image according to the similarity.
3. The method according to claim 2, wherein the determining a candidate region in the (m+1)th frame image based on the nth target of the mth frame image comprises at least one of:
determining a candidate region of the nth target in the (m+1)th frame image based on optical flow estimation of the nth target of the mth frame image;
dividing the (m+1)th frame image into a plurality of image areas based on an sth sliding window, and determining the candidate region based on pixel similarity between the image areas and the area where the nth target of the mth frame image is located; wherein s is a positive integer less than or equal to S; S is a positive integer not less than 2; and different sliding windows have different sizes.
4. The method of claim 1, wherein the performing intensity layering of the gray values based on the gray values of pixels within the target image results in one or more gray maps, comprising:
intensity layering is carried out according to the gray values of the pixels of the target image to obtain Pm-1 gray maps,
wherein Pm is the number of local minima of the gray value of the pixel of the target image, the gray value of the pixel of different gray maps is located between different local minima groups, and one of the local minima groups includes: adjacent two local minima.
5. The method of claim 4, wherein the intensity layering according to the gray values of the pixels of the target image to obtain Pm-1 gray scale maps comprises:
dividing the target image into a plurality of layered images according to the gray values of the pixels; the gray values of the pixels contained in different layered images are different;
determining the number of local minima contained in pixels of two adjacent layered images;
and when the number of the local minima is equal to a preset value, respectively determining the two adjacent layered images as the gray level map.
6. The method of claim 5, wherein the intensity layering according to the gray values of the pixels of the target image to obtain Pm-1 gray scale maps comprises:
inserting a layered image between the two adjacent layered images when the number of the local minima is smaller than the preset value;
and/or,
merging the two adjacent layered images when the number of the local minima is larger than the preset value.
7. The method according to claim 1, wherein the method further comprises:
performing skip convolution processing on an image to be processed to obtain the target image; wherein the number of pixels of the target image is less than the number of pixels of the image to be processed.
8. The method of claim 7, wherein the method further comprises:
dividing an image area of the original image into a first type area and a second type area; wherein, the difference value between the target and the background in the first type area is larger than the difference value between the target and the background in the second type area;
and carrying out target enhancement processing on the second type region of the original image to obtain the image to be processed.
9. The method according to any one of claims 1 to 7, wherein the performing multi-scale structural feature extraction on the gray map to obtain multi-scale structural features comprises:
processing each gray map based on gray values to obtain one or more of a multi-dimensional gray feature, a multi-dimensional gray variation feature, a multi-dimensional effective block ratio, a block edge variation feature, and a target scale distribution feature.
10. A target tracking method, comprising:
dividing an image region of a target image into a first type region and a second type region; wherein the difference value between the target and the background in the first type region is larger than the difference value between the target and the background in the second type region;
performing target enhancement processing on the second type region of the target image to obtain an image to be processed;
and tracking the target based on the image to be processed.
11. The method according to claim 10, wherein performing the target enhancement processing on the second type region of the target image to obtain the image to be processed includes:
and performing a top-hat transformation on the second type region of the target image to obtain the image to be processed of the target enhancement processing.
12. A target tracking device, comprising:
the layering module is used for carrying out intensity layering on gray values based on the gray values of pixels in a target image to obtain one or more gray maps;
the extraction module is used for performing multi-scale structural feature extraction on the gray map to obtain multi-scale structural features;
and the tracking module is used for tracking the target according to the multi-scale structural features.
13. The apparatus according to claim 12, wherein the tracking module is specifically configured to determine, based on the nth target of the mth frame image, a candidate region of the (m+1)th frame image; wherein m and n are positive integers; determine a similarity between the multi-scale structural feature of the nth target of the mth frame image and the multi-scale structural feature extracted from the candidate region of the (m+1)th frame image; and determine the position of the nth target in the (m+1)th frame image according to the similarity.
14. The apparatus according to claim 13, wherein the tracking module is specifically configured to perform at least one of:
determining a candidate region of the nth target in the (m+1)th frame image based on optical flow estimation of the nth target of the mth frame image;
dividing the (m+1)th frame image into a plurality of image areas based on an sth sliding window, and determining the candidate region based on pixel similarity between the image areas and the area where the nth target of the mth frame image is located; wherein s is a positive integer less than or equal to S; S is a positive integer not less than 2; and different sliding windows have different sizes.
15. The apparatus of claim 12, wherein the layering module is specifically configured to perform intensity layering according to a gray value of a pixel of the target image to obtain Pm-1 gray maps, where Pm is a number of local minima of the gray value of the pixel of the target image, the gray value of the pixel of the different gray maps is located between different local minima groups, and one of the local minima groups includes: adjacent two local minima.
16. The apparatus of claim 15, wherein the layering module is further specifically configured to divide the target image into a plurality of layered images according to the gray values of the pixels; the gray values of the pixels contained in different layered images are different; determining the number of local minima contained in pixels of two adjacent layered images; and when the number of the local minima is equal to a preset value, respectively determining the two adjacent layered images as the gray level map.
17. The apparatus according to claim 16, wherein the layering module is specifically configured to insert a layered image between the two adjacent layered images when the number of the local minima is smaller than the preset value; and/or merge the two adjacent layered images when the number of the local minima is larger than the preset value.
18. The apparatus of claim 12, wherein the apparatus further comprises:
the skip convolution module is used for performing skip convolution processing on the image to be processed to obtain the target image; wherein the number of pixels of the target image is less than the number of pixels of the image to be processed.
19. The apparatus of claim 18, wherein the apparatus further comprises:
the dividing module is used for dividing the image area of the original image into a first type area and a second type area; wherein, the difference value between the target and the background in the first type area is larger than the difference value between the target and the background in the second type area;
and the enhancement module is used for carrying out target enhancement processing on the second type region of the original image to obtain the image to be processed.
20. The apparatus according to any one of claims 12 to 19, wherein the extracting module is specifically configured to perform gray value-based processing on each of the gray maps to obtain one or more of a multi-dimensional gray feature, a multi-dimensional gray variation feature, a multi-dimensional effective block ratio, a block edge variation feature, and a target scale distribution feature.
21. A target tracking device, comprising:
the region module is used for dividing the image region of the target image into a first type region and a second type region; wherein, the difference value between the target and the background in the first type area is larger than the difference value between the target and the background in the second type area;
the image to be processed module is used for performing target enhancement processing on the second type region of the target image to obtain the image to be processed;
and the target module is used for tracking the target based on the image to be processed.
22. The apparatus according to claim 21, wherein the image to be processed module is specifically configured to perform a top-hat transformation on the second type region of the target image to obtain the image to be processed of the target enhancement processing.
23. An electronic device, the electronic device comprising:
a memory;
a processor, coupled to the memory, configured to implement the target tracking method provided in any one of claims 1 to 9 or claims 10 to 11 by executing computer-executable instructions stored on the memory.
24. A computer storage medium having stored thereon computer-executable instructions; the computer-executable instructions, when executed by a processor, implement the target tracking method provided in any one of claims 1 to 9 or claims 10 to 11.
CN202111506010.9A 2021-12-10 2021-12-10 Target tracking method and device, electronic equipment and medium Pending CN116258741A (en)

Priority Applications (1)

Application: CN202111506010.9A; Priority date: 2021-12-10; Filing date: 2021-12-10; Title: Target tracking method and device, electronic equipment and medium

Publications (1)

Publication: CN116258741A; Publication date: 2023-06-13

Family ID: 86681300

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination