CN108932496B - Method and device for counting number of target objects in area


Info

Publication number
CN108932496B
Authority
CN
China
Prior art keywords
target object
key frame
objects
target objects
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810712189.5A
Other languages
Chinese (zh)
Other versions
CN108932496A (en)
Inventor
谭文轩
余睿
宋宽
顾竹
张弓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiage Tiandi Technology Co ltd
Original Assignee
Beijing Jiage Tiandi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiage Tiandi Technology Co ltd
Priority to CN201810712189.5A
Publication of CN108932496A
Application granted
Publication of CN108932496B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30242 Counting objects in image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a device for counting the number of target objects in an area. The method comprises the following steps: acquiring an image within the area; extracting a plurality of frames from the image as key frame images; screening the key frame images and calibrating the target objects in them with calibration boxes; calculating a preliminary detection result for the number of target objects from the calibration of the target objects in the key frame images; performing a first tracking check on the preliminary result; performing a second tracking check on the preliminary result; and updating the number of target objects according to the results of the two tracking checks to obtain the final detection result. By tracking the data twice and combining the results of the two tracking checks to update the count, the method obtains an accurate number of target objects and greatly reduces the abrupt changes and degradation of results caused by jitter and overlapping of the target objects.

Description

Method and device for counting number of target objects in area
Technical Field
The present invention relates to the field of image processing, and in particular to a method and an apparatus for counting the number of target objects in an area.
Background
The description of the background art given here is provided only to aid illustration and understanding of the summary of the invention, and shall not be construed as an admission by the applicant that it constitutes prior art as of the filing date of this application.
At present, most breeding enterprises rely on manual counting. The working environment is harsh, and counting moving animals in real time inevitably produces errors. Moreover, the low efficiency of manual work further reduces the timeliness and the value of the data. Among emerging solutions, a large number of neural-network methods are limited to training and modifying the detector; in large-scale breeding scenes a single camera cannot cover all areas, and a single target-detection algorithm ignores the temporal ordering and the spatial mobility of livestock during counting, so the result deviates greatly. In early research work, matching detections was the core of the tracking problem; tracking schemes built on the traditional Hungarian algorithm, with Kalman filtering and cascade matching, suffer strong degradation of the tracking result under sudden changes in detector performance, object overlap, and similar conditions. It is therefore extremely important to ensure that the tracker remains stable and robust against data fluctuations.
Disclosure of Invention
An embodiment of the first aspect of the present invention provides a method for counting the number of target objects in an area, which comprises the following steps:
acquiring an image within a region;
extracting a plurality of frame images in the image as key frame images;
screening a plurality of key frame images, and calibrating a target object in the key frame images by adopting a calibration frame;
calculating to obtain a preliminary detection result of the number of the target objects according to the calibration of the target objects in the key frame image;
performing a first tracking check on the preliminary result;
performing a second tracking check on the preliminary result;
and updating the number of the target objects according to the result of the first tracking check and the result of the second tracking check to obtain a final detection result of the number of the target objects.
Preferably, performing the first tracking check on the preliminary result comprises:
generating a first list according to the images of the plurality of calibrated target objects in each key frame image;
performing feature extraction of scale invariance features on the images of the plurality of target objects in each first list to generate a second list;
matching the features in the second list of the Nth key frame image to be tracked and checked with the features in the second lists of the K key frame images before and after the Nth key frame image to be tracked and checked;
when the features in the second lists of the K preceding and following key frame images are successively matched against the features in the second list of the Nth key frame image, and there are features in the second lists of the K preceding and following key frame images that find no match among the features in the second list of the Nth key frame image, judging that the unmatched features correspond to missed target objects and updating the number of target objects;
and when the features in the second lists of the K preceding and following key frame images cannot be matched with the features in the second list of the Nth key frame image, matching the features in the second list of the Nth key frame image with the features in the second list of the previous frame; when unmatched features appear, judging that the unmatched features correspond to missed target objects, and updating the number of target objects.
Preferably, after performing the scale-invariant feature extraction on the images of the plurality of target objects in each first list to generate the second list, the method further comprises:
matching features in a second list of a plurality of key frame images within a set time;
when the feature matching is successful, judging that a target object corresponding to the successfully matched feature exists, and updating the number of the target objects;
and when the features are not matched successfully, judging that the target object corresponding to the unmatched features does not exist, and updating the number of the target objects.
Preferably, the feature extraction of scale invariant features is performed on the images of the plurality of objects in each first list, and the generating of the second list includes:
generating a Gaussian difference pyramid from the image of each target object in the first list, and constructing a scale space;
in the scale space, comparing each pixel point with all adjacent points thereof to determine extreme points;
screening the extreme points to accurately locate stable key points;
assigning direction information to the stable key points;
and describing the key points by using a descriptor of the multi-dimensional vector to generate a second list.
Preferably, the Gaussian difference pyramid is constructed by the following formula:

G(x, y) = (1 / (2πσ₁²)) · e^(−((x − x₀)² + (y − y₀)²) / (2σ₁²));

The construction formula of the scale space is:

L(x, y) = G(x, y) * I(x, y);

wherein G(x, y) is the Gaussian function, L(x, y) is the scale space, I(x, y) is the pixel value of the original image at point (x, y), x and y are the coordinates in the original image, x₀ and y₀ are the coordinates of the target image, e is the irrational base of the natural logarithm, and σ₁ is a constant.
Preferably, the formula for assigning the direction information to the stable key points is:

m(x, y) = √( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² );

where m(x, y) is the gradient magnitude and L(x, y) is the scale space.
Preferably, performing the second tracking check on the preliminary result comprises:
recording the key frame image in which a target object to be tracked and checked disappears and the key frame image in which the target object reappears;
extracting the position information of key points of a calibration frame of a plurality of key frame images before the key frame image with the disappeared target object;
extracting the position information of key points of a calibration frame of a plurality of key frame images after the key frame images reappearing to the target object;
calculating the difference value of the corresponding points according to the position information of the plurality of key points extracted twice;
calculating a mean square error value according to the plurality of difference values;
comparing the mean square deviation value with a preset threshold value;
when the mean square deviation value is smaller than the threshold value, the disappeared target object and the reappeared target object are the same target object, and the number of the target objects is updated;
and when the mean square deviation value is larger than the threshold value, the disappeared target objects and the reappeared target objects are two target objects, and the number of the target objects is updated.
Preferably, performing the second tracking check on the preliminary result further comprises:
when the detection evaluation function value between the occluded target object and the occluding target object is greater than a preset threshold value, and the occluding target object does not disappear in the key frame image in which the occluded target object disappears, recording the position information of a plurality of key points of the occluding target object in that key frame image;
reading the position information of a plurality of key points of the occluded target object in the key frame image in which it reappears;
calculating the differences of the corresponding points from the position information of the occluding target object and the position information of the occluded target object;
calculating a mean square error value from the plurality of differences;
comparing the mean square error value with a preset threshold value;
when the mean square error value is smaller than the threshold value, the disappeared occluded target object and the reappearing target object are the same target object, and the number of target objects is updated;
and when the mean square error value is greater than the threshold value, they are two different target objects, and the number of target objects is updated.
Preferably, the formula for calculating the mean square error value is:

σ² = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ)²;

wherein σ² is the mean square error, n is the number of key points, xᵢ is the ith difference, and μ is the average of the differences.
Preferably, the key points of the calibration box include: the center point of the calibration box, the midpoint of each edge of the calibration box, and each corner point of the calibration box.
An embodiment of the second aspect of the present invention provides a device for counting the number of target objects in an area, comprising:
the acquisition module is used for acquiring images in the region;
the extraction module is used for extracting a plurality of frame images in the images as key frame images;
the calibration module is used for screening a plurality of key frame images and calibrating a target object in the key frame images by adopting a calibration frame;
the operation module is used for obtaining a preliminary detection result of the number of the target objects through operation according to the calibration of the target objects in the key frame images;
a first module for performing a first tracking check on the preliminary result;
a second module for performing a second tracking check on the preliminary result;
and the result module updates the number of the target objects according to the result of the first tracking check and the result of the second tracking check to obtain the final detection result of the number of the target objects.
Preferably, the first module comprises:
the image unit is used for generating a first list according to the images of the plurality of calibrated target objects in each key frame image;
the characteristic unit is used for carrying out characteristic extraction of scale invariance characteristics on the images of the plurality of target objects in each first list to generate a second list;
the first matching unit is used for matching the features in the second list of the Nth key frame image to be tracked and checked with the features in the second lists of the K key frame images before and after the Nth key frame image;
the first judging unit is used for judging, when the features in the second lists of the K preceding and following key frame images are successively matched against the features in the second list of the Nth key frame image and some features in the second lists of the K preceding and following key frame images find no match among the features in the second list of the Nth key frame image, that the unmatched features correspond to missed target objects, and updating the number of target objects; and, when the features in the second lists of the K preceding and following key frame images cannot be matched with the features in the second list of the Nth key frame image, matching the features in the second list of the Nth key frame image with the features in the second list of the previous frame, and, when unmatched features appear, judging that the unmatched features correspond to missed target objects and updating the number of target objects.
Preferably, the first module further comprises:
the second matching unit is used for matching the features in the second list of the plurality of key frame images within the set time;
the second judging unit is used for judging that the target object corresponding to the successfully matched feature exists and updating the number of the target objects when the feature matching is successful; and when the features are not matched successfully, judging that the target object corresponding to the unmatched features does not exist, and updating the number of the target objects.
Preferably, the feature unit includes:
the construction subunit is used for generating a Gaussian difference pyramid from the image of each target object in the first list and constructing a scale space;
the comparison subunit is used for comparing each pixel point with all adjacent points thereof in a scale space to determine an extreme point;
the screening subunit is used for screening the extreme points to accurately locate stable key points;
the allocation subunit is used for assigning direction information to the stable key points;
and the generating subunit is used for describing the key points by using the descriptors of the multidimensional vectors to generate a second list.
Preferably, the Gaussian difference pyramid is constructed by the following formula:

G(x, y) = (1 / (2πσ₁²)) · e^(−((x − x₀)² + (y − y₀)²) / (2σ₁²));

The construction formula of the scale space is:

L(x, y) = G(x, y) * I(x, y);

wherein G(x, y) is the Gaussian function, L(x, y) is the scale space, I(x, y) is the pixel value of the original image at point (x, y), x and y are the coordinates in the original image, x₀ and y₀ are the coordinates of the target image, e is the irrational base of the natural logarithm, and σ₁ is a constant.
Preferably, the formula for assigning the direction information to the stable key points is:

m(x, y) = √( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² );

where m(x, y) is the gradient magnitude and L(x, y) is the scale space.
Preferably, the second module comprises:
the first recording unit is used for recording the key frame image in which a target object to be tracked and checked disappears and the key frame image in which the target object reappears;
a first extraction unit configured to extract position information of key points of a calibration frame of a plurality of key frame images before a key frame image in which a target object disappears;
a second extraction unit configured to extract position information of key points of a calibration frame of the plurality of key frame images after a key frame image in which the target object reappears;
the first difference unit is used for calculating the difference of the corresponding points according to the position information of the plurality of key points extracted twice;
a first calculation unit for calculating a mean square error value from the plurality of difference values;
the first comparison unit is used for comparing the mean square deviation value with a preset threshold value;
a third determination unit, configured to update the number of the target objects when the mean square deviation value is smaller than the threshold value, and the disappeared target object and the reappeared target object are the same target object; and when the mean square deviation value is larger than the threshold value, the disappeared target objects and the reappeared target objects are two target objects, and the number of the target objects is updated.
Preferably, the second module further comprises:
the second recording unit is used for recording the position information of a plurality of key points of the occluding target object in the key frame image in which the occluded target object disappears, when the detection evaluation function value between the occluded target object and the occluding target object is greater than a preset threshold value and the occluding target object does not disappear in that key frame image;
the reading unit is used for reading the position information of a plurality of key points of the occluded target object in the key frame image in which it reappears;
the second difference unit is used for calculating the differences of the corresponding points from the position information of the occluding target object and the position information of the occluded target object;
a second calculation unit for calculating a mean square error value from the plurality of difference values;
the second comparison unit is used for comparing the mean square deviation value with a preset threshold value;
a fourth judging unit, configured to judge, when the mean square error value is smaller than the threshold value, that the disappeared occluded target object and the reappearing target object are the same target object and to update the number of target objects; and, when the mean square error value is greater than the threshold value, that they are two different target objects, likewise updating the number of target objects.
Preferably, the formula for calculating the mean square error value is:

σ² = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ)²;

wherein σ² is the mean square error, n is the number of key points, xᵢ is the ith difference, and μ is the average of the differences.
Preferably, the key points of the calibration box include: the center point of the calibration box, the midpoint of each edge of the calibration box, and each corner point of the calibration box.
An embodiment of the third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of any of the above methods for counting the number of target objects in an area.
An embodiment of the fourth aspect of the present invention provides a human-computer interaction device, comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any of the above methods for counting the number of target objects in an area.
According to the technical scheme provided by the invention, the target objects are first calibrated using a neural network algorithm and a sufficient number of calibration boxes, the number of target objects is then calculated, and the data are tracked twice; the results of the two tracking checks are combined to update the number of target objects and obtain the final detection result, so that the number of target objects is obtained accurately and the abrupt changes and degradation of results caused by jitter and overlapping of the target objects are greatly reduced.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a first embodiment of a method for counting the number of objects in a region according to the present invention;
FIG. 2 is a schematic flow chart of a second embodiment of the method for counting the number of objects in a region according to the present invention;
FIG. 3 is a flow chart of a third embodiment of the method for counting the number of objects in a region according to the present invention;
FIG. 4 is a schematic flow chart of a fourth embodiment of the method for counting the number of objects in a region according to the present invention;
FIG. 5 is a schematic flow chart of a fifth embodiment of the method for counting the number of objects in a region according to the present invention;
FIG. 6 is a flow chart of a method for counting the number of objects in a region according to a sixth embodiment of the present invention;
FIG. 7 is a block diagram of an apparatus for counting the number of objects in a region according to the present invention;
FIG. 8 is a block diagram of a first embodiment of the first module shown in FIG. 7;
FIG. 9 is a block diagram of a second embodiment of the first module shown in FIG. 7;
FIG. 10 is a block diagram of the structure of the feature cell shown in FIG. 9;
FIG. 11 is a block diagram of the first embodiment of the second module shown in FIG. 7;
FIG. 12 is a block diagram of a second embodiment of the second module shown in FIG. 7;
FIG. 13 is a gradient direction histogram;
FIG. 14 is a schematic representation of the image gradient of the region around a keypoint;
FIG. 15 is a schematic diagram of a first embodiment of a keypoint descriptor;
FIG. 16 is a schematic diagram of a second embodiment of the keypoint descriptor.
The correspondence between the reference numerals and the part names in FIGS. 7 to 12 is as follows:
10 acquisition module, 20 extraction module, 30 calibration module, 40 operation module, 50 first module, 51 image unit, 52 feature unit, 521 construction subunit, 522 comparison subunit, 523 screening subunit, 524 allocation subunit, 525 generation subunit, 53 first matching unit, 54 first judgment unit, 55 second matching unit, 56 second judgment unit, 60 second module, 61 first recording unit, 62 first extraction unit, 63 second extraction unit, 64 first difference unit, 65 first calculation unit, 66 first comparison unit, 67 third judgment unit, 68 second recording unit, 69 reading unit, 610 second difference unit, 611 second calculation unit, 612 second comparison unit, 613 fourth judgment unit, 70 result module, 100 device for counting the number of target objects in an area.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
The following discussion provides multiple embodiments of the invention. Although each embodiment represents a single combination of elements of the invention, elements of different embodiments can be substituted or combined, and the invention therefore also contemplates all possible combinations of the disclosed elements. Thus, if one embodiment includes elements A, B and C and another embodiment includes a combination of elements B and D, the invention should also be construed to include embodiments containing every other possible combination of one or more of the elements A, B, C and D, even if such embodiments are not explicitly recited in the following text.
As shown in fig. 1, the method for counting the number of target objects in an area provided by the first aspect of the present invention includes:
step 10, collecting images in a region;
step 20, extracting a plurality of frame images in the image as key frame images;
step 30, screening a plurality of key frame images, and calibrating the target objects in the key frame images with calibration boxes; specifically, quality screening is performed after the key frame images are obtained, and extremely unstable images are eliminated. The target objects in the images are then calibrated in Extensible Markup Language (XML) format; since the target objects are usually of a single kind, only one label type is needed, and the content to be marked is the coordinates of the upper-left and lower-right points of a rectangular box that completely encloses the object. The annotation of each key frame image is stored as an XML file;
step 40, calculating to obtain a preliminary detection result of the number of the target objects according to the calibration of the target objects in the key frame image;
step 50, carrying out first tracking check on the preliminary result;
step 60, performing a second tracking check on the preliminary result;
and step 70, updating the number of the target objects according to the result of the first tracking check and the result of the second tracking check to obtain the final detection result of the number of the target objects.
According to the method for counting the number of target objects in an area provided by the invention, the target objects are calibrated using a neural network algorithm and a sufficient number of calibration boxes, the number of target objects is then calculated, and the data are tracked twice; the results of the two tracking checks are combined to update the number of target objects and obtain the final detection result, so that the number of target objects is obtained accurately and the abrupt changes and degradation of results caused by jitter and overlapping of the target objects are greatly reduced.
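As a minimal illustration of how steps 10 to 70 might fit together, the following Python sketch assumes OpenCV for video decoding; is_stable, detect_targets, the crude preliminary count, and the two tracking-check functions are hypothetical placeholders rather than the patented implementation:

    import cv2

    def count_targets(video_path, keyframe_interval=10):
        # Sketch: extract key frames, calibrate targets, then refine the
        # count with two tracking checks. Helper functions are assumed.
        cap = cv2.VideoCapture(video_path)
        keyframes, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # keep every keyframe_interval-th frame, screening unstable images
            if idx % keyframe_interval == 0 and is_stable(frame):
                keyframes.append(frame)
            idx += 1
        cap.release()

        # calibrate targets in each key frame with rectangular boxes
        boxes_per_frame = [detect_targets(f) for f in keyframes]
        count = max((len(b) for b in boxes_per_frame), default=0)  # crude preliminary count

        # the two tracking checks adjust the preliminary result
        count += first_tracking_check(keyframes, boxes_per_frame)
        count += second_tracking_check(keyframes, boxes_per_frame)
        return count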
As shown in fig. 2, in one embodiment of the present invention, step 50 further comprises:
step 51, generating a first list according to the images of the plurality of calibrated target objects in each key frame image;
step 52, performing Scale-invariant feature transform (SIFT) feature extraction on the images of the plurality of objects in each first list to generate a second list;
step 53, matching the features in the second list of the nth key frame image to be tracked and checked with the features in the second lists of K key frame images before and after the nth key frame image;
when the features in the second lists of the K preceding and following key frame images are successively matched against the features in the second list of the Nth key frame image, and there are features in the second lists of the K preceding and following key frame images that find no match among the features in the second list of the Nth key frame image, judging that the unmatched features correspond to missed target objects and updating the number of target objects;
and when the features in the second lists of the K preceding and following key frame images cannot be matched with the features in the second list of the Nth key frame image, matching the features in the second list of the Nth key frame image with the features in the second list of the previous frame; when unmatched features appear, judging that the unmatched features correspond to missed target objects, and updating the number of target objects.
In this embodiment, feature matching based on the scale-invariance algorithm is used to track and check the obtained target-object count, solving the problem that missed selections in the preliminary detection result degrade the tracking result under data fluctuation.
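A hedged Python sketch of this matching step, assuming the per-target SIFT descriptors of each key frame have already been collected into second lists (here, lists of N x 128 numpy arrays); OpenCV's brute-force matcher with Lowe's ratio test is one common way to match such descriptors, and the thresholds are illustrative:

    import cv2

    def match_features(desc_a, desc_b, ratio=0.75):
        # number of good matches between two SIFT descriptor arrays,
        # using the standard ratio test
        if desc_a is None or desc_b is None or len(desc_a) < 2 or len(desc_b) < 2:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = matcher.knnMatch(desc_a, desc_b, k=2)
        return sum(1 for m, n in pairs if m.distance < ratio * n.distance)

    def missed_targets(second_lists, n, k, min_matches=10):
        # second_lists[i]: list of per-target descriptor arrays of key frame i.
        # A target in the K frames around frame n that matches no target in
        # frame n is flagged as a missed selection in frame n.
        missed = 0
        for j in range(max(0, n - k), min(len(second_lists), n + k + 1)):
            if j == n:
                continue
            for desc in second_lists[j]:
                if not any(match_features(desc, d) >= min_matches
                           for d in second_lists[n]):
                    missed += 1
        return missed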
As shown in fig. 3, in an embodiment of the present invention, after step 52, the method further includes:
step 54, matching features in a second list of a plurality of key frame images within a set time;
when the feature matching is successful, judging that a target object corresponding to the successfully matched feature exists, and updating the number of the target objects;
and when the features are not matched successfully, judging that the target object corresponding to the unmatched features does not exist, and updating the number of the target objects.
In this embodiment, feature matching based on the scale-invariance algorithm is used to track and check the obtained target-object count, solving the problem that overlapping of multiple targets in the preliminary detection result leads to a degraded recount.
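For this duplicate-counting case, the same descriptor matching can be run over all key frames inside the time window; a sketch reusing the hypothetical match_features helper from the previous example, with an illustrative match threshold:

    def count_distinct_targets(second_lists, min_matches=10):
        # Count distinct targets across the key frames of one time window:
        # a target whose features match an already-seen target is judged to
        # be the same object; an unmatched one is counted as new.
        seen, total = [], 0
        for frame_targets in second_lists:
            for desc in frame_targets:
                if any(match_features(desc, s) >= min_matches for s in seen):
                    continue  # same object, not recounted
                seen.append(desc)
                total += 1
        return total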
As shown in FIG. 4, in one embodiment of the present invention, step 52 comprises:
step 521, generating a gaussian difference pyramid from the image of each target object in the first list, and constructing a scale space;
step 522, comparing each pixel point with all of its adjacent points in the scale space to determine extreme points; specifically, to find the extreme points of the Gaussian difference pyramid, each pixel point is compared with all of its adjacent points to judge whether it is larger or smaller than the adjacent points in its image domain and scale domain: in the two-dimensional image space, the central point is compared with the 8 points in its 3 × 3 neighborhood, and within the same octave of the scale space it is compared with the 2 × 9 points of the two adjacent layers above and below, which ensures that the detected key points are local extreme points in both the scale space and the two-dimensional image space;
step 523, screening the extreme points to accurately locate stable key points; specifically, the Difference of Gaussian (DOG) values are sensitive to noise and edges, so the local extreme points detected in the scale space are further screened to remove unstable and falsely detected extreme points; moreover, since downsampled images are used when constructing the Gaussian pyramid, the extreme points extracted from the downsampled images must be mapped back to their exact positions in the original image;
step 524, assigning direction information to the stable key points; specifically, stable extreme points are extracted in different scale spaces, which guarantees the scale invariance of the key points; assigning direction information to the key points makes them invariant to image angle and rotation, and the direction is obtained from the gradient at each extreme point. The Gaussian difference pyramid is constructed by the following formula:

G(x, y) = (1 / (2πσ₁²)) · e^(−((x − x₀)² + (y − y₀)²) / (2σ₁²));

The construction formula of the scale space is:

L(x, y) = G(x, y) * I(x, y);

wherein G(x, y) is the Gaussian function, L(x, y) is the scale space, I(x, y) is the pixel value of the original image at point (x, y), x and y are the coordinates in the original image, x₀ and y₀ are the coordinates of the target image, e is the irrational base of the natural logarithm, and σ₁ is a constant, σ₁ = 16;
For any key point, the gradient magnitude is expressed as:

m(x, y) = √( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² );

wherein m(x, y) is the gradient magnitude and L(x, y) is the scale space;
the direction assigned to a keypoint is not directly the gradient direction of the keypoint, but is given in the form of a histogram of gradient directions;
The specific method is as follows: the gradient directions of all points in the neighborhood centered on the key point are calculated; these directions necessarily lie in the range 0 to 360 degrees and are normalized into 36 bins, each bin covering a range of 10 degrees. The number of points falling in each direction bin is then accumulated, generating the gradient direction histogram shown in fig. 13.
If the gradient histogram contains another peak whose energy reaches 80% of that of the main peak, the corresponding direction is regarded as an auxiliary direction of the key point. The auxiliary direction enhances the robustness of matching; Lowe points out that about 15% of key points have an auxiliary direction, and it is precisely these 15% of key points that play a key role in stable matching;
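A small sketch of the 36-bin direction histogram, assuming the gradient magnitudes and orientations (in degrees) of the neighborhood pixels have already been computed; contributions here are weighted by gradient magnitude as in Lowe's SIFT, whereas a pure point count, as the text literally reads, would simply drop the weights:

    import numpy as np

    def dominant_directions(magnitudes, orientations, peak_ratio=0.8):
        # Build the 36-bin histogram (10 degrees per bin) around a key point
        # and return the main direction plus any auxiliary direction whose
        # peak reaches 80% of the main peak.
        ori = np.ravel(orientations) % 360.0
        mag = np.ravel(magnitudes)
        hist = np.bincount((ori // 10).astype(int), weights=mag, minlength=36)
        main = hist.max()
        return [b * 10 for b in range(36) if hist[b] >= peak_ratio * main]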
step 525, describing the key points with a multi-dimensional vector descriptor to generate a second list; specifically, the pixel region around each key point is divided into blocks, the gradient histogram within each block is calculated, and a unique vector is generated; this vector is an abstract representation of the image information of that region.
As shown in fig. 14, for 2 × 2 blocks, the gradients of all the pixels in each block are Gaussian weighted, and each block finally takes 8 directions, so that a 2 × 2 × 8-dimensional vector, as shown in fig. 15, can be generated and used as the mathematical description of the central key point.
As shown in fig. 16, a 4 × 4 × 8 = 128-dimensional vector descriptor is finally used to describe each key point.
the first tracking check is based on the feature matching of the space scale invariance algorithm, so that the problems of shaking and overlapping of the target objects can be detected more quickly, the number of the accurate target objects can be obtained quickly, and the detection efficiency is improved.
As shown in FIG. 5, in one embodiment of the present invention, step 60 comprises:
step 61, recording the key frame image in which the target object to be tracked and checked disappears and the key frame image in which the target object reappears;
step 62, extracting the position information of the key points of the calibration boxes in a plurality of key frame images before the key frame image in which the target object disappears;
step 63, extracting the position information of the key points of the calibration boxes in a plurality of key frame images after the key frame image in which the target object reappears;
step 64, calculating the difference value of the corresponding points according to the position information of the plurality of key points extracted twice;
step 65, calculating a mean square error value according to the plurality of difference values;
step 66, comparing the mean square deviation value with a preset threshold value;
when the mean square deviation value is smaller than the threshold value, the disappeared target object and the reappeared target object are the same target object, and the number of the target objects is updated;
and when the mean square deviation value is larger than the threshold value, the disappeared target objects and the reappeared target objects are two target objects, and the number of the target objects is updated.
In this embodiment, the variance principle over a conventional data structure is used to track and check the obtained target-object count, solving the problem that missed selections in the preliminary detection result degrade the tracking result under data fluctuation. On the one hand, this greatly reduces the time spent on manual work under harsh conditions and improves working efficiency; on the other hand, it makes the auditing and statistics of the target objects more accurate and improves detection accuracy.
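A minimal sketch of the disappearance/reappearance check, assuming the calibration-box key point coordinates recorded before the disappearance and after the reappearance are passed in as matching-order arrays; the threshold is application-specific:

    import numpy as np

    def same_target(points_before, points_after, threshold):
        # Differences of corresponding calibration-box key points, then the
        # mean square error of those differences; a small deviation suggests
        # the disappeared and reappeared targets are the same object.
        diffs = np.linalg.norm(np.asarray(points_before, dtype=float)
                               - np.asarray(points_after, dtype=float), axis=-1)
        mu = diffs.mean()
        sigma2 = ((diffs - mu) ** 2).mean()
        return sigma2 < threshold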
As shown in fig. 6, in one embodiment of the present invention, step 60 further comprises:
step 67, when the detection evaluation function (Intersection over Union, IOU) value between the occluded target object and the occluding target object is greater than a preset threshold value, and the occluding target object does not disappear in the key frame image in which the occluded target object disappears, recording the position information of a plurality of key points of the occluding target object in that key frame image;
step 68, reading the position information of a plurality of key points of the occluded target object in the key frame image in which it reappears;
step 69, calculating the differences of the corresponding points from the position information of the occluding target object and the position information of the occluded target object;
step 610, calculating a mean square error value according to the plurality of difference values;
step 611, comparing the mean square deviation value with a preset threshold value;
when the mean square error value is smaller than the threshold value, the disappeared occluded target object and the reappearing target object are the same target object, and the number of target objects is updated;
and when the mean square error value is greater than the threshold value, they are two different target objects, and the number of target objects is updated.
In this embodiment, the variance principle over a conventional data structure is used to track and check the obtained target-object count, solving the problem that overlapping of multiple targets in the preliminary detection result leads to a degraded recount.
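The detection evaluation function itself is the usual intersection-over-union ratio of two calibration boxes; a sketch for boxes in the assumed (x1, y1, x2, y2) form:

    def iou(box_a, box_b):
        # intersection-over-union of two axis-aligned boxes
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0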
In one embodiment of the present invention, the formula for calculating the mean square error value is:

σ² = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ)²;

wherein σ² is the mean square error, n is the number of key points, xᵢ is the ith difference, and μ is the average of the differences.
In one embodiment of the invention, the key points of the calibration box include: the center point of the calibration box, the midpoint of each edge of the calibration box, and each corner point of the calibration box.
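These nine key points are directly computable from a box in the assumed (x1, y1, x2, y2) form; a small sketch:

    def calibration_keypoints(box):
        # the nine key points of a calibration box: center, the midpoint of
        # each of the four edges, and the four corners
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        return [(cx, cy),                                # center
                (cx, y1), (cx, y2), (x1, cy), (x2, cy),  # edge midpoints
                (x1, y1), (x2, y1), (x1, y2), (x2, y2)]  # corners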
Because the second tracking check is based on the variance principle over a conventional data structure, jitter and overlapping of target objects can be resolved more accurately, so the number of target objects is obtained more accurately and detection accuracy is improved.
As shown in fig. 7, an embodiment of the second aspect of the present invention provides a device 100 for counting the number of objects in a region, including: the system comprises an acquisition module 10, an extraction module 20, a calibration module 30, an operation module 40, a first module 50, a second module 60 and a result module 70.
In particular, the acquisition module 10 is used to acquire images within an area; the extraction module 20 is configured to extract a plurality of frame images in the image as key frame images; the calibration module 30 is configured to screen a plurality of key frame images and calibrate a target object in the key frame images by using a calibration frame; the operation module 40 is configured to obtain a preliminary detection result of the number of the target objects according to the calibration operation on the target objects in the key frame image; the first module 50 is used for performing a first trace check on the preliminary result; a second module 60 for performing a second trace check on the preliminary result; the result module 70 updates the number of the target objects according to the result of the first tracking check and the result of the second tracking check to obtain a final detection result of the number of the target objects.
In the device 100 for counting the number of target objects in an area provided by the invention, the calibration module 30 calibrates the target objects using a neural network algorithm and a sufficient number of calibration boxes, the operation module 40 then calculates the number of target objects, the first module 50 and the second module 60 track the data twice, and the result module 70 combines the results of the two tracking checks to update the number of target objects and obtain the final detection result, so that the number of target objects is obtained accurately and the abrupt changes and degradation of results caused by jitter and overlapping of the target objects are greatly reduced.
In one embodiment of the invention, the acquisition device is mounted on a slide rail, and each sampling point is provided with a parking slot that holds the shooting device in place; when the shooting device is stable, it shoots straight down, thereby obtaining a complete top view of the target objects. The acquisition device is a video recording device used to capture video of the target objects in the area; specifically, the target objects are pigs, cattle, sheep and the like, and the acquisition area is a farm. The device can also be used for counting the number of people.
As shown in fig. 8, in one embodiment of the present invention, the first module 50 includes: an image unit 51, a feature unit 52, a first matching unit 53 and a first decision unit 54.
Specifically, the image unit 51 is configured to generate a first list from the images of the plurality of calibrated target objects in each key frame image; the feature unit 52 is configured to perform scale-invariant feature extraction on the images of the plurality of target objects in each first list to generate a second list; the first matching unit 53 is configured to match the features in the second list of the Nth key frame image to be tracked and checked with the features in the second lists of the K key frame images before and after it; the first determining unit 54 is configured to judge, when the features in the second lists of the K preceding and following key frame images are successively matched against the features in the second list of the Nth key frame image and some of those features find no match among the features in the second list of the Nth key frame image, that the unmatched features correspond to missed target objects and to update the number of target objects; and, when the features in the second lists of the K preceding and following key frame images cannot be matched with the features in the second list of the Nth key frame image, to match the features in the second list of the Nth key frame image with the features in the second list of the previous frame and, when unmatched features appear, to judge that the unmatched features correspond to missed target objects and update the number of target objects.
In this embodiment, the first module 50 performs feature matching based on the scale-invariance algorithm to track and check the obtained target-object count, solving the problem that missed selections in the preliminary detection result degrade the tracking result under data fluctuation.
As shown in fig. 9, in one embodiment of the present invention, the first module 50 further comprises: a second matching unit 55 and a second determination unit 56.
Specifically, the second matching unit 55 is configured to match features in a second list of a plurality of key frame images within a set time; the second determination unit 56 is configured to determine that the target object corresponding to the successfully matched feature exists when the feature matching is successful, and update the number of the target objects; and when the features are not matched successfully, judging that the target object corresponding to the unmatched features does not exist, and updating the number of the target objects.
In this embodiment, the first module 50 performs feature matching based on the scale-invariance algorithm to track and check the obtained target-object count, solving the problem that overlapping of multiple targets in the preliminary detection result leads to a degraded recount. On the one hand, this greatly reduces the time spent on manual work under harsh conditions and improves working efficiency; on the other hand, it makes the auditing and statistics of the target objects more accurate and improves detection accuracy.
As shown in fig. 10, in one embodiment of the present invention, the feature unit 52 includes a construction subunit 521, a comparison subunit 522, a screening subunit 523, an allocation subunit 524, and a generation subunit 525.
Specifically, the constructing subunit 521 is configured to generate a gaussian difference pyramid from the image of each target object in the first list and construct a scale space;
The comparison subunit 522 is configured to compare each pixel point with all of its adjacent points in the scale space to determine extreme points; specifically, to find the extreme points of the Gaussian difference pyramid, each pixel point is compared with all of its adjacent points to judge whether it is larger or smaller than the adjacent points in its image domain and scale domain: in the two-dimensional image space, the central point is compared with the 8 points in its 3 × 3 neighborhood, and within the same octave of the scale space it is compared with the 2 × 9 points of the two adjacent layers above and below, which ensures that the detected key points are local extreme points in both the scale space and the two-dimensional image space;
The screening subunit 523 is configured to screen the extreme points to accurately locate stable key points; specifically, the Difference of Gaussian (DOG) values are sensitive to noise and edges, so the local extreme points detected in the scale space are further screened to remove unstable and falsely detected extreme points; moreover, since downsampled images are used when constructing the Gaussian pyramid, the extreme points extracted from the downsampled images must be mapped back to their exact positions in the original image;
The allocation subunit 524 is configured to assign direction information to the stable key points; specifically, stable extreme points are extracted in different scale spaces, which guarantees the scale invariance of the key points; assigning direction information to the key points makes them invariant to image angle and rotation, and the direction is obtained from the gradient at each extreme point. The Gaussian difference pyramid is constructed by the following formula:

G(x, y) = (1 / (2πσ₁²)) · e^(−((x − x₀)² + (y − y₀)²) / (2σ₁²));

The construction formula of the scale space is:

L(x, y) = G(x, y) * I(x, y);

wherein G(x, y) is the Gaussian function, L(x, y) is the scale space, I(x, y) is the pixel value of the original image at point (x, y), x and y are the coordinates in the original image, x₀ and y₀ are the coordinates of the target image, e is the irrational base of the natural logarithm, and σ₁ is a constant, σ₁ = 16;
For any key point, the gradient magnitude is expressed as:

m(x, y) = √( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² );

wherein m(x, y) is the gradient magnitude and L(x, y) is the scale space;
the direction assigned to a keypoint is not directly the gradient direction of the keypoint, but is given in the form of a histogram of gradient directions;
The specific method is as follows: the gradient directions of all points in the neighborhood centered on the key point are calculated; these directions necessarily lie in the range 0 to 360 degrees and are normalized into 36 bins, each bin covering a range of 10 degrees. The number of points falling in each direction bin is then accumulated, generating the gradient direction histogram shown in fig. 13.
If the gradient histogram contains another peak whose energy reaches 80% of that of the main peak, the corresponding direction is regarded as an auxiliary direction of the key point. The auxiliary direction enhances the robustness of matching; Lowe points out that about 15% of key points have an auxiliary direction, and it is precisely these 15% of key points that play a key role in stable matching;
The generation subunit 525 is configured to describe the key points with a multi-dimensional vector descriptor to generate a second list; specifically, the pixel region around each key point is divided into blocks, the gradient histogram within each block is calculated, and a unique vector is generated; this vector is an abstract representation of the image information of that region.
As shown in fig. 14, for 2 × 2 blocks, the gradients of all the pixels in each block are Gaussian weighted, and each block finally takes 8 directions, so that a 2 × 2 × 8-dimensional vector, as shown in fig. 15, can be generated and used as the mathematical description of the central key point.
As shown in fig. 16, a 4 × 4 × 8 = 128-dimensional vector descriptor is finally used to describe each key point.
the first module 50 performs feature matching based on the spatial scale invariance algorithm, and can detect the jitter and overlap problem of the target object more quickly, so as to obtain the number of accurate target objects quickly, thereby improving the detection efficiency.
As shown in fig. 11, in one embodiment of the present invention, the second module 60 includes: a first recording unit 61, a first extraction unit 62, a second extraction unit 63, a first difference unit 64, a first calculation unit 65, a first comparison unit 66, and a third determination unit 67.
Specifically, the first recording unit 61 is configured to record the key frame image in which a target object to be tracked and checked disappears and the key frame image in which the target object reappears; the first extraction unit 62 is configured to extract the position information of the key points of the calibration boxes in a plurality of key frame images before the key frame image in which the target object disappears; the second extraction unit 63 is configured to extract the position information of the key points of the calibration boxes in a plurality of key frame images after the key frame image in which the target object reappears; the first difference unit 64 is configured to calculate the differences of the corresponding points from the position information of the key points extracted in the two passes; the first calculation unit 65 is configured to calculate a mean square error value from the plurality of differences; the first comparison unit 66 is configured to compare the mean square error value with a preset threshold value; the third determination unit 67 is configured to judge, when the mean square error value is smaller than the threshold value, that the disappeared target object and the reappeared target object are the same target object and to update the number of target objects, and, when the mean square error value is greater than the threshold value, that the disappeared target object and the reappeared target object are two different target objects, likewise updating the number of target objects.
In this embodiment, the second module 60 performs a tracking check on the obtained quantity data of the target objects based on the variance principle of conventional data structures. This resolves the problem that missed selections in the preliminary detection result cause the tracking result to fluctuate and degrade: on one hand, the manual working time required under harsh conditions is greatly reduced and work efficiency is improved; on the other hand, the auditing and statistics of the target objects become more accurate, improving the accuracy of detection.
As shown in fig. 12, in one embodiment of the present invention, the second module 60 further includes: a second recording unit 68, a reading unit 69, a second difference unit 610, a second calculating unit 611, a second comparing unit 612, and a fourth determining unit 613.
Specifically, when the detection evaluation function values of the occluded target object and the occluding target object are greater than a preset threshold and the occluding target object does not disappear in the key frame image in which the occluded target object disappears, the second recording unit 68 records the position information of a plurality of key points of the occluding target object in that key frame image; the reading unit 69 reads the position information of a plurality of key points of the occluded target object in the key frame image in which it reappears; the second difference unit 610 is configured to calculate the difference values of corresponding points from the position information of the occluding target object and the position information of the occluded target object; the second calculating unit 611 is configured to calculate a mean square deviation value from the plurality of difference values; the second comparing unit 612 is configured to compare the mean square deviation value with a preset threshold; the fourth determining unit 613 is configured to judge, when the mean square deviation value is smaller than the threshold, that the occluded target object and the occluding target object are the same target object and update the number of the target objects, and, when the mean square deviation value is larger than the threshold, that they are two different target objects, and update the number of the target objects accordingly.
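Under the same assumptions, the helper sketched above could be reused for this occlusion case; the variable names and the threshold value here are purely illustrative.

```python
# `occluder_pts`: occluding object's key points in the frame where the
# occluded object disappears; `reappeared_pts`: occluded object's key points
# in the frame where it reappears (both hypothetical (n, 2) arrays).
same = is_same_target(occluder_pts, reappeared_pts, threshold=4.0)
# same == True  -> one target object, counted once
# same == False -> occluded and occluding objects are two distinct targets
```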
In this embodiment, the second module 60 likewise tracks and checks the obtained quantity data of the target objects based on the variance principle of conventional data structures, resolving the problem that multiple coinciding targets in the preliminary detection result degrade the recounting result: on one hand, the manual working time under harsh conditions is greatly reduced and work efficiency is improved; on the other hand, the auditing and statistics of the target objects become more accurate, improving the accuracy of detection.
In one embodiment of the present invention, the mean square deviation value is calculated by the following formula:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$

wherein σ² is the mean square deviation value, n is the number of key points, xᵢ is the i-th difference value, and μ is the mean of the difference values.
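As a quick numeric illustration (values invented for the example): for difference values x = (2, 3, 2, 3), the mean is μ = 2.5, so every deviation is ±0.5 and σ² = (4 × 0.25)/4 = 0.25; such a small deviation would indicate the same target object under any reasonable threshold.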
In one embodiment of the invention, the key points of the calibration frame comprise: the center point of the calibration frame, the midpoint of each edge of the calibration frame, and each corner point of the calibration frame.
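These nine key points follow directly from the calibration frame's geometry; a small sketch for a box given as (x1, y1, x2, y2) corner coordinates:

```python
def calibration_keypoints(x1, y1, x2, y2):
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return [
        (cx, cy),                                 # center of the frame
        (cx, y1), (cx, y2), (x1, cy), (x2, cy),   # midpoint of each edge
        (x1, y1), (x2, y1), (x1, y2), (x2, y2),   # each corner point
    ]
```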
Based on the variance principle of conventional data structures, the second module 60 can resolve the jitter and overlap problems of the target objects more accurately, so that the number of target objects is obtained more accurately and the detection accuracy is improved.
In summary, for this problem the statistical device takes into account the qualitative change brought about by the quantitative change of the coincidence duration, and handles long-duration and short-duration coincidence separately. Data jitter caused by short-duration overlap is markedly reduced, while a target that disappears for a long time is simply forgotten by the memory sequence and recounted. The detection accuracy reaches more than 97%, exceeding the accuracy of manual counting by 2%. A manual inventory cycle generally takes three days to one week, whereas the statistical device updates its data every half hour, greatly improving the timeliness of the data.
A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any one of the above methods for counting the number of target objects in an area are implemented.
A fourth aspect of the invention provides a human-computer interaction device, which comprises a memory, a processor, and a program stored on the memory and runnable on the processor; when the processor executes the program, the steps of any one of the above methods for counting the number of target objects in an area are implemented.
In the present invention, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or order, and the term "plurality" refers to two or more unless explicitly defined otherwise. The terms "mounted," "connected," "fixed," and the like are to be construed broadly, and for example, "connected" may be a fixed connection, a removable connection, or an integral connection; "coupled" may be direct or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description herein, the description of the terms "one embodiment," "some embodiments," "specific embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (20)

1. A method for counting the number of objects in a region, comprising:
acquiring an image within a region;
extracting a plurality of frame images in the image as key frame images;
screening a plurality of key frame images, and calibrating a target object in the key frame images by adopting a calibration frame;
calculating to obtain a preliminary detection result of the number of the target objects according to the calibration of the target objects in the key frame image;
performing a first tracking check on the preliminary detection result;
performing a second tracking check on the preliminary detection result;
updating the number of the target objects according to the result of the first tracking check and the result of the second tracking check to obtain a final detection result of the number of the target objects;
wherein performing a first tracking check on the preliminary detection result comprises:
generating a first list according to the images of the plurality of calibrated target objects in each key frame image;
performing feature extraction of scale invariance features on the images of the plurality of target objects in each first list to generate a second list;
matching the features in the second list of the Nth key frame image to be tracked and checked with the features in the second lists of the K key frame images before and after the Nth key frame image to be tracked and checked;
when the features in the second lists of the K key frame images before and after are continuously and successfully matched, but unmatched features exist between the features in the second lists of those K key frame images and the features in the second list of the Nth key frame image, judging that the unmatched features are missed-selection target objects, and updating the number of the target objects;
and when the features in the second lists of the K key frame images before and after are not matched with the features in the second list of the Nth key frame image, matching the features in the second list of the Nth key frame image with the features in the second list of the previous frame, and when unmatched features appear, judging that the unmatched features are missed-selection target objects, and updating the number of the target objects.
2. The method of claim 1, wherein, after the performing of feature extraction of scale-invariant features on the images of the plurality of target objects in each first list to generate the second list, the method further comprises:
matching features in a second list of a plurality of key frame images within a set time;
when the feature matching is successful, judging that a target object corresponding to the successfully matched feature exists, and updating the number of the target objects;
and when the features are not matched successfully, judging that the target object corresponding to the unmatched features does not exist, and updating the number of the target objects.
3. The method of claim 2, wherein the step of extracting the scale invariance features from the images of the plurality of objects in each first list to generate the second list comprises:
generating a Gaussian difference pyramid from the image of each target object in the first list, and constructing a scale space;
in the scale space, comparing each pixel point with all adjacent points thereof to determine extreme points;
screening the extreme points to accurately locate stable key points;
assigning direction information to the stable key points;
and describing the key points by using a descriptor of the multi-dimensional vector to generate a second list.
4. The method of claim 3, wherein the Gaussian difference pyramid is constructed by the following formula:
$$G(x, y) = \frac{1}{2\pi\sigma_1^2}\,e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma_1^2}}$$
the construction formula of the scale space is as follows:
L(x,y)=G(x,y)*I(x,y);
wherein G(x, y) is the Gaussian function, L(x, y) is the scale space, I(x, y) is the pixel value of the original image at point (x, y), x and y are coordinates in the original image, x₀ and y₀ are the coordinates of the target point in the image, e is the natural constant (an irrational number), and σ₁ is a constant.
5. The method of claim 3, wherein the direction information of the stable key points is distributed according to the following formula:
$$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2}$$
where m (x, y) is the gradient magnitude and L (x, y) is the scale space.
6. The method of claim 1, wherein performing the second tracking check on the preliminary detection result comprises:
recording the key frame image in which a target object to be tracked and checked disappears and the key frame image in which the target object reappears;
extracting the position information of the key points of the calibration frames of a plurality of key frame images before the key frame image in which the target object disappears;
extracting the position information of the key points of the calibration frames of a plurality of key frame images after the key frame image in which the target object reappears;
calculating the difference value of the corresponding points according to the position information of the plurality of key points extracted twice;
calculating a mean square error value according to the plurality of difference values;
comparing the mean square deviation value with a preset threshold value;
when the mean square deviation value is smaller than the threshold value, the disappeared target object and the reappeared target object are the same target object, and the number of the target objects is updated;
and when the mean square deviation value is larger than the threshold value, the disappeared target objects and the reappeared target objects are two target objects, and the number of the target objects is updated.
7. The method of claim 6, wherein performing the second tracking check on the preliminary detection result further comprises:
when the detection evaluation function values of the occluded target object and the occluding target object are greater than a preset threshold value and the occluding target object does not disappear in the key frame image in which the occluded target object disappears, recording the position information of a plurality of key points of the occluding target object in the key frame image in which the occluded target object disappears;
reading the position information of a plurality of key points of the occluded target object in the key frame image in which the occluded target object reappears;
calculating the difference values of the corresponding points according to the position information of the occluding target object and the position information of the occluded target object;
when the mean square deviation value is smaller than the threshold value, the occluded target object and the occluding target object are the same target object, and the number of the target objects is updated;
and when the mean square deviation value is larger than the threshold value, the occluded target object and the occluding target object are two different target objects, and the number of the target objects is updated.
8. The method of claim 6, wherein the mean square error value is calculated by the following formula:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$

where σ² is the mean square error value, n is the number of key points, xᵢ is the i-th difference value, and μ is the mean of the difference values.
9. The method of claim 6, wherein the key points of the calibration frame comprise: the center point of the calibration frame, the midpoint of each edge of the calibration frame, and each corner point of the calibration frame.
10. A device for counting the number of objects in a region, comprising:
the acquisition module is used for acquiring images in the region;
the extraction module is used for extracting a plurality of frame images in the images as key frame images;
the calibration module is used for screening a plurality of key frame images and calibrating a target object in the key frame images by adopting a calibration frame;
the operation module is used for obtaining a preliminary detection result of the number of the target objects through operation according to the calibration of the target objects in the key frame images;
a first module for performing a first tracking check on the preliminary detection result;
a second module for performing a second tracking check on the preliminary detection result; and
the result module updates the number of the target objects according to the result of the first tracking check and the result of the second tracking check to obtain the final detection result of the number of the target objects;
wherein the first module comprises:
the image unit is used for generating a first list according to the images of the plurality of calibrated target objects in each key frame image;
the characteristic unit is used for carrying out characteristic extraction of scale invariance characteristics on the images of the plurality of target objects in each first list to generate a second list;
the first matching unit is used for matching the features in the second list of the Nth key frame image to be tracked and checked with the features in the second lists of the K key frame images before and after the Nth key frame image; and
the first judging unit is used for, when the features in the second lists of the K key frame images before and after are continuously and successfully matched but unmatched features exist between them and the features in the second list of the Nth key frame image, judging that the unmatched features are missed-selection target objects and updating the number of the target objects; and, when the features in the second lists of the K key frame images before and after are not matched with the features in the second list of the Nth key frame image, matching the features in the second list of the Nth key frame image with the features in the second list of the previous frame, and, when unmatched features appear, judging that the unmatched features are missed-selection target objects and updating the number of the target objects.
11. The device for counting the number of objects in a region according to claim 10, wherein the first module further comprises:
the second matching unit is used for matching the features in the second list of the plurality of key frame images within the set time; and
the second judging unit is used for judging that the target object corresponding to the successfully matched feature exists and updating the number of the target objects when the feature matching is successful; and when the features are not matched successfully, judging that the target object corresponding to the unmatched features does not exist, and updating the number of the target objects.
12. The device for counting the number of objects in a region according to claim 11, wherein the feature unit comprises:
the construction subunit is used for generating a Gaussian difference pyramid from the image of each target object in the first list and constructing a scale space;
the comparison subunit is used for comparing each pixel point with all adjacent points thereof in a scale space to determine an extreme point;
the screening subunit is used for screening the extreme points to accurately locate stable key points;
the distribution subunit is used for assigning direction information to the stable key points; and
and the generating subunit is used for describing the key points by using the descriptors of the multidimensional vectors to generate a second list.
13. The apparatus for counting the number of objects in a region according to claim 12, wherein the gaussian difference pyramid is constructed by the following formula:
$$G(x, y) = \frac{1}{2\pi\sigma_1^2}\,e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma_1^2}}$$
the construction formula of the scale space is as follows:
L(x,y)=G(x,y)*I(x,y);
wherein G(x, y) is the Gaussian function, L(x, y) is the scale space, I(x, y) is the pixel value of the original image at point (x, y), x and y are coordinates in the original image, x₀ and y₀ are the coordinates of the target point in the image, e is the natural constant (an irrational number), and σ₁ is a constant.
14. The apparatus for counting the number of objects in a region according to claim 12, wherein the formula for the distribution of the direction information of the stable key points is:
$$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2}$$
where m (x, y) is the gradient magnitude and L (x, y) is the scale space.
15. The device for counting the number of objects in a region according to claim 10, wherein the second module comprises:
the first recording unit is used for recording the key frame image in which a target object to be tracked and checked disappears and the key frame image in which the target object reappears;
a first extraction unit configured to extract position information of key points of a calibration frame of a plurality of key frame images before a key frame image in which a target object disappears;
a second extraction unit configured to extract position information of key points of a calibration frame of the plurality of key frame images after a key frame image in which the target object reappears;
the first difference unit is used for calculating the difference of the corresponding points according to the position information of the plurality of key points extracted twice;
a first calculation unit for calculating a mean square error value from the plurality of difference values;
the first comparison unit is used for comparing the mean square deviation value with a preset threshold value; and
a third determination unit, configured to, when the mean square deviation value is smaller than the threshold value, judge that the disappeared target object and the reappeared target object are the same target object and update the number of the target objects; and, when the mean square deviation value is larger than the threshold value, judge that the disappeared target object and the reappeared target object are two different target objects and update the number of the target objects.
16. The device for counting the number of objects in a region according to claim 15, wherein the second module further comprises:
the second recording unit is used for recording, when the detection evaluation function values of the occluded target object and the occluding target object are larger than a preset threshold value and the occluding target object does not disappear in the key frame image in which the occluded target object disappears, the position information of a plurality of key points of the occluding target object in that key frame image;
the reading unit is used for reading the position information of a plurality of key points of the occluded target object in the key frame image in which the occluded target object reappears;
the second difference unit is used for calculating the difference values of the corresponding points according to the position information of the occluding target object and the position information of the occluded target object;
a second calculation unit for calculating a mean square error value from the plurality of difference values;
the second comparison unit is used for comparing the mean square deviation value with a preset threshold value; and
a fourth judging unit, configured to, when the mean square deviation value is smaller than the threshold value, judge that the occluded target object and the occluding target object are the same target object and update the number of the target objects; and, when the mean square deviation value is larger than the threshold value, judge that the occluded target object and the occluding target object are two different target objects and update the number of the target objects.
17. The apparatus for counting the number of objects in a region according to claim 15, wherein the mean square error value is calculated by:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$

where σ² is the mean square error value, n is the number of key points, xᵢ is the i-th difference value, and μ is the mean of the difference values.
18. The device for counting the number of objects in a region according to claim 15, wherein the key points of the calibration frame comprise: the center point of the calibration frame, the midpoint of each edge of the calibration frame, and each corner point of the calibration frame.
19. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for counting the number of objects in the area according to any one of claims 1 to 9.
20. A human-computer interaction device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for counting the number of objects in the area according to any one of claims 1 to 9 when executing the program.
CN201810712189.5A 2018-07-03 2018-07-03 Method and device for counting number of target objects in area Active CN108932496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810712189.5A CN108932496B (en) 2018-07-03 2018-07-03 Method and device for counting number of target objects in area

Publications (2)

Publication Number Publication Date
CN108932496A CN108932496A (en) 2018-12-04
CN108932496B true CN108932496B (en) 2022-03-25

Family

ID=64447713


Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN101231755B (en) * 2007-01-25 2013-03-06 上海遥薇(集团)有限公司 Moving target tracking and quantity statistics method
CN101576952B (en) * 2009-03-06 2013-10-16 北京中星微电子有限公司 Method and device for detecting static targets

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631418A (en) * 2015-12-24 2016-06-01 浙江宇视科技有限公司 People counting method and device
CN105744223A (en) * 2016-02-04 2016-07-06 北京旷视科技有限公司 Video data processing method and apparatus
CN106204640A (en) * 2016-06-29 2016-12-07 长沙慧联智能科技有限公司 A kind of moving object detection system and method
CN107657626A (en) * 2016-07-25 2018-02-02 浙江宇视科技有限公司 The detection method and device of a kind of moving target
CN107423708A (en) * 2017-07-25 2017-12-01 成都通甲优博科技有限责任公司 The method and its device of pedestrian's flow of the people in a kind of determination video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Mean shift tracking combining SIFT";Ai-hua Chen等;《2008 9th International Conference on Signal Processing》;20081029;page1532-1535 *
"基于计算机视觉的复杂场景下目标跟踪研究";贺文骅;《中国博士论文全文数据库(电子期刊)信息科技辑》;20160315;全文 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant