CN111597980B - Target object clustering method and device - Google Patents

Target object clustering method and device

Info

Publication number
CN111597980B
Authority
CN
China
Prior art keywords
sub-image
images
clustered
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010408248.7A
Other languages
Chinese (zh)
Other versions
CN111597980A (en)
Inventor
张修宝
沈海峰
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority claimed from application CN202010408248.7A
Publication of CN111597980A
Application granted
Publication of CN111597980B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The application relates to the technical field of image processing, and in particular to a target object clustering method and device. The method comprises the following steps: acquiring a monitoring video, intercepting sub-images comprising target object information from images of the monitoring video, and recording the frame number of each sub-image; extracting the feature vector of each sub-image, and dividing sub-images comprising the same target object information into the same class set based on the extracted feature vectors; and finally, determining a class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets, and adjusting the sub-images included in the class set to be adjusted. By this method, the accuracy of target object clustering can be improved.

Description

Target object clustering method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a target object clustering method and apparatus.
The present application is a divisional application of patent application number 201811544296.8.
Background
In the application fields of video monitoring, security protection, unmanned driving, etc., detection of a target object in a monitoring video is generally involved, where the target object is, for example, a pedestrian or a vehicle appearing in the monitoring video. Specifically, in some specific application scenarios, for example, when determining the activity condition of the target object in the area monitored by the monitoring video, it is required to screen all the images of the target object from the monitoring video, which involves clustering the images of the target object in the monitoring video.
In the prior art, common clustering algorithms mainly include K-means, KD-tree, and the like. These algorithms usually require a class threshold to be determined in advance, that is, the number of classes into which the target objects should be clustered. However, when clustering the target objects in a monitoring video, it is often difficult to determine in advance how many classes are needed, so the selection of the class threshold may carry a large error, resulting in low clustering accuracy.
Disclosure of Invention
In view of this, the embodiments of the present application provide a target object clustering method and apparatus to improve the accuracy of clustering in the target object detection process.
The present application mainly comprises the following aspects:
in a first aspect, an embodiment of the present application provides a target object clustering method, including:
acquiring a monitoring video, intercepting a sub-image comprising target object information from an image of the monitoring video, and recording a frame number of each sub-image;
extracting a feature vector of each sub-image, and dividing sub-images comprising the same target object information into the same class set based on the extracted feature vector of each sub-image;
and determining a class set to be adjusted based on the frame numbers of the sub-images included in the classified class sets, and adjusting the sub-images included in the class set to be adjusted.
In a possible implementation manner, the dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image includes:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth class set, where i and k are positive integers;
selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth category set.
In a possible implementation manner, the selecting sub-images meeting the clustering condition of the kth category set from the sub-images to be clustered except the ith sub-image and dividing the sub-images into the kth category set includes:
sequentially selecting sub-images from the sub-images to be clustered except the ith sub-image, and executing a first clustering process until all sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when it is determined that the first feature similarity corresponding to the selected jth sub-image is larger than a first set threshold, dividing the jth sub-image into the kth class set;
taking the sub-images selected after the jth sub-image as remaining sub-images, and calculating a second feature similarity between the feature vector of each remaining sub-image and the feature vector of any sub-image in the kth class set;
and dividing the remaining sub-images whose second feature similarity is larger than the first set threshold into the kth class set, based on the second feature similarity corresponding to each remaining sub-image.
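The first clustering process above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the patent does not fix a particular similarity measure, so cosine similarity is used here as one common choice, and the function names are hypothetical.

```python
import numpy as np

def similarity(a, b):
    # Cosine similarity between two feature vectors; the patent leaves
    # the exact similarity measure open, this is one common choice.
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def first_clustering(features, i, threshold):
    """Seed the class set with sub-image i, then sweep the remaining
    sub-images in order. The first match j joins on similarity to i;
    every later sub-image joins if similar to ANY current member."""
    cluster = [i]
    found_first = False
    for j in range(len(features)):
        if j == i:
            continue
        if not found_first:
            if similarity(features[j], features[i]) > threshold:
                cluster.append(j)
                found_first = True
        elif any(similarity(features[j], features[m]) > threshold
                 for m in cluster):
            cluster.append(j)
    return cluster
```

Note the asymmetry the claim describes: before the jth sub-image is found, candidates are compared only against the ith sub-image; afterwards, a match with any sub-image already in the kth class set suffices.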
In a possible implementation manner, the dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image includes:
sequentially selecting sub-images to be clustered from the intercepted sub-images, taking each selected sub-image as a clustering center, and executing a second clustering process until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, based on the third feature similarity corresponding to each sub-image to be clustered;
and dividing the selected sub-image and the sub-images to be clustered whose corresponding third feature similarity is larger than the first set threshold into the same class set.
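The second clustering process can be sketched as follows. As before this is a hypothetical illustration: cosine similarity stands in for whatever feature similarity is used in practice, and `second_clustering` is not a name from the patent.

```python
import numpy as np

def similarity(a, b):
    # Cosine similarity; the patent does not prescribe a measure.
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def second_clustering(features, threshold):
    """Each not-yet-clustered sub-image in turn becomes a clustering
    center; every other unassigned sub-image whose similarity to the
    center exceeds the threshold joins the center's class set."""
    clusters = []
    unassigned = list(range(len(features)))
    while unassigned:
        center = unassigned.pop(0)
        members = [center]
        for j in unassigned[:]:           # iterate over a snapshot
            if similarity(features[center], features[j]) > threshold:
                members.append(j)
                unassigned.remove(j)
        clusters.append(members)
    return clusters
```

Unlike K-means, no class count is fixed in advance: the number of class sets simply emerges from how many centers remain unassigned, which is the point the Background makes about threshold selection.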
In a possible implementation manner, the determining the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets includes:
determining the maximum frame number and the minimum frame number of the sub-images in each divided class set;
detecting whether intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to each divided class set are continuous or not;
and determining the class set with discontinuous intermediate frame numbers as the class set to be adjusted.
In a possible implementation manner, adjusting the sub-image included in the category set to be adjusted includes:
aiming at an nth class set to be adjusted, determining a missing intermediate frame number in the nth class set to be adjusted;
determining first candidate sub-images matched with the missing intermediate frame numbers in other category sets except the nth category set to be adjusted; determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth class set to be adjusted;
Calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image;
screening out first candidate sub-images with the corresponding fourth feature similarity larger than a second set threshold value based on the fourth feature similarity corresponding to each first reference sub-image;
and dividing the screened first candidate sub-image into the nth class set to be adjusted.
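The adjustment steps above can be sketched as one hypothetical helper. Assumptions: cosine similarity as the feature similarity, sets represented as lists of sub-image indices, and `frame_of` mapping each sub-image to its recorded frame number; none of these names come from the patent.

```python
import numpy as np

def similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fill_missing(target_set, other_sets, feats, frame_of, threshold):
    """For each missing intermediate frame number in target_set, collect
    candidate sub-images carrying that frame number from the other sets,
    compare each candidate with a reference sub-image at an adjacent
    frame number, and move matching candidates into target_set."""
    frames = sorted(frame_of[i] for i in target_set)
    missing = set(range(frames[0], frames[-1] + 1)) - set(frames)
    for m in sorted(missing):
        # reference: a member of target_set at a frame adjacent to m
        ref = next(i for i in target_set if abs(frame_of[i] - m) == 1)
        for other in other_sets:
            for c in [i for i in other if frame_of[i] == m]:
                if similarity(feats[ref], feats[c]) > threshold:
                    other.remove(c)
                    target_set.append(c)
    return target_set
```

The sketch assumes a reference sub-image at an adjacent frame always exists; a robust implementation would also handle gaps wider than one frame.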
In a possible implementation manner, determining a first candidate sub-image matched with the missing intermediate frame number in other category sets except the nth category set to be adjusted includes:
screening a first candidate class set of which the frame number of the sub-image is the missing intermediate frame number from the other class sets;
and determining a first candidate sub-image of which the frame number included in the first candidate class set is the missing intermediate frame number.
In a possible implementation manner, determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets includes:
determining a maximum frame number of a sub-image in the nth category set and a second reference sub-image matched with the maximum frame number aiming at the nth category set;
Determining a second candidate sub-image matched with the next frame number of the maximum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vector of the second reference sub-image and the feature vector of each second candidate sub-image;
and when the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists, determining a second candidate category set in which the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as a category set to be adjusted.
In a possible implementation manner, the adjusting the sub-image included in the category set to be adjusted includes:
and dividing the second candidate sub-image whose fourth feature similarity is larger than the second set threshold, included in the second candidate category set, into the nth category set.
In a possible implementation manner, determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets includes:
determining a minimum frame number of a sub-image in the nth category set and a third reference sub-image matched with the minimum frame number aiming at the nth category set;
Determining a third candidate sub-image matched with the last frame number of the minimum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
and when a third candidate sub-image with the fourth feature similarity larger than the second set threshold exists, determining the third candidate category set in which that sub-image exists and the nth category set as category sets to be adjusted.
In a possible implementation manner, the adjusting the sub-image included in the category set to be adjusted includes:
and dividing the third candidate sub-image whose fourth feature similarity is larger than the second set threshold, included in the third candidate category set, into the nth category set.
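The two variants above (matching the set's maximum frame number against candidates at the next frame, and its minimum frame number against candidates at the previous frame) can be combined in one hypothetical check. Cosine similarity and all names here are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_merge(set_n, candidate_set, feats, frame_of, threshold):
    """Return True when set_n and candidate_set likely describe one
    track: the sub-image at set_n's maximum frame number matches a
    candidate at the next frame number, or the one at its minimum
    frame number matches a candidate at the previous frame number."""
    frames = [frame_of[i] for i in set_n]
    ref_max = set_n[frames.index(max(frames))]
    ref_min = set_n[frames.index(min(frames))]
    for c in candidate_set:
        if (frame_of[c] == max(frames) + 1
                and similarity(feats[ref_max], feats[c]) > threshold):
            return True
        if (frame_of[c] == min(frames) - 1
                and similarity(feats[ref_min], feats[c]) > threshold):
            return True
    return False
```

When the check succeeds, the matching candidate sub-images are divided into the nth category set, as the implementations above describe.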
In a second aspect, an embodiment of the present application provides a target object clustering apparatus, including:
the acquisition module is used for acquiring a monitoring video, intercepting sub-images comprising target object information from the images of the monitoring video and recording the frame number of each sub-image;
the dividing module is used for extracting the feature vector of each sub-image and dividing the sub-images comprising the same target object information into the same class set based on the extracted feature vector of each sub-image;
the adjustment module is used for determining a category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, and adjusting the sub-images included in the category set to be adjusted.
In one possible design, the dividing module is specifically configured to, when dividing sub-images including the same target object information into the same class set based on the feature vector of each extracted sub-image:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth class set, where i and k are positive integers;
selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth category set.
In a possible design, the dividing module is specifically configured to, when selecting, from sub-images to be clustered except for the ith sub-image, a sub-image that meets a clustering condition of the kth category set and dividing the sub-image into the kth category set:
Sequentially selecting sub-images from the sub-images to be clustered except the ith sub-image, and executing a first clustering process until all sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when it is determined that the first feature similarity corresponding to the selected jth sub-image is larger than a first set threshold, dividing the jth sub-image into the kth class set;
taking the sub-images selected after the jth sub-image as remaining sub-images, and calculating a second feature similarity between the feature vector of each remaining sub-image and the feature vector of any sub-image in the kth class set;
and dividing the remaining sub-images whose second feature similarity is larger than the first set threshold into the kth class set, based on the second feature similarity corresponding to each remaining sub-image.
In one possible design, the dividing module is specifically configured to, when dividing sub-images including the same target object information into the same class set based on the feature vector of each extracted sub-image:
sequentially selecting sub-images to be clustered from the intercepted sub-images, taking each selected sub-image as a clustering center, and executing a second clustering process until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, based on the third feature similarity corresponding to each sub-image to be clustered;
and dividing the selected sub-images and the sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, into the same class set.
In one possible design, the adjustment module is specifically configured to, when determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets:
determining the maximum frame number and the minimum frame number of the sub-images in each divided class set;
detecting whether intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to each divided class set are continuous or not;
And determining the class set with discontinuous intermediate frame numbers as the class set to be adjusted.
In a possible design, the adjusting module is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
aiming at an nth class set to be adjusted, determining a missing intermediate frame number in the nth class set to be adjusted;
determining first candidate sub-images matched with the missing intermediate frame numbers in other category sets except the nth category set to be adjusted; determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth class set to be adjusted;
calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image;
screening out first candidate sub-images with the corresponding fourth feature similarity larger than a second set threshold value based on the fourth feature similarity corresponding to each first reference sub-image;
and dividing the screened first candidate sub-image into the nth class set to be adjusted.
In one possible design, the adjustment module, when determining the first candidate sub-image matching the missing intermediate frame number in the other class set except the nth class set to be adjusted, is specifically configured to:
Screening a first candidate class set of which the frame number of the sub-image is the missing intermediate frame number from the other class sets;
and determining a first candidate sub-image of which the frame number included in the first candidate class set is the missing intermediate frame number.
In one possible design, the adjustment module is specifically configured to, when determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets:
determining a maximum frame number of a sub-image in the nth category set and a second reference sub-image matched with the maximum frame number aiming at the nth category set;
determining a second candidate sub-image matched with the next frame number of the maximum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vector of the second reference sub-image and the feature vector of each second candidate sub-image;
and when the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists, determining a second candidate category set in which the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as a category set to be adjusted.
In a possible design, the adjusting module is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
and dividing the second candidate sub-image whose fourth feature similarity is larger than the second set threshold, included in the second candidate category set, into the nth category set.
In one possible design, the adjustment module is specifically configured to, when determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets:
determining a minimum frame number of a sub-image in the nth category set and a third reference sub-image matched with the minimum frame number aiming at the nth category set;
determining a third candidate sub-image matched with the last frame number of the minimum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
and when a third candidate sub-image with the fourth feature similarity larger than the second set threshold exists, determining the third candidate category set in which that sub-image exists and the nth category set as category sets to be adjusted.
In a possible design, the adjusting module is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
and dividing the third candidate sub-image whose fourth feature similarity is larger than the second set threshold, included in the third candidate category set, into the nth category set.
In a third aspect, embodiments of the present application further provide an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the target object clustering method of the first aspect, or any one of the possible implementation manners of the first aspect.
In a fourth aspect, the embodiments of the present application further provide a computer readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the target object clustering method described in the first aspect, or any possible implementation manner of the first aspect.
According to the target object clustering method and device, sub-images comprising target object information are intercepted from the images of the monitoring video; when these sub-images are clustered, sub-images comprising the same target object information are divided into the same class set based on the extracted feature vector of each sub-image, and the sub-images in each class set can then be adjusted using their frame numbers. In this way, the sub-images are first coarsely clustered by their feature vectors, and the class sets are then adjusted according to the frame numbers of the sub-images they include, which reduces the errors produced when clustering only by feature vectors and improves the clustering accuracy.
The foregoing objects, features and advantages of embodiments of the present application will be more readily apparent from the following detailed description of the embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow diagram of a target object clustering method according to an embodiment of the present application;
FIG. 2 shows a schematic flow chart of a first clustering process provided in an embodiment of the present application;
FIG. 3 is a schematic flow diagram of a second clustering process according to an embodiment of the present application;
FIG. 4 illustrates a flow chart of a method for determining a set of categories to be adjusted provided by an embodiment of the present application;
FIG. 5 is a flowchart of a method for adjusting missing intermediate frame numbers in a class set to be adjusted according to an embodiment of the present disclosure;
FIG. 6 illustrates a flowchart of another method for determining a set of categories to be adjusted provided by embodiments of the present application;
FIG. 7 illustrates a flowchart of another method for determining a set of categories to be adjusted provided by embodiments of the present application;
fig. 8 shows a schematic architecture diagram of a target object clustering device 800 according to an embodiment of the present application;
fig. 9 shows a schematic structural diagram of an electronic device 900 according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. The following detailed description of embodiments of the present application is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
First, application scenarios applicable to the present application will be described. The method and the device can be applied to application scenes such as determining the activity condition of the target object in the area monitored by the monitoring video according to the monitoring video. By way of example, the target object is, for example, a pedestrian, a vehicle, or the like, and by recognizing an image of the pedestrian or the vehicle present in the monitoring video, the moving track of the pedestrian or the vehicle in the area monitored by the monitoring video, or the like, can be deduced.
It is worth noting that the clustering algorithms commonly used for clustering target objects in the prior art, such as K-means and KD-tree, often need the clustering class threshold to be set in advance. In practical applications, however, when the target objects in a monitoring video are clustered, it is often difficult to determine how many classes the target objects should be divided into, so the selection of the class threshold may carry a large error, resulting in low clustering accuracy.
In view of the above problems, the present application provides a target object clustering method and apparatus. After the sub-images comprising target object information are intercepted from the surveillance video, sub-images comprising the same target object information are divided into the same class set according to the feature vectors of the sub-images, and the sub-images in each class set can then be adjusted using the frame numbers of the sub-images the set includes. In this way, the clustering accuracy for target objects appearing in the monitoring video can be improved.
The following describes in detail the technical solution provided in the present application with reference to specific embodiments.
Referring to fig. 1, a flow chart of a target object clustering method provided in an embodiment of the present application includes the following steps:
step 101, acquiring a monitoring video, intercepting sub-images comprising target object information from images of the monitoring video, and recording frame numbers of each sub-image.
The target object information may be pixel information of an area where the target object is located on an image including the target object in the monitoring video.
In a specific implementation, not every frame of the surveillance video contains the target object information; for example, if the target object is a pedestrian, the pedestrian may appear in the acquired surveillance video only during a certain period. Therefore, the images containing the target object information can first be identified and screened from the surveillance video, and the sub-images containing the target object information can then be intercepted from the screened images.
In one possible implementation manner, each frame of image of the monitoring video may be identified, whether the image contains the target object information may be determined, or one frame of image may be selected for identification every preset frame number, and whether the image contains the target object information may be determined. When it is determined that the target object information is contained in a certain frame of image, the image containing the target object information may be regarded as a sub-image; alternatively, the target object may be labeled in an image containing the target object information, and then the labeled portion may be cut out as a sub-image, for example, the target object may be labeled with a rectangular frame, and then the labeled rectangular frame area may be cut out as a sub-image.
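The interception step can be sketched as follows. The detector is deliberately abstract: the patent does not name one, so `detect(img)` here is a hypothetical stand-in for any object detector returning (x, y, w, h) rectangles, such as the labeled rectangular frames described above.

```python
import numpy as np

def crop_sub_images(video_frames, detect):
    """For each frame, cut out every detected rectangular region as a
    sub-image and pair it with its frame number, mirroring step 101:
    intercept sub-images and record the frame number of each."""
    sub_images = []
    for frame_no, img in enumerate(video_frames):
        for (x, y, w, h) in detect(img):
            sub_images.append((frame_no, img[y:y + h, x:x + w]))
    return sub_images
```

Recording the frame number alongside each crop is what later makes the frame-continuity adjustment possible.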
Step 102, extracting the feature vector of each sub-image, and dividing the sub-images comprising the same target object information into the same class set based on the extracted feature vector of each sub-image.
In an embodiment of the present application, when dividing sub-images including the same target object information into the same class set based on the feature vector of each extracted sub-image, taking the ith sub-image to be clustered as an example, the ith sub-image may be divided into the kth class set, where i and k are positive integers; then selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth category set.
For example, if the sub-images to be clustered are the 1 st sub-image, the 2 nd sub-image, the 3 rd sub-image, the 4 th sub-image, and the 5 th sub-image; then the 1 st sub-image may be divided into the 1 st class set, and then sub-images conforming to the clustering condition of the 1 st class set are selected from the 2 nd sub-image, the 3 rd sub-image, the 4 th sub-image, and the 5 th sub-image and divided into the 1 st class set.
Specifically, when selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image and dividing the sub-images into the kth category set, sub-images can be sequentially selected from the sub-images to be clustered except the ith sub-image, and the first clustering process is executed until all the sub-images to be clustered are traversed. The first clustering process is shown in fig. 2, and includes the following steps:
Step 201, calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image.
The feature vector of the selected sub-image may reflect the feature of the target object included in the selected sub-image, the feature vector of the i-th sub-image may reflect the feature of the target object included in the i-th sub-image, and the similarity between the target object included in the selected sub-image and the target object included in the i-th sub-image may be obtained by calculating the first feature similarity between the feature vector of the selected sub-image and the feature vector of the i-th sub-image.
In an example, when calculating the first feature similarity, for example, a cosine similarity between the feature vector of the selected sub-image and the feature vector of the i-th sub-image may be calculated, where the method for calculating the cosine similarity may be as follows:
$$\cos\theta = \frac{\sum_{t=1}^{n} A_t B_t}{\sqrt{\sum_{t=1}^{n} A_t^{2}}\,\sqrt{\sum_{t=1}^{n} B_t^{2}}}$$
wherein A denotes the feature vector of the ith sub-image and A_t its t-th feature value, B denotes the feature vector of the selected sub-image and B_t its t-th feature value, n denotes the number of feature values contained in each feature vector, and θ denotes the included angle between the feature vector of the selected sub-image and the feature vector of the ith sub-image.
After calculating the value of cos θ according to the above formula, the cosine value may be used to characterize the first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image: the smaller θ is, the closer cos θ is to 1, indicating that the two feature vectors are more similar.
In still another example, the euclidean distance between the feature vector of the selected sub-image and the feature vector of the i-th sub-image may be calculated, and the calculated euclidean distance is used as the first feature similarity between the feature vector of the selected sub-image and the feature vector of the i-th sub-image, and the specific method for calculating the euclidean distance is not described herein.
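The two similarity measures mentioned above can be sketched in Python as a minimal illustration, using plain lists as feature vectors (the function names are this sketch's own; a real implementation would typically operate on detector embeddings with a numerical library):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = A.B / (|A| * |B|); closer to 1 means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Smaller distance means more similar feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note the two measures point in opposite directions: a cosine value near 1 indicates similarity, while a Euclidean distance near 0 does, so a threshold comparison must be oriented accordingly.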
Step 202, dividing the jth sub-image into the kth class set when it is determined that the first feature similarity corresponding to the selected jth sub-image is greater than a first set threshold.
In specific implementation, the jth sub-image is set as any sub-image in the sub-images to be clustered except the ith sub-image, and when the first feature similarity between the feature vector of the jth sub-image and the feature vector of the ith sub-image is calculated to be larger than a first set threshold, the jth sub-image is divided into a class set where the ith sub-image is located.
And 203, taking the sub-image selected after the jth sub-image as a residual sub-image, and calculating the second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth class set.
For example, if the mth sub-image is any sub-image of the sub-images to be clustered except for the ith sub-image and the jth sub-image, a second feature similarity between the feature vector of the mth sub-image and the feature vector of the ith sub-image may be calculated, or a second feature similarity between the feature vector of the mth sub-image and the feature vector of the jth sub-image may be calculated. In practical application, when the kth class set further includes other sub-images besides the ith and jth sub-images, the second feature similarity between the feature vector of the mth sub-image and the feature vector of any one of those other sub-images may be calculated instead. Then, according to the calculated second feature similarity, it may be determined whether to divide the mth sub-image into the kth class set.
In a possible implementation manner, the method for calculating the second feature similarity may be the same as the method for calculating the first feature similarity, which will not be described in detail herein. Alternatively, the first feature similarity may adopt the cosine similarity calculation method while the second feature similarity adopts the Euclidean distance calculation method or a hash-algorithm-based similarity calculation; the Euclidean distance calculation method and the hash-based similarity calculation are not expanded upon here.
And 204, dividing the remaining sub-images with the second feature similarity larger than the first set threshold value into a kth category set based on the second feature similarity corresponding to each remaining sub-image.
Specifically, a second feature similarity between the feature vector of each sub-image except the ith sub-image and the jth sub-image and the feature vector of the ith sub-image or the feature vector of the jth sub-image in the sub-images to be clustered can be calculated, and sub-images with the second feature similarity larger than the first set threshold value are divided into kth class sets.
For example, after the ith sub-image and the jth sub-image are divided into the kth class set, suppose the sub-images to be clustered further include an ath, a bth, a cth, and a dth sub-image. A second feature similarity between the feature vector of each of these sub-images and the feature vector of the ith or the jth sub-image may then be calculated respectively. When the second feature similarity between the feature vector of the ath sub-image and the feature vector of the ith sub-image is greater than the first set threshold, the ath sub-image is divided into the kth class set; and when the second feature similarity between the feature vector of the bth sub-image and the feature vector of the ith sub-image and that between the feature vector of the bth sub-image and the feature vector of the jth sub-image are both smaller than the first set threshold, the bth sub-image is not divided into the kth class set.
The first clustering process described above is exemplarily described below in connection with a specific implementation scenario.
Setting the first set threshold as H and taking the target object to be a pedestrian in the surveillance video: a first pedestrian is selected as one member of the category set 1. For all pedestrians to be clustered in the surveillance video, the first feature similarity between each pedestrian to be clustered and the selected first pedestrian is calculated, until the first feature similarity between a second pedestrian and the first pedestrian is found to be greater than the first set threshold H, at which point the second pedestrian is divided into the category set 1, so that the category set 1 includes the first pedestrian and the second pedestrian. The second feature similarity between each remaining pedestrian to be clustered and the first pedestrian or the second pedestrian is then calculated, and when the second feature similarity between an Nth pedestrian and the first pedestrian or the second pedestrian is greater than the first set threshold H, the Nth pedestrian is divided into the category set 1, and so on until all the remaining pedestrians to be clustered have been traversed, thereby completing the clustering of the category set 1.
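The first clustering process (steps 201 to 204) can be sketched as follows, under the assumptions that feature vectors are plain Python lists, indices stand in for sub-image identifiers, and cosine similarity is used for both the first and the second feature similarity; the function names are this sketch's own, not prescribed by the method:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def first_clustering(sub_images, i, threshold):
    """Grow the kth class set around the ith sub-image.

    Until the first match j is found, each selected sub-image is
    compared against the ith sub-image (first feature similarity);
    afterwards, each remaining sub-image is compared against any
    member already in the set (second feature similarity).
    """
    cluster = [i]
    for j in range(len(sub_images)):
        if j == i:
            continue
        if len(cluster) == 1:
            # Steps 201-202: first feature similarity against the ith image.
            if cosine(sub_images[j], sub_images[i]) > threshold:
                cluster.append(j)
        else:
            # Steps 203-204: second feature similarity against any member.
            if any(cosine(sub_images[j], sub_images[m]) > threshold
                   for m in cluster):
                cluster.append(j)
    return cluster
```

Comparing later sub-images against any member of the growing set (rather than only against the seed) is what allows gradual appearance changes to stay in one set.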
In another embodiment of the present application, when dividing sub-images including the same target object information into the same class set based on the feature vector of each extracted sub-image, the sub-images to be clustered may be sequentially selected from the intercepted sub-images, and a second clustering process is performed using each selected sub-image as the clustering center until all the sub-images to be clustered are traversed. The second clustering process may be the method shown in fig. 3, including the following steps:
Step 301, calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image.
Step 302, screening out sub-images to be clustered with the corresponding third feature similarity larger than a first set threshold based on the third feature similarity corresponding to each sub-image to be clustered.
Step 303, dividing the selected sub-images and the sub-images to be clustered, of which the corresponding third feature similarity is greater than the first set threshold, into the same class set.
In specific implementation, the calculation manner of the third feature similarity may be the same as the calculation manner of the first feature similarity and/or the second feature similarity, which will not be described herein.
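The second clustering process (steps 301 to 303) can likewise be sketched in Python, again assuming plain-list feature vectors, index identifiers, and cosine similarity as the third feature similarity; the names and the bookkeeping of already-assigned sub-images are illustrative choices of this sketch:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def second_clustering(vectors, threshold):
    """Pick each not-yet-clustered sub-image as a clustering center,
    then pull in every other unclustered sub-image whose third feature
    similarity to the center exceeds the threshold (steps 301-303)."""
    clusters, assigned = [], set()
    for center in range(len(vectors)):
        if center in assigned:
            continue
        cluster = [center]
        assigned.add(center)
        for j in range(len(vectors)):
            if j in assigned:
                continue
            if cosine(vectors[j], vectors[center]) > threshold:
                cluster.append(j)
                assigned.add(j)
        clusters.append(cluster)
    return clusters
```

Unlike the first process, every comparison here is against the fixed center, so this variant is cheaper but more sensitive to appearance drift within one track.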
The two clustering processes shown above may lead to inaccurate class-set assignment for sub-images whose feature vectors vary. For example, in a scene where a pedestrian crosses a road, the feature similarity between the feature vector of a sub-image of the pedestrian captured when no vehicle obstructs the pedestrian and that of a sub-image captured when a vehicle does obstruct the pedestrian may be small. In this case, different sub-images containing the same pedestrian may easily be classified into different class sets, so that some class sets miss images of pedestrians that belong to them, while other class sets contain pedestrian images that do not.
In view of the foregoing, the present application further provides a scheme for adjusting sub-images included in a category set based on a frame number of the sub-image, with reference to the following step 103:
step 103, determining a category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, and adjusting the sub-images included in the category set to be adjusted.
Considering that the moving track of a target object appearing in the surveillance video is generally continuous, the frame numbers of the sub-images containing that target object's information should also be continuous; it rarely happens that a sub-image in the middle of a continuous run fails to contain the target object information. Based on this characteristic of motion continuity, the category set to be adjusted is determined by judging whether the frame numbers of the sub-images contained in each category set are continuous, and the sub-images contained in the category set to be adjusted are then adjusted.
The following lists two specific ways of determining the category set to be adjusted and adjusting the sub-images it includes:
In case one, it is detected whether a sub-image with an intermediate frame number is missing from a class set.
The set of categories to be adjusted may be determined according to the method shown in fig. 4, comprising the steps of:
Step 401, determining the maximum frame number and the minimum frame number of the sub-images in each divided class set.
Step 402, detecting whether intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to the divided class sets are continuous.
Step 403, determining the discontinuous class set of the intermediate frame number as the class set to be adjusted.
For example, if the frame numbers of the sub-images included in the category set a are 1,2,3,4,5,6,7,8, the maximum frame number is 8, the minimum frame number is 1, and the intermediate frame number between the maximum frame number and the minimum frame number is a continuous frame number, the category set a is a category set that does not need adjustment; if the sub-images included in the class set B have frame numbers 1,2,3,4,6,7,8, the maximum frame number is 8, the minimum frame number is 1, and the intermediate frame numbers between the maximum frame number and the minimum frame number are not consecutive frame numbers, the class set B is the class set to be adjusted.
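The continuity check of steps 401 to 403 can be sketched as follows, representing each class set simply by the list of its sub-images' frame numbers (an illustrative representation; an empty result means the set needs no adjustment):

```python
def missing_intermediate_frames(frame_numbers):
    """Steps 401-403: a class set needs adjustment when the frame
    numbers between its minimum and maximum are not consecutive;
    return the missing intermediate frame numbers."""
    present = set(frame_numbers)
    lo, hi = min(present), max(present)
    return [f for f in range(lo + 1, hi) if f not in present]
```

Applied to the example above, category set A yields no missing frames while category set B yields frame 5, marking B as the set to be adjusted.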
Further, the method shown in fig. 5 may be used to adjust the missing intermediate frame number in the class set to be adjusted, including the following steps:
step 501, determining, for the nth category set to be adjusted, a missing intermediate frame number in the nth category set to be adjusted.
The class set to be adjusted is a discontinuous set of frame numbers of the sub-images, and the missing intermediate frame numbers in the class set to be adjusted can be determined according to the frame numbers of the sub-images.
For example, if the frame number of the sub-image included in the class set to be adjusted is 1,2,3,4,6,7,8, the missing intermediate frame number is 5.
Step 502, determining a first candidate sub-image matched with the missing intermediate frame number in other category sets except the nth category set to be adjusted; and determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth class set to be adjusted.
Specifically, a first candidate class set with a frame number of the sub-image being a missing intermediate frame number may be selected from other class sets, and then a first candidate sub-image with a frame number of the missing intermediate frame number included in the first candidate class set is determined.
For example, if the frame number missing in the nth to-be-adjusted class set is 6, determining that the class set including the sub-image with the frame number of 6 in the other class sets is the first candidate class set, and then screening the sub-image with the frame number of 6 from the first candidate class set to determine the sub-image as the first candidate sub-image.
In a possible implementation manner, if the missing intermediate frame number is x, the sub-image with frame number x+1 in the nth class set to be adjusted may be used as the first reference sub-image, the sub-image with frame number x-1 may be used as the first reference sub-image, or both may be used as first reference sub-images at the same time.
Step 503, calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image.
The calculation method of the fourth feature similarity is the same as any one of the first feature similarity, the second feature similarity, and the third feature similarity, and will not be described herein.
Specifically, if the first reference sub-image is a sub-image, a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image may be calculated; if the number of the sub-images contained in the first reference sub-image is greater than 1, calculating a fourth feature similarity between the feature vector of each sub-image in the first reference sub-image and the feature vector of each first candidate sub-image.
Step 504, screening out first candidate sub-images with the fourth feature similarity greater than a second set threshold based on the fourth feature similarity corresponding to each first reference sub-image.
Step 505, dividing the screened first candidate sub-image into an nth class set to be adjusted.
In a possible implementation manner, all of the calculated fourth feature similarities may be smaller than or equal to the second set threshold, in which case the nth category set to be adjusted is left unadjusted. For example, for a surveillance video in which a pedestrian crosses a road, the pedestrian may be completely blocked by a vehicle for a long period of time; in this case, the feature information of the pedestrian while completely blocked cannot be acquired from the surveillance video, which may cause a discontinuity in the frame numbers of the category set describing that pedestrian. In this way, sub-images corresponding to the frame numbers missing from the nth category set to be adjusted can be screened from other category sets, improving the clustering accuracy.
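Steps 501 to 505 can be sketched as follows, assuming each class set is represented as a dict mapping frame numbers to feature vectors and cosine similarity stands in for the fourth feature similarity; this representation and the function names are illustrative, not prescribed by the method:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def fill_missing_frame(target_set, other_sets, missing, threshold):
    """Steps 501-505: the first reference sub-images are the neighbours
    (frame numbers x-1 and x+1) of the missing frame number x in the set
    to be adjusted; the first candidate sub-images are those with frame
    number x in the other class sets.  A candidate passing the threshold
    is divided into the set to be adjusted."""
    refs = [target_set[f] for f in (missing - 1, missing + 1)
            if f in target_set]
    candidates = [s[missing] for s in other_sets if missing in s]
    for cand in candidates:
        if any(cosine(ref, cand) > threshold for ref in refs):
            target_set[missing] = cand   # divide into the nth set
            return True
    return False                         # no candidate passed: no change
```

Returning `False` corresponds to the case discussed above where all fourth feature similarities fall at or below the threshold and the set is left unadjusted.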
In the second case, the frame numbers between the maximum frame number and the minimum frame number in the category set are continuous, but the feature information after the maximum frame number or the feature information before the minimum frame number is lost, and in this case, the category set needs to be adjusted.
In a possible implementation manner, when determining whether to lose the feature information after the maximum frame number, the method shown in fig. 6 may determine, based on the frame numbers of the sub-images included in each divided category set, a category set to be adjusted, including the following steps:
Step 601, for the nth category set of the division, determining the maximum frame number of the sub-image in the nth category set and the second reference sub-image matched with the maximum frame number.
Step 602, determining a second candidate sub-image matched with the next frame number of the maximum frame number in other category sets except the nth category set.
Illustratively, if the maximum frame number in the nth class set is x, the sub-image with frame number x+1 in the other class sets is determined to be a second candidate sub-image.
Step 603, calculating a fourth feature similarity between the feature vector of the second reference sub-image and the feature vector of each second candidate sub-image.
And step 604, when there is a second candidate sub-image with the fourth feature similarity greater than the second set threshold, determining a second candidate category set in which the second candidate sub-image with the fourth feature similarity greater than the second set threshold is located and an nth category set as a category set to be adjusted.
According to the method shown in fig. 6, after the class set to be adjusted is determined, the second candidate sub-image, where the fourth feature similarity included in the second candidate class set is greater than the second set threshold, may be divided into the nth class set, so as to implement adjustment on the nth class set.
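Steps 601 to 604, together with the adjustment just described, can be sketched under the same illustrative dict-of-frame-numbers representation, with cosine similarity again standing in for the fourth feature similarity (names and representation are assumptions of this sketch):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def extend_after_max_frame(target_set, other_sets, threshold):
    """Steps 601-604: compare the sub-image at the target set's maximum
    frame number (second reference sub-image) with the sub-images at the
    next frame number in the other class sets (second candidate
    sub-images); a matching candidate is divided into the target set."""
    max_frame = max(target_set)
    ref = target_set[max_frame]
    for other in other_sets:
        cand = other.get(max_frame + 1)
        if cand is not None and cosine(ref, cand) > threshold:
            target_set[max_frame + 1] = cand   # adjust the nth set
            return True
    return False
```

The symmetric check of steps 701 to 704 would use `min(target_set)` and look up frame number `min - 1` in the other class sets instead.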
In another possible implementation manner, when determining whether to lose the feature information before the minimum frame number, the method shown in fig. 7 may determine, based on the frame numbers of the sub-images included in the divided class sets, a class set to be adjusted, including the following steps:
step 701, determining, for the nth category set of the partition, a minimum frame number of the sub-image in the nth category set and a third reference sub-image matching the minimum frame number.
Step 702, determining a third candidate sub-image matching with the last frame number of the minimum frame number in other category sets except the nth category set.
Step 703, calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image.
And step 704, when a third candidate sub-image with the fourth feature similarity larger than the second set threshold exists, determining a third candidate category set in which the third candidate sub-image with the fourth feature similarity larger than the second set threshold exists and an nth category set as a category set to be adjusted.
According to the method shown in fig. 7, after the class set to be adjusted is determined, a third candidate sub-image, in which the fourth feature similarity included in the third candidate class set is greater than the second set threshold, may be divided into an nth class set, so as to implement adjustment of the nth class set.
By the method provided by the embodiment, the sub-images included in the category set can be adjusted according to the frame numbers of the sub-images included in the category set, so that the error of clustering only according to the feature similarity among the feature vectors of the sub-images can be reduced, and the clustering accuracy is improved.
In the above embodiment, by capturing the sub-image including the target object information from the image of the surveillance video and then clustering the sub-images including the target object information from the sub-images, the sub-images including the same target object information are first divided into the same class set based on the feature vector of each extracted sub-image, and further, the sub-images included in the class set may be adjusted by using the frame number of the sub-image included in each class set. According to the method, the sub-images are coarsely clustered by utilizing the feature vectors of the sub-images, and then the sub-images included in the category set are adjusted according to the frame numbers of the sub-images included in the category set, so that errors generated when clustering is carried out only according to the feature vectors of the sub-images can be reduced, and the clustering accuracy is improved.
Referring to fig. 8, an architecture diagram of a target object clustering device 800 provided in an embodiment of the present application includes an obtaining module 801, a dividing module 802, and an adjusting module 803, specifically:
An obtaining module 801, configured to obtain a surveillance video, intercept sub-images including target object information from an image of the surveillance video, and record a frame number of each sub-image;
a dividing module 802, configured to extract a feature vector of each sub-image, and divide sub-images including the same target object information into the same class set based on the extracted feature vector of each sub-image;
an adjustment module 803 is configured to determine a category set to be adjusted based on frame numbers of sub-images included in the divided category sets, and adjust sub-images included in the category set to be adjusted.
In a possible design, the dividing module 802 is specifically configured to, when dividing sub-images including the same target object information into the same class set based on the feature vector of each extracted sub-image:
dividing an ith sub-image to be clustered into a kth class set aiming at the ith sub-image to be clustered; i, k is a positive integer;
selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth category set.
In a possible design, the dividing module 802 is specifically configured to, when selecting, from sub-images to be clustered except for the ith sub-image, a sub-image that meets a clustering condition of the kth category set and dividing the sub-image into the kth category set:
sequentially selecting sub-images from the sub-images to be clustered except the ith sub-image, and executing a first clustering process until all sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the fact that the first feature similarity corresponding to the selected jth sub-image is larger than a first set threshold value is determined, dividing the jth sub-image into the kth class set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth class set;
and dividing the residual sub-images with the second feature similarity larger than the first set threshold value into the kth category set based on the second feature similarity corresponding to each residual sub-image.
In a possible design, the dividing module 802 is specifically configured to, when dividing sub-images including the same target object information into the same class set based on the feature vector of each extracted sub-image:
sequentially selecting sub-images to be clustered from the intercepted sub-images, taking each selected sub-image as a clustering center, and executing a second clustering process until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, based on the third feature similarity corresponding to each sub-image to be clustered;
and dividing the selected sub-images and the sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, into the same class set.
In a possible design, the adjusting module 803 is specifically configured to, when determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class set:
Determining the maximum frame number and the minimum frame number of the sub-images in each divided class set;
detecting whether intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to each divided class set are continuous or not;
and determining the class set with discontinuous intermediate frame numbers as the class set to be adjusted.
In a possible design, the adjusting module 803 is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
aiming at an nth class set to be adjusted, determining a missing intermediate frame number in the nth class set to be adjusted;
determining first candidate sub-images matched with the missing intermediate frame numbers in other category sets except the nth category set to be adjusted; determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth class set to be adjusted;
calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image;
screening out first candidate sub-images with the corresponding fourth feature similarity larger than a second set threshold value based on the fourth feature similarity corresponding to each first reference sub-image;
And dividing the screened first candidate sub-image into the nth class set to be adjusted.
In a possible design, the adjusting module 803 is specifically configured to, when determining the first candidate sub-image matching the missing intermediate frame number in the other class set except the nth class set to be adjusted:
screening a first candidate class set of which the frame number of the sub-image is the missing intermediate frame number from the other class sets;
and determining a first candidate sub-image of which the frame number included in the first candidate class set is the missing intermediate frame number.
In a possible design, the adjusting module 803 is specifically configured to, when determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class set:
determining a maximum frame number of a sub-image in the nth category set and a second reference sub-image matched with the maximum frame number aiming at the nth category set;
determining a second candidate sub-image matched with the next frame number of the maximum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vector of the second reference sub-image and the feature vector of each second candidate sub-image;
And when the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists, determining a second candidate category set in which the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as a category set to be adjusted.
In a possible design, the adjusting module 803 is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
and dividing a second candidate sub-image with the fourth characteristic similarity larger than the second set threshold value, which is included in the second candidate category set, into the nth category set.
In a possible design, the adjusting module 803 is specifically configured to, when determining the class set to be adjusted based on the frame numbers of the sub-images included in the divided class set:
determining a minimum frame number of a sub-image in the nth category set and a third reference sub-image matched with the minimum frame number aiming at the nth category set;
determining a third candidate sub-image matched with the last frame number of the minimum frame number in other category sets except the nth category set;
Calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
and when the third candidate sub-image with the fourth characteristic similarity larger than the second set threshold exists, determining a third candidate category set in which the third candidate sub-image with the fourth characteristic similarity larger than the second set threshold exists and the nth category set as a category set to be adjusted.
In a possible design, the adjusting module 803 is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
and dividing the third candidate sub-image with the fourth characteristic similarity larger than the second set threshold value, which is included in the third candidate category set, into the nth category set.
The target object clustering device intercepts sub-images comprising target object information from the images of the monitoring video, and then when clustering is carried out on the sub-images comprising target object information, the sub-images comprising the same target object information are firstly divided into the same class set based on the extracted feature vector of each sub-image, and further, the sub-images comprising the class set can be adjusted by utilizing the frame numbers of the sub-images comprising each class set. According to the method, the sub-images are coarsely clustered by utilizing the feature vectors of the sub-images, and then the sub-images included in the category set are adjusted according to the frame numbers of the sub-images included in the category set, so that errors generated when clustering is carried out only according to the feature vectors of the sub-images can be reduced, and the clustering accuracy is improved.
Based on the same technical concept, an embodiment of the present application further provides an electronic device. Referring to fig. 9, a schematic structural diagram of an electronic device 900 according to an embodiment of the present application includes a processor 901, a memory 902, and a bus 903. The memory 902 is configured to store execution instructions and includes an internal memory 9021 and an external memory 9022. The internal memory 9021 temporarily stores operation data of the processor 901 and data exchanged with the external memory 9022, such as a hard disk; the processor 901 exchanges data with the external memory 9022 through the internal memory 9021. When the electronic device 900 runs, the processor 901 and the memory 902 communicate through the bus 903, so that the processor 901 executes the following instructions:
acquiring a monitoring video, intercepting a sub-image comprising target object information from an image of the monitoring video, and recording a frame number of each sub-image;
extracting a feature vector of each sub-image, and dividing sub-images comprising the same target object information into the same class set based on the extracted feature vector of each sub-image;
and determining a class set to be adjusted based on the frame numbers of the sub-images included in the classified class sets, and adjusting the sub-images included in the class set to be adjusted.
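The three instructions above amount to a crop–extract–cluster pipeline. A minimal sketch follows; the detector, feature extractor, and similarity function are assumed stand-ins supplied by the caller, since the embodiment does not fix any particular detector, feature network, or metric:

```python
def cluster_sub_images(frames, detect_targets, extract_feature,
                       similarity, threshold=0.8):
    """Crop target sub-images from video frames, record frame numbers,
    then greedily group the sub-images by feature similarity."""
    # Instruction 1: intercept sub-images including target object
    # information and record each sub-image's frame number.
    sub_images = []
    for frame_number, frame in enumerate(frames):
        for crop in detect_targets(frame):
            # Instruction 2 (first half): extract a feature vector per sub-image.
            sub_images.append((frame_number, extract_feature(crop)))

    # Instruction 2 (second half): divide sub-images whose feature
    # vectors are similar enough into the same category set.
    category_sets = []
    for frame_number, feature in sub_images:
        for category in category_sets:
            if similarity(category[0][1], feature) > threshold:
                category.append((frame_number, feature))
                break
        else:
            category_sets.append([(frame_number, feature)])
    # Instruction 3 (not shown here): determine and adjust category
    # sets using the recorded frame numbers.
    return category_sets
```

The recorded frame numbers travel with every feature vector precisely so that the third instruction, the frame-number-based adjustment, can run on the resulting category sets.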
The specific processing flow of the processor 901 may refer to the descriptions of the above method embodiments, and will not be described herein.
Based on the same technical concept, the embodiments of the present application also provide a computer readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the above-described target object clustering method.
Specifically, the storage medium may be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the above target object clustering method can be performed, thereby improving the accuracy of target object clustering.
Based on the same technical concept, the embodiments of the present application further provide a computer program product, which includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the steps of the above target object clustering method; for specific implementation, reference may be made to the foregoing method embodiments, and details are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for clustering target objects, comprising:
acquiring a monitoring video, intercepting a sub-image comprising target object information from an image of the monitoring video, and recording a frame number of each sub-image;
extracting a feature vector of each sub-image, and dividing sub-images comprising the same target object information into the same class set based on the extracted feature vector of each sub-image;
determining a class set to be adjusted based on frame numbers of sub-images included in each classified class set, and adjusting the sub-images included in the class set to be adjusted;
determining a category set to be adjusted based on frame numbers of sub-images included in the divided category sets, including:
determining, for an nth category set, a minimum frame number among the sub-images in the nth category set and a third reference sub-image matched with the minimum frame number;
determining, in category sets other than the nth category set, a third candidate sub-image matched with the frame number immediately preceding the minimum frame number;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
when a third candidate sub-image whose fourth feature similarity is greater than a second set threshold exists, determining the third candidate category set containing that third candidate sub-image and the nth category set as category sets to be adjusted;
the adjusting the sub-image included in the category set to be adjusted includes:
and dividing the third candidate sub-image, included in the third candidate category set, whose fourth feature similarity is greater than the second set threshold into the nth category set.
2. The method of claim 1, wherein the dividing sub-images including the same target object information into the same class set based on the extracted feature vector of each sub-image comprises:
dividing an ith sub-image to be clustered into a kth class set aiming at the ith sub-image to be clustered; i, k is a positive integer;
selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth category set.
3. The method of claim 2, wherein selecting sub-images conforming to the clustering condition of the kth class set from sub-images to be clustered other than the ith sub-image and dividing into the kth class set comprises:
sequentially selecting sub-images from the sub-images to be clustered except the ith sub-image, and executing a first clustering process until all sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the fact that the first feature similarity corresponding to the selected jth sub-image is larger than a first set threshold value is determined, dividing the jth sub-image into the kth class set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth class set;
and dividing the residual sub-images with the second feature similarity larger than the first set threshold value into the kth category set based on the second feature similarity corresponding to each residual sub-image.
4. The method of claim 1, wherein the dividing sub-images including the same target object information into the same class set based on the extracted feature vector of each sub-image comprises:
sequentially selecting sub-images to be clustered from the intercepted sub-images, taking the selected sub-image as a clustering center, and executing a second clustering process until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, based on the third feature similarity corresponding to each sub-image to be clustered;
and dividing the selected sub-images and the sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, into the same class set.
5. A target object clustering device, comprising:
the acquisition module is used for acquiring a monitoring video, intercepting sub-images comprising target object information from the images of the monitoring video and recording the frame number of each sub-image;
the dividing module is used for extracting the characteristic vector of each sub-image and dividing the sub-images comprising the same target object information into the same class set based on the extracted characteristic vector of each sub-image;
the adjustment module is used for determining a class set to be adjusted based on the frame numbers of the sub-images included in the divided class sets and adjusting the sub-images included in the class set to be adjusted;
the adjustment module is specifically configured to, when determining a class set to be adjusted based on frame numbers of sub-images included in the divided class sets:
determining, for an nth category set, a minimum frame number among the sub-images in the nth category set and a third reference sub-image matched with the minimum frame number;
determining, in category sets other than the nth category set, a third candidate sub-image matched with the frame number immediately preceding the minimum frame number;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
when a third candidate sub-image whose fourth feature similarity is greater than a second set threshold exists, determining the third candidate category set containing that third candidate sub-image and the nth category set as category sets to be adjusted;
the adjustment module is specifically configured to, when adjusting the sub-images included in the category set to be adjusted:
and dividing the third candidate sub-image, included in the third candidate category set, whose fourth feature similarity is greater than the second set threshold into the nth category set.
6. The apparatus of claim 5, wherein the partitioning module is specifically configured to, when partitioning sub-images including the same target object information into the same class set based on the extracted feature vector of each sub-image:
dividing an ith sub-image to be clustered into a kth class set aiming at the ith sub-image to be clustered; i, k is a positive integer;
selecting sub-images meeting the clustering condition of the kth category set from sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth category set.
7. The apparatus of claim 6, wherein the partitioning module, when selecting a sub-image that meets a clustering condition of the kth class set from sub-images to be clustered other than the ith sub-image and partitioning into the kth class set, is specifically configured to:
sequentially selecting sub-images from the sub-images to be clustered except the ith sub-image, and executing a first clustering process until all sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the fact that the first feature similarity corresponding to the selected jth sub-image is larger than a first set threshold value is determined, dividing the jth sub-image into the kth class set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth class set;
and dividing the residual sub-images with the second feature similarity larger than the first set threshold value into the kth category set based on the second feature similarity corresponding to each residual sub-image.
8. The apparatus of claim 5, wherein the partitioning module is specifically configured to, when partitioning sub-images including the same target object information into the same class set based on the extracted feature vector of each sub-image:
sequentially selecting sub-images to be clustered from the intercepted sub-images, taking the selected sub-image as a clustering center, and executing a second clustering process until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, based on the third feature similarity corresponding to each sub-image to be clustered;
and dividing the selected sub-images and the sub-images to be clustered, of which the corresponding third feature similarity is larger than a first set threshold value, into the same class set.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the processor executing the machine readable instructions to perform the steps of the target object clustering method of any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the target object clustering method according to any one of claims 1 to 4.
CN202010408248.7A 2018-12-17 2018-12-17 Target object clustering method and device Active CN111597980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408248.7A CN111597980B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811544296.8A CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN202010408248.7A CN111597980B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811544296.8A Division CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Publications (2)

Publication Number Publication Date
CN111597980A CN111597980A (en) 2020-08-28
CN111597980B true CN111597980B (en) 2023-04-28

Family

ID=69382905

Family Applications (3)

Application Number Title Priority Date Filing Date
CN201811544296.8A Active CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN202010408241.5A Active CN111597979B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN202010408248.7A Active CN111597980B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN201811544296.8A Active CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN202010408241.5A Active CN111597979B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Country Status (1)

Country Link
CN (3) CN110781710B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255694B (en) * 2021-05-21 2022-11-11 北京百度网讯科技有限公司 Training image feature extraction model and method and device for extracting image features

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1766907A (en) * 2005-10-24 2006-05-03 中国电子科技集团公司第四十五研究所 Multi-target image recognition method based on cluster genetic algorithm
CN107153824A (en) * 2017-05-22 2017-09-12 中国人民解放军国防科学技术大学 Across video pedestrian recognition methods again based on figure cluster
CN108122247A (en) * 2017-12-25 2018-06-05 北京航空航天大学 A kind of video object detection method based on saliency and feature prior model
CN108615042A (en) * 2016-12-09 2018-10-02 炬芯(珠海)科技有限公司 The method and apparatus and player of video format identification

Family Cites Families (26)

Publication number Priority date Publication date Assignee Title
US20050175243A1 (en) * 2004-02-05 2005-08-11 Trw Automotive U.S. Llc Method and apparatus for classifying image data using classifier grid models
CN100403313C (en) * 2006-09-14 2008-07-16 浙江大学 Extraction method of key frame of 3d human motion data
CN101359368B (en) * 2008-09-09 2010-08-25 华为技术有限公司 Video image clustering method and system
JP2011146799A (en) * 2010-01-12 2011-07-28 Brother Industries Ltd Device and program for processing image
CN103052973B (en) * 2011-07-12 2015-12-02 华为技术有限公司 Generate method and the device of body animation
CN102609695A (en) * 2012-02-14 2012-07-25 上海博物馆 Method and system for recognizing human face from multiple angles
CN103426176B (en) * 2013-08-27 2017-03-01 重庆邮电大学 Based on the shot detection method improving rectangular histogram and clustering algorithm
GB2519348B (en) * 2013-10-18 2021-04-14 Vision Semantics Ltd Visual data mining
CN104679818B (en) * 2014-12-25 2019-03-26 上海云赛智联信息科技有限公司 A kind of video key frame extracting method and system
CN104598889B (en) * 2015-01-30 2018-02-09 北京信息科技大学 The method and apparatus of Human bodys' response
US9697599B2 (en) * 2015-06-17 2017-07-04 Xerox Corporation Determining a respiratory pattern from a video of a subject
CN106937120B (en) * 2015-12-29 2019-11-12 北京大唐高鸿数据网络技术有限公司 Object-based monitor video method for concentration
KR101647691B1 (en) * 2016-02-12 2016-08-16 데이터킹주식회사 Method for hybrid-based video clustering and server implementing the same
CN106446797B (en) * 2016-08-31 2019-05-07 腾讯科技(深圳)有限公司 Image clustering method and device
CN108965687B (en) * 2017-05-22 2021-01-29 阿里巴巴集团控股有限公司 Shooting direction identification method, server, monitoring method, monitoring system and camera equipment
CN107492113B (en) * 2017-06-01 2019-11-05 南京行者易智能交通科技有限公司 A kind of moving object in video sequences position prediction model training method, position predicting method and trajectory predictions method
CN107529650B (en) * 2017-08-16 2021-05-18 广州视源电子科技股份有限公司 Closed loop detection method and device and computer equipment
CN107909104B (en) * 2017-11-13 2023-07-18 腾讯数码(天津)有限公司 Face clustering method and device for pictures and storage medium
CN108460356B (en) * 2018-03-13 2021-10-29 上海海事大学 Face image automatic processing system based on monitoring system
CN108647571B (en) * 2018-03-30 2021-04-06 国信优易数据股份有限公司 Video motion classification model training method and device and video motion classification method
CN108509994B (en) * 2018-03-30 2022-04-12 百度在线网络技术(北京)有限公司 Method and device for clustering character images
CN108549857B (en) * 2018-03-30 2021-04-23 国信优易数据股份有限公司 Event detection model training method and device and event detection method
CN108596225A (en) * 2018-04-12 2018-09-28 广州杰赛科技股份有限公司 Target similarity recognition method, the residence time recording method of target and device
CN108805124B (en) * 2018-04-18 2019-10-18 北京嘀嘀无限科技发展有限公司 Image processing method and device, computer readable storage medium
CN108875834B (en) * 2018-06-22 2019-08-20 北京达佳互联信息技术有限公司 Image clustering method, device, computer equipment and storage medium
CN108921130B (en) * 2018-07-26 2022-03-01 聊城大学 Video key frame extraction method based on saliency region

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN1766907A (en) * 2005-10-24 2006-05-03 中国电子科技集团公司第四十五研究所 Multi-target image recognition method based on cluster genetic algorithm
CN108615042A (en) * 2016-12-09 2018-10-02 炬芯(珠海)科技有限公司 The method and apparatus and player of video format identification
CN107153824A (en) * 2017-05-22 2017-09-12 中国人民解放军国防科学技术大学 Across video pedestrian recognition methods again based on figure cluster
CN108122247A (en) * 2017-12-25 2018-06-05 北京航空航天大学 A kind of video object detection method based on saliency and feature prior model

Non-Patent Citations (1)

Title
Key frame extraction based on curvature detection of multi-feature similarity curves; Wu Yu et al.; Journal of Computer Applications (《计算机应用》); 2008-12-01; Vol. 29, No. 12; pp. 3084-3088 *

Also Published As

Publication number Publication date
CN110781710A (en) 2020-02-11
CN110781710B (en) 2020-08-28
CN111597979B (en) 2023-05-12
CN111597979A (en) 2020-08-28
CN111597980A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN109035299B (en) Target tracking method and device, computer equipment and storage medium
CN109325964B (en) Face tracking method and device and terminal
US11205276B2 (en) Object tracking method, object tracking device, electronic device and storage medium
US8649604B2 (en) Face searching and detection in a digital image acquisition device
Rios-Cabrera et al. Efficient multi-camera vehicle detection, tracking, and identification in a tunnel surveillance application
US20220174089A1 (en) Automatic identification and classification of adversarial attacks
CN110853033B (en) Video detection method and device based on inter-frame similarity
JP6921694B2 (en) Monitoring system
CN112016353B (en) Method and device for carrying out identity recognition on face image based on video
CN109035295B (en) Multi-target tracking method, device, computer equipment and storage medium
CN104346811A (en) Video-image-based target real-time tracking method and device
CN112633255B (en) Target detection method, device and equipment
CN110826484A (en) Vehicle weight recognition method and device, computer equipment and model training method
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
CN111444817B (en) Character image recognition method and device, electronic equipment and storage medium
EP3726421A2 (en) Recognition method and apparatus for false detection of an abandoned object and image processing device
CN111091041A (en) Vehicle law violation judging method and device, computer equipment and storage medium
CN111402185B (en) Image detection method and device
CN111597979B (en) Target object clustering method and device
CN113469135A (en) Method and device for determining object identity information, storage medium and electronic device
CN113837006A (en) Face recognition method and device, storage medium and electronic equipment
CN115082326A (en) Processing method for deblurring video, edge computing equipment and central processor
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN111476059A (en) Target detection method and device, computer equipment and storage medium
CN112906495B (en) Target detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant