CN110569918A - sample classification method and related device - Google Patents

sample classification method and related device

Info

Publication number
CN110569918A
Authority
CN
China
Prior art keywords
distance
feature vector
processed
processed set
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910873761.0A
Other languages
Chinese (zh)
Inventor
石楷弘
陈志博
王吉
余莉萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910873761.0A
Publication of CN110569918A

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24147 Distances to closest patterns, e.g. nearest neighbour classification

Abstract

The embodiments of the present application provide a sample classification method and a related device. The method clusters pedestrian images by mutual neighbor distance, neighbor sorting distance, and absolute distance, aggregating images of the same person together. A sample classification result can then be generated from the clustered images; the clustering effect is good, and samples suitable for training a pedestrian re-identification model can be obtained.

Description

sample classification method and related device
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and a related apparatus for sample classification.
Background
Pedestrian re-identification (Person re-ID) is a recent research focus in computer vision: given a monitored pedestrian image, images of the same pedestrian are retrieved across devices. Owing to differences between camera devices, the appearance of a pedestrian is easily affected by clothing, scale, occlusion, posture, viewing angle, and the like, which makes pedestrian re-identification a challenging problem of great research value. The pedestrian re-identification technology is hereinafter referred to as the ReID technology.
Currently, the ReID technology is widely applied in fields such as commerce, security, transportation, and finance. Data plays an important role in improving the performance of a ReID model, and for researchers it is an important task to mine labeled identities from a large number of human body pictures. Clustering plays a core role in this mining process: gathering people with the same identity together reduces the subsequent manual labeling workload.
Because face recognition technology is mature, the accuracy of a face recognition model on the Labeled Faces in the Wild (LFW) database exceeds that of human eyes, and clustering with face features works very well. However, the ReID technology has developed more slowly than face recognition, and model accuracy is weaker. A clustering algorithm that performs well in face recognition often yields a poor clustering effect in the ReID technology, and the resulting samples often do not meet the requirements: for example, bad files (images of different people classified into the same category), one person with multiple files (images of the same person classified into multiple categories), or images of the same person failing to gather together. Therefore, it is difficult to obtain proper samples by applying the clustering algorithms of face recognition technology to the ReID technology.
Disclosure of Invention
The embodiment of the application provides a sample classification method and a related device, and solves the technical problems that in the prior art, the clustering effect is poor, and a proper sample is difficult to obtain.
In a first aspect, an embodiment of the present application provides a method for sample classification, including:
Acquiring a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;
Acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with the first sample image, and the second image feature vector has a second corresponding relation with the second sample image;
Acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, wherein the distance comprises an absolute distance, a mutual neighbor distance and a neighbor sorting distance;
And if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition and the neighbor sorting distance meets a third set condition, generating sample classification results corresponding to the first to-be-processed set and the second to-be-processed set.
In a second aspect, an embodiment of the present application provides an apparatus for sample classification, including:
An obtaining unit, configured to obtain a first to-be-processed set and a second to-be-processed set, where the first to-be-processed set includes at least one first sample image, and the second to-be-processed set includes at least one second sample image;
The acquiring unit is further used for acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with the first sample image, and the second image feature vector has a second corresponding relation with the second sample image;
A processing unit, configured to obtain, according to the first image feature vector and the second image feature vector, a distance between the first to-be-processed set and the second to-be-processed set, where the distance includes an absolute distance, a mutual neighbor distance, and a neighbor sorting distance;
The processing unit is further configured to generate sample classification results corresponding to the first to-be-processed set and the second to-be-processed set if the absolute distance satisfies a first set condition, the mutual neighbor distance satisfies a second set condition, and the neighbor sorting distance satisfies a third set condition.
In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:
Determining a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;
And determining an absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value of the cosine distance.
In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:
Acquiring a first absolute distance corresponding to the first to-be-processed set;
Sorting according to the magnitude of the first absolute distance to obtain a first nearest neighbor sequence corresponding to the first to-be-processed set;
Acquiring a second absolute distance corresponding to the second to-be-processed set;
Sorting according to the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second to-be-processed set;
And determining the mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence numbers of the first to-be-processed set in the second nearest neighbor sequence and the sequence numbers of the second to-be-processed set in the first nearest neighbor sequence.
In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:
Determining a neighbor sorting distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.
In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:
Acquiring a target set, wherein the target set comprises the first to-be-processed set and the second to-be-processed set;
Determining a single feature vector according to the first image feature vector and the second image feature vector;
Calculating the similarity between the target sets according to the single feature vectors;
And sending the target set with the similarity smaller than a set threshold value to a terminal device, so that the terminal device displays the target set.
In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:
Selecting a sample image in the target set;
And sending the sample image to a terminal device, so that the terminal device displays the sample image.
In one possible design, in an implementation manner of the second aspect of the embodiment of the present application, the processing unit is further configured to:
Obtaining labeling information, wherein the labeling information has an association relationship with the target set;
And determining a secondary sample classification result corresponding to the target set according to the labeling information.
In a third aspect, an embodiment of the present application provides a server, including:
One or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
The memory is a transient memory or a persistent memory;
The central processor is configured to communicate with the memory, and perform the following steps on the server:
Acquiring a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;
Acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with the first sample image, and the second image feature vector has a second corresponding relation with the second sample image;
Acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, wherein the distance comprises an absolute distance, a mutual neighbor distance and a neighbor sorting distance;
And if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition, and the neighbor sorting distance meets a third set condition, generating sample classification results corresponding to the first to-be-processed set and the second to-be-processed set.
In one possible design, in an implementation manner of the third aspect of the embodiment of the present application, the central processing unit is further configured to perform the following steps:
Determining a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;
And determining an absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value of the cosine distance.
In one possible design, in an implementation manner of the third aspect of the embodiment of the present application, the central processing unit is further configured to perform the following steps:
Determining a first nearest neighbor sequence corresponding to the first to-be-processed set and a second nearest neighbor sequence corresponding to the second to-be-processed set according to the absolute distances, wherein the first nearest neighbor sequence is a sorting queue of the second sample images according to the cosine distances, and the second nearest neighbor sequence is a sorting queue of the first sample images according to the cosine distances;
Determining a mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence number of the first absolute sample image in the second nearest neighbor sequence and the sequence number of the second absolute sample image in the first nearest neighbor sequence.
In one possible design, in an implementation manner of the third aspect of the embodiment of the present application, the central processing unit is further configured to perform the following steps:
Determining a neighbor sorting distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.
In one possible design, in an implementation manner of the third aspect of the embodiment of the present application, the central processing unit is further configured to perform the following steps:
Acquiring a target set, wherein the target set comprises the first to-be-processed set and the second to-be-processed set;
Determining a single feature vector according to the first image feature vector and the second image feature vector;
Calculating the similarity between the target sets according to the single feature vectors;
And sending the target set with the similarity smaller than a set threshold value to a terminal device, so that the terminal device displays the target set.
In one possible design, in an implementation manner of the third aspect of the embodiment of the present application, the central processing unit is further configured to perform the following steps:
Selecting a sample image in the target set;
And sending the sample image to a terminal device, so that the terminal device displays the sample image.
In one possible design, in an implementation manner of the third aspect of the embodiment of the present application, the central processing unit is further configured to perform the following steps:
Obtaining labeling information, wherein the labeling information has an association relationship with the target set;
And determining a secondary sample classification result corresponding to the target set according to the labeling information.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to perform the method of the above aspects.
According to the technical solutions above, the embodiments of the present application have the following advantages:
The embodiments of the present application provide a sample classification method and a related device, which cluster pedestrian images by mutual neighbor distance, neighbor sorting distance, and absolute distance, aggregating images of the same person together. A sample classification result can then be generated from the clustered images; the clustering effect is good, and samples suitable for training a pedestrian re-identification model can be obtained.
drawings
FIG. 1 is a diagram illustrating an example of a pedestrian re-identification technique according to an embodiment of the present disclosure;
FIG. 2 is an exemplary diagram of outliers in an embodiment of the present application;
FIG. 3 is a schematic diagram of a sample classification method according to an embodiment of the present disclosure;
Fig. 4 is an exemplary diagram of a first nearest neighbor sequence and a second nearest neighbor sequence in an embodiment of the present application;
FIG. 5 is a diagram illustrating an example of a server calculating a neighbor rank distance in an embodiment of the present application;
FIG. 6 is a diagram showing an example of a target set in an embodiment of the present application;
FIG. 7 is a schematic flow chart of an alternative embodiment of the present application;
FIG. 8 is a schematic flow chart of another alternative embodiment of the present application;
FIG. 9 is a diagram illustrating an example of an apparatus for sample classification in an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a sample classification method and a related device, and solves the technical problems that in the prior art, the clustering effect is poor, and a proper sample is difficult to obtain.
The terms "first," "second," "third," "fourth," and the like (if any) in the description, claims, and drawings of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the application described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "corresponding," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
For clarity and conciseness of the following descriptions of the various embodiments, a brief introduction to the related art is first given:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision technology (CV) is a science that studies how to make a machine "see": cameras and computers are used in place of human eyes to identify, track, and measure targets, with further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition, fingerprint recognition, and pedestrian re-identification.
The scheme provided by the embodiments of the present application relates to artificial intelligence technologies such as pedestrian re-identification, and is specifically explained by the following embodiments:
Fig. 1 is a diagram illustrating an example scenario of the pedestrian re-identification technique in an embodiment of the present application. The image capturing device may include, but is not limited to, a camera, a monitoring device, a video recorder, and the like, which is not limited in this application. The image pickup apparatus and the server are connected through a network, so that the image pickup apparatus can transmit a captured image or video to the server. After receiving the images or videos, the server may identify pedestrians in them through the pedestrian re-identification technology (hereinafter referred to as the ReID technology). The specific process is as follows:
The server acquires image or video data in the image pickup apparatus. If the server receives the video data transmitted by the camera equipment, the server can extract a plurality of frame images in a sampling mode. Then, the server may cluster the acquired images, and classify all the images corresponding to the same pedestrian into the same category. For example, if the image capturing apparatus a captures three images of the pedestrian a and the image capturing apparatus B captures three images of the pedestrian a, the server may classify the six images into the same category. And finally, the server identifies the identity of the pedestrian corresponding to each type of image according to identification algorithms such as face identification and the like.
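The frame-extraction step mentioned above can be illustrated with a simple uniform-sampling sketch. The application does not specify a sampling scheme, so the function below is purely illustrative:

```python
def sample_frame_indices(total_frames, num_samples):
    """Uniformly sample frame indices from a video (illustrative only)."""
    if num_samples >= total_frames:
        return list(range(total_frames))  # fewer frames than requested samples
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

print(sample_frame_indices(100, 5))  # -> [0, 20, 40, 60, 80]
```

The sampled indices would then be used to pull individual frames out of the decoded video stream before feature extraction.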
In the above process, the clustering algorithm used by the server to cluster the images plays an important role. With the currently adopted hierarchical clustering and rank-order clustering algorithms, it is difficult to obtain proper, accurately classified samples. The two algorithms are briefly introduced as follows:
The hierarchical clustering algorithm is implemented as follows: each object is first treated as a cluster, and then clusters are merged into larger and larger clusters until all objects are in one cluster or some termination condition is met. A ReID model used with hierarchical clustering has weak feature expression capability: the similarity of the same person across cameras, or of the front and back views of the same person, is low, while the clothing similarity of different people is high, so the hierarchical clustering method can generate a large number of bad files (images of different people classified into the same category) and one person with multiple files (images of the same person classified into multiple categories). Furthermore, the termination condition is not easy to set.
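The merge loop described above can be sketched as follows. This is a minimal, generic single-linkage agglomerative implementation with a distance-threshold termination condition, shown only to illustrate the baseline being criticized; it is not the method claimed in this application:

```python
import numpy as np

def agglomerative_cluster(points, stop_distance):
    """Merge the two closest clusters (single linkage) until the
    closest remaining pair is farther apart than stop_distance."""
    clusters = [[i] for i in range(len(points))]  # start: one object per cluster

    def single_linkage(ca, cb):
        # minimum pairwise Euclidean distance between two clusters
        return min(np.linalg.norm(points[i] - points[j]) for i in ca for j in cb)

    while len(clusters) > 1:
        # find the closest pair of clusters
        pairs = [(single_linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        d, a, b = min(pairs)
        if d > stop_distance:  # termination condition
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(agglomerative_cluster(pts, stop_distance=1.0))  # -> [[0, 1], [2, 3]]
```

Note how the result hinges entirely on `stop_distance`: too large and everything collapses into one cluster, too small and nothing merges, which is exactly the "termination condition is not easy to set" problem noted above.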
The clustering method based on the rank-order sorting distance is widely applied to mobile phone album clustering. Rank order is based on a common phenomenon: two faces of the same person share many neighbors, but the neighbors of faces of different people are usually very different. When the rank-order clustering algorithm is applied to the ReID technology, the appearance of outliers can enlarge the sorting distance, so that images of the same person are not gathered together.
Fig. 2 is an exemplary diagram of outliers in an embodiment of the present application. Consider samples a, b, and c, where a and b belong to the same person and c belongs to a different person. If c is very close to one of a and b but far from the other, then c is an outlier; for example, c is the first neighbor of a but the 155th neighbor of b.
Therefore, these clustering algorithms do not meet the requirements, and reasonably classified samples cannot be obtained.
In view of the above, an embodiment of the present application provides a method for sample classification to solve the above technical problems. Fig. 3 is a schematic diagram of a sample classification method according to an embodiment of the present disclosure. The method includes the following steps:
301. Acquiring a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;
In this embodiment of the present application, a server first obtains a first to-be-processed set and a second to-be-processed set. Each to-be-processed set may also be referred to as a class in a clustering algorithm; for convenience of description, the first to-be-processed set and the second to-be-processed set are hereinafter referred to as class A and class B, respectively. It is understood that the server may obtain not only class A and class B but also other classes, such as class C, class D, and so on; only class A and class B are described here for convenience, and this should not be understood as meaning that the server acquires only class A and class B. Illustratively, the to-be-processed sets acquired by the server include class A (the first to-be-processed set), class B (the second to-be-processed set), class C, class D, and the like. If class A and class B can be merged, the server classifies the sample images in class A and class B into one large class. By analogy, the server can combine all classes to obtain a plurality of large classes, each of which includes a plurality of sample images, thereby aggregating the sample images into reasonable categories from which the server can identify pedestrian identities. For example, if the sample images in a large class obtained by the server are all images of pedestrian A, identifying the pedestrian from the sample images in that class improves recognition accuracy, and the images in the class can be identified as images of pedestrian A.
At least one sample image may be included in each class acquired by the server. In some embodiments, after acquiring the images transmitted by the camera device, or the images extracted from a video, the server treats each image as a sample image. In some embodiments, the server may treat each sample image as a class, so that a class may include only one sample image. For example, when the server acquires sample image a and sample image b, the server preliminarily classifies sample image a into class A and sample image b into class B. It is understood that, in this embodiment of the present application, the sample image in class A is the first sample image, and the sample image in class B is the second sample image.
302. Acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with a first sample image, and the second image feature vector has a second corresponding relation with a second sample image;
In this embodiment, the server may extract features from the first sample image to obtain a first image feature vector. The algorithm or model for extracting features may include, but is not limited to, a Convolutional Neural Network (CNN) model, a human pose (Pose) and skeletal keypoint (Skeleton) model, and the like, which is not limited herein. Similarly, the server may extract features from the second sample image to obtain a second image feature vector.
It is understood that the first image feature vector and the second image feature vector are used to represent information of a pedestrian in the sample image.
303. Acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, wherein the distance comprises an absolute distance, a mutual neighbor distance and a neighbor sorting distance;
In the embodiment of the present application, the server needs to calculate a plurality of distances, including an absolute distance (which may also be referred to as a nearest distance), a mutual neighbor distance, and a neighbor sorting distance, which will be described in detail below.
In some embodiments, the server may calculate the absolute distance in such a way that the server first determines a cosine distance between the first sample image and the second sample image from the first image feature vector and the second image feature vector. Then, the server determines the absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value of the cosine distance. The method for calculating the cosine distance by the server is not particularly limited in the embodiment of the present application. The server can calculate the absolute distance by an absolute distance calculation formula, wherein the absolute distance calculation formula is as follows:
d(C_i, C_j) = min(f(x_m, x_n));
wherein d(C_i, C_j) is the absolute distance between class C_i and class C_j, x_m is a sample image in class C_i, x_n is a sample image in class C_j, and f(x_m, x_n) is the cosine distance between the sample images.
illustratively, class A includes sample image a and sample image b, and class B includes sample image c and sample image d. If the cosine distance between sample image a and sample image c is 1, between a and d is 2, between b and c is 3, and between b and d is 4, then the server determines that the absolute distance between class A and class B is 1.
In some embodiments, if class A has only one sample image and class B also has only one sample image, the absolute distance between class A and class B may simply be the cosine distance between the two sample images.
In some embodiments, the server may also record a first absolute sample image and a second absolute sample image associated with the absolute distance. For example, if the minimum cosine distance is attained by sample image a and sample image c, the server uses that cosine distance as the absolute distance; sample image a may then be used as the first absolute sample image, and sample image c as the second absolute sample image.
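The absolute-distance computation described above can be sketched as follows. The embodiment does not pin down the exact cosine distance formula, so 1 minus cosine similarity is assumed here, and the function names are illustrative:

```python
import numpy as np

def cosine_distance(u, v):
    # Assumed definition: cosine distance = 1 - cosine similarity.
    return 1.0 - float(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))

def absolute_distance(class_i, class_j):
    # d(C_i, C_j): minimum cosine distance over all cross-class image pairs.
    return min(cosine_distance(u, v) for u in class_i for v in class_j)
```

For two singleton classes this reduces to the cosine distance between their two images, as noted above.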
in some embodiments, the server may calculate the mutual neighbor distance in a manner that the server first obtains a first absolute distance corresponding to the first to-be-processed set, and obtains a first nearest neighbor sequence corresponding to the first to-be-processed set by sorting according to the size of the first absolute distance. Then, the server acquires a second absolute distance corresponding to the second set to be processed; and sorting according to the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second to-be-processed set. Finally, the server determines the mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence number of the first to-be-processed set in the second nearest neighbor sequence and the sequence number of the second to-be-processed set in the first nearest neighbor sequence.
for example, the to-be-processed sets acquired by the server may include class A (the first to-be-processed set), class B (the second to-be-processed set), class C, class D, and so on. The server may then calculate the absolute distance (which may also be referred to as the first absolute distance) between class A and every other class, and rank all classes according to the absolute distance to obtain the first nearest neighbor sequence, as shown in fig. 4. Fig. 4 is an exemplary diagram of a first nearest neighbor sequence and a second nearest neighbor sequence in an embodiment of the present application, where O_A is the first nearest neighbor sequence and O_B is the second nearest neighbor sequence; the classes in fig. 4 may include class A, class B, class C, class D, and the like. For example, assuming that the server calculates that the absolute distance between class A and class B is 0.1, the absolute distance between class A and class C is 0.8, and the absolute distance between class A and class D is 0.3, the server may sort the classes by absolute distance in descending order to obtain the first nearest neighbor sequence shown in fig. 4. In other embodiments, the server may sort in other ways, which is not limited herein.
in some embodiments, the server may calculate the mutual-neighbor distance between the classes by a mutual-neighbor distance calculation formula:
MN(C_i, C_j) = m + n;

Wherein MN(C_i, C_j) is the mutual neighbor distance, m is the sequence number of class C_i in the nearest neighbor sequence corresponding to class C_j, and n is the sequence number of class C_j in the nearest neighbor sequence corresponding to class C_i.
Illustratively, as shown in fig. 4, the first nearest neighbor sequence begins with class A, class C, class D, and class B has sequence number 3 in the first nearest neighbor sequence. Similarly, if the sequence number of class A in the second nearest neighbor sequence is 5, the mutual neighbor distance between class A and class B is 3 + 5 = 8.
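A minimal sketch of the mutual neighbor distance, assuming a precomputed pairwise absolute distance matrix `dist` and ascending sorting by distance (the embodiment notes that other orderings are possible; all names here are illustrative):

```python
import numpy as np

def nearest_neighbor_sequence(dist_row):
    # Class indices sorted by absolute distance (ascending assumed here).
    return list(np.argsort(dist_row))

def mutual_neighbor_distance(dist, i, j):
    # MN(C_i, C_j) = m + n, where m and n are each class's sequence number
    # in the other class's nearest neighbor sequence.
    seq_i = nearest_neighbor_sequence(dist[i])
    seq_j = nearest_neighbor_sequence(dist[j])
    return seq_i.index(j) + seq_j.index(i)
```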
In this embodiment of the present application, the method for the server to calculate the neighbor rank distance may be: and determining a neighbor ordering distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.
in some embodiments, the server may first compute the ordered sum (which may also be referred to as the asymmetric rank order distance) over all classes that are at least as close to class A as class B is. The server may compute it by the following formula:

D(A, B) = sum_{i=0}^{O_A(B)} O_B(f_A(i));

wherein D(A, B) is the asymmetric rank order distance between class A and class B, O_A(B) is the sequence number of class B in the first nearest neighbor sequence, f_A(i) is the class at position i in the first nearest neighbor sequence, and O_B(f_A(i)) is the sequence number of that class in the second nearest neighbor sequence.
specifically, the server may calculate the absolute distance between each class and class A; if the absolute distance between a certain class and class A is greater than or equal to the absolute distance between class A and class B, the class is ranked no later than class B in the first nearest neighbor sequence, and the server records the sequence number of that class in the second nearest neighbor sequence. After the server traverses all classes, the recorded sequence numbers are added to obtain the ordered sum. Illustratively, as shown in fig. 5, fig. 5 is an exemplary diagram of the server calculating the neighbor rank distance in the embodiment of the present application. In fig. 5, class A, class C, and class D are all ranked before class B in the first nearest neighbor sequence, and class B itself is also included because the condition is greater than or equal. The server therefore records the sequence numbers of class A, class C, class D, and class B in the second nearest neighbor sequence, which are 5, 2, 4, and 0, respectively, and adds them to obtain 11. The process can be represented by the following formula:

D(A, B) = 5 + 2 + 4 + 0 = 11.
In this embodiment of the present application, the server may further calculate to obtain an asymmetric rank order distance D (B, a) between class B and class a, and a calculation process is similar to the aforementioned process of calculating D (a, B), and details are not described here.
After the server calculates D (a, B) and D (B, a), they may be added to obtain a neighbor rank distance, where the neighbor rank distance is calculated by the following formula:
RO(A,B)=D(A,B)+D(B,A);
wherein, RO (A, B) is the neighbor sorting distance between class A and class B.
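Under the assumption that the asymmetric rank order distance sums the second-sequence positions of every class ranked at or before the other class in the first sequence (matching the worked example above), a sketch with illustrative names:

```python
def asymmetric_rank_order_distance(seq_a, seq_b, b):
    # D(A, B): for every class ranked at or before class b in A's nearest
    # neighbor sequence, sum its sequence number in B's sequence.
    cutoff = seq_a.index(b)
    return sum(seq_b.index(seq_a[i]) for i in range(cutoff + 1))

def rank_order_distance(seq_a, seq_b, a, b):
    # RO(A, B) = D(A, B) + D(B, A)
    return (asymmetric_rank_order_distance(seq_a, seq_b, b)
            + asymmetric_rank_order_distance(seq_b, seq_a, a))
```

With the sequences from the example (B's sequence placing A, C, D, B at positions 5, 2, 4, 0), D(A, B) comes out to 11 as in the text.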
304. And if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition and the neighbor sorting distance meets a third set condition, generating sample classification results corresponding to the first to-be-processed set and the second to-be-processed set.
in this embodiment, the server may preset a first setting condition, a second setting condition, and a third setting condition, so that when the absolute distance satisfies the first setting condition, the mutual neighbor distance satisfies the second setting condition, and the neighbor sorting distance satisfies the third setting condition, the server generates sample classification results corresponding to the first to-be-processed set and the second to-be-processed set. In some embodiments, the sample classification result may be that the first to-be-processed set and the second to-be-processed set are merged into one large target set, and the first sample image and the second sample image in the first to-be-processed set and the second to-be-processed set both belong to the large target set. In embodiments of the present application, a target set may also be referred to as a broad class.
illustratively, if the mutual neighbor distance between class A and class B is less than or equal to 3, the absolute distance is greater than t, and the neighbor sorting distance is less than or equal to 30, the server merges class A and class B into a new large class, i.e., the sample images in class A and class B both belong to the new large class. Since all three conditions must hold simultaneously, they can be formulated as follows:

MN(C_i, C_j) <= 3 && d(C_i, C_j) > t && RO(A, B) <= 30;
t may be set according to actual needs, and this is not specifically limited in this embodiment of the present application. In some embodiments, the server sets t such that the accuracy of the sample classification reaches 97%.
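A sketch of the merge test, treating the thresholds 3 and 30 and the direction of the comparison with t as example values from the text rather than fixed parameters:

```python
def should_merge(mn, d_abs, ro, t, mn_max=3, ro_max=30):
    # All three conditions must hold at once for the two classes to be
    # merged; the threshold values here mirror the illustrative example.
    return mn <= mn_max and d_abs > t and ro <= ro_max
```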
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the present invention, after generating the sample classification results corresponding to the first to-be-processed set and the second to-be-processed set, the method further includes:
acquiring a target set, wherein the target set comprises a first to-be-processed set and a second to-be-processed set;
Determining a single feature vector according to the first image feature vector and the second image feature vector;
Calculating the similarity between the target sets according to the single feature vectors;
and sending the target set with the similarity smaller than the set threshold value to the terminal equipment, so that the terminal equipment displays the target set.
In this embodiment of the present application, the server may obtain a target set, where the target set is formed by combining the first to-be-processed set and the second to-be-processed set. It can be understood that the server may form a plurality of target sets, for example, the server first obtains class a, class B, class C, and class D, and assuming that class a and class B are combined into one target set and class C and class D are combined into one target set, the server obtains two target sets. By analogy, the server can acquire a plurality of target sets in practical application.
In some embodiments, after the server acquires the target set, the target set includes the first sample image and the second sample image, and the server may determine the single feature vector according to the first image feature vector and the second image feature vector. In some embodiments, the single feature vector is the average of the first image feature vector and the second image feature vector. Illustratively, if the first image feature vector is [1, 2, 3] and the second image feature vector is [7, 8, 9], the single feature vector is [(1+7)/2, (2+8)/2, (3+9)/2] = [4, 5, 6]. In some embodiments, averaging all image feature vectors in the target set may obtain the single feature vector corresponding to the target set.
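The averaging described above can be sketched as follows (function name illustrative):

```python
import numpy as np

def single_feature(vectors):
    # Fuse all image feature vectors of a target set into one single
    # feature vector by element-wise averaging.
    return np.mean(np.stack(vectors), axis=0)
```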
The server may then calculate the similarity between the target sets from the single feature vectors. In some embodiments, the server may calculate the cosine similarity between the target sets according to the single feature vector, and the calculation method is not specifically limited herein.
in some embodiments, the server sends the target sets whose similarity is smaller than the set threshold to the terminal device for presentation. In other embodiments, the server ranks the similarities and presents the top 5 groups of target sets with the highest similarity. In some embodiments, the server may send the content to be displayed to a terminal device with a display screen, so that the terminal device displays the content on the display screen. The displayed content may be the feature vectors of the target sets, or the sample images in the target sets.
in some embodiments, the server selects one sample image in the target set through a pose preference algorithm and sends the selected sample image to the terminal equipment with the display screen for presentation. The pose preference algorithm is not particularly limited in the embodiment of the present application. FIG. 6 is a diagram illustrating an example of a target set in an embodiment of the present application. The server can send the selected sample image to a terminal device with a display screen, so that the terminal device displays the sample image on the display screen. As can be seen from fig. 6, the interface includes a title bar for displaying the title of the program, a function panel for displaying selectable functions, and a main interface. In the main interface, a plurality of groups of sample images are displayed, and each sample image can represent a target set. Due to space limitations, only 3 groups of sample images are shown in fig. 6; in practical applications, the number of displayed sample images is not limited, which is not specifically limited in the embodiment of the present application.
optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in an optional embodiment of the embodiments of the present invention, after displaying the target set whose similarity is smaller than the set threshold, the method further includes:
acquiring annotation information, wherein the annotation information has an association relationship with the target set;

And determining a secondary sample classification result corresponding to the target set according to the annotation information.
In the embodiment of the application, the server can obtain the annotation information, which has an association relationship with the target set. Then, the server may merge the target sets associated with the annotation information into a new large class, so as to obtain a secondary sample classification result. For example, as shown in fig. 6, the first group of sample images in the main interface all show the same elderly woman; if the worker observes that the pedestrians in the two sample images are actually the same person, the worker may click the "yes" virtual button at the corresponding position on the main interface. In response to the clicking operation, the terminal device corresponding to the display screen may generate annotation information, where the annotation information includes the target set identifiers corresponding to the first group of sample images. Then, the terminal device may send the annotation information to the server, and the server may obtain the annotation information.
Then, the server can merge the corresponding target sets into a new large class according to the target set identifiers in the annotation information. For example, as shown in fig. 6, the server may merge the target sets corresponding to the elderly woman into a new large class according to the target set identifiers in the annotation information, where the large class includes those two target sets, and the sample images in the large class are the sample images corresponding to the elderly woman.
Fig. 7 is a schematic flow chart of an alternative embodiment of the present application. The process can be described as:
701. acquiring a pedestrian image;
In the present embodiment, the server may acquire an image of a pedestrian (hereinafter referred to as a pedestrian image). In some embodiments, the server may obtain the pedestrian image from a camera device. In other embodiments, the server may read it from a database. Specifically, reference may be made to the description of step 301, which is not described herein again.
702. Extracting features;
in this embodiment, the server may extract features about the pedestrian from the image of the pedestrian, or extract features of the entire image from the image of the pedestrian, which is not specifically limited in this embodiment. Specifically, reference may be made to the description of step 302, which is not repeated herein.
703. MN & RO fine-grained clustering;
In this embodiment of the application, the server may cluster the pedestrian images according to the extracted features, which specifically refers to the descriptions of step 303 and step 304, and is not described herein again. It is understood that, in the embodiment of the present application, one pedestrian image may be clustered as one class.
704. Fusing the multiple features in the class into a single feature;
in this embodiment of the application, the server clusters a plurality of pedestrian images through the clustering in step 703, so as to obtain N classes, where the i-th class includes n_i pedestrian images. Each pedestrian image has a corresponding feature, so each class has n_i features. The server may then fuse the n_i features within each class into a single feature. In some embodiments, the server may perform the fusion through a feature fusion algorithm. In other embodiments, the server may average the features of each class as the feature representation of that class. The n_i features of each class are thus fused into 1 feature, forming a new feature set F = {f_1, f_2, f_3, ..., f_N}, where N is an integer greater than or equal to 1 and n_i is an integer greater than or equal to 1.
705. Pose preference;
In the embodiment of the present application, each class obtained by the server includes n_i pedestrian images. Thus, the server can select one pedestrian image from the n_i pedestrian images to represent the class. In some embodiments, the server selects it through a pose preference algorithm. In other embodiments, the image may be selected manually, which is not limited here.
706. Self-retrieval;
In the embodiment of the present application, the server may calculate the similarity between the N classes according to the new feature set F = {f_1, f_2, f_3, ..., f_N}. Illustratively, the server may calculate the cosine similarity between f_1 and f_2 to obtain the similarity between class 1 and class 2. By analogy, the server may obtain the similarities between all classes, then rank the similarities and select the top 5 groups of classes with the highest similarity. Illustratively, if the cosine similarity between f_1 and f_2 ranks first, the server may select the classes corresponding to f_1 and f_2 (i.e., class 1 and class 2).
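A sketch of the self-retrieval step, assuming cosine similarity over the fused features and keeping the k most similar class pairs (function name and data layout are illustrative):

```python
import numpy as np
from itertools import combinations

def top_k_similar_pairs(features, k=5):
    # Rank every pair of classes by the cosine similarity of their fused
    # feature vectors and keep the k most similar pairs.
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    pairs = [((i, j), cos(features[i], features[j]))
             for i, j in combinations(range(len(features)), 2)]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:k]
```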
707. sending the label;
in this embodiment of the application, the server may send the representative image corresponding to the class selected in the foregoing step 706 to a terminal device having a display screen, so that the terminal device displays the representative image, as shown in fig. 6. In other embodiments, the server may also directly send the class selected in step 706 to a terminal device having a display screen, so that the terminal device selects a representative image in the class through an algorithm similar to step 705, and then displays the representative image. This is not particularly limited in the embodiments of the present application.
708. a sample is obtained.
in this embodiment of the application, after the labels are sent in step 707, the staff selects the same pedestrian class in the interface shown in fig. 6, so that the terminal device generates the labeling information, which may specifically refer to the description of the foregoing embodiment and is not described herein again. Then, the server acquires the labeling information and further merges the N classes in the server according to the labeling information. After the merging is finished, the server can obtain samples after the classification is finished.
fig. 8 is a schematic flow chart of another alternative embodiment in the embodiment of the present application, where the flow chart includes:
801. acquiring a pedestrian image;
step 801 is similar to step 701 described above, and is not described herein again.
802. Extracting features;
Step 802 is similar to step 702, and is not described herein again.
803. MN & RO fine-grained clustering;
step 803 is similar to step 703, and will not be described herein again. It should be noted that, in the embodiment of the present application, the server obtains N classes through step 803.
804. Fusing the multiple features in the class into a single feature;
step 804 is similar to step 704, and will not be described herein.
805. MN & RO coarse-grained clustering;
in this embodiment, the server may further cluster the aforementioned N classes. Step 805 is similar to step 803, but the setting conditions may be different. Illustratively, the setting conditions in step 803 are:
MN(C_i, C_j) <= 3 && d(C_i, C_j) > t_1 && RO(A, B) <= 30;
the setting conditions in step 805 may be:
MN(C_i, C_j) <= 3 && d(C_i, C_j) > t_2 && RO(A, B) <= 30;
Wherein t_1 and t_2 can be modified according to actual needs, so that step 805 yields P classes, where P is an integer greater than or equal to 0 and less than N. Illustratively, the server clusters the images into 10000 subclasses in step 803, and clusters them into 5000 major classes in step 805, where each major class may include at least one subclass.
806. Only one subclass is reserved for each major class;
In the embodiment of the application, the server can select one subclass from each major class to retain (the server can delete the subclasses that are not retained). In some embodiments, the server sends the representative images corresponding to the subclasses to the terminal equipment with the display screen, so that the staff can select one of the subclasses to retain. In other embodiments, the server counts the number of pedestrian images in each subclass and retains the subclass with the largest number. In still other embodiments, the server may first count the number of pedestrian images in each subclass and select the top 5 subclasses with the largest numbers of pedestrian images, and then retain the 1 subclass with the largest variance among those 5. The method for selecting the retained subclass is not particularly limited in the embodiments of the present application.
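The counting-based retention strategy can be sketched as follows (one of several strategies the text describes; the variance-based refinement is omitted, and the representation of a major class as a list of subclasses is an assumption):

```python
def retain_subclass(major_class):
    # Keep only the subclass (list of pedestrian images) containing the
    # most images; the rest of the major class may be discarded.
    return max(major_class, key=len)
```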
807. A sample is obtained.
In this embodiment of the application, after the server obtains P classes according to step 805, redundant subclasses in the P classes may be removed to obtain the simplified P classes {c_1, c_2, c_3, ..., c_P}. Each class includes several pedestrian images.
Through the pedestrian images obtained by the embodiment of the application, the server has corresponding classifications, so that the server can select one class of images when identifying a pedestrian. Illustratively, the pedestrian images in class c_1 are most likely images of the same elderly woman, so when the server identifies the pedestrian images of class c_1, a good recognition rate can be achieved.
Fig. 9 is an exemplary diagram of an apparatus for sample classification in an embodiment of the present application, where the apparatus 900 for sample classification includes:
an obtaining unit 901, configured to obtain a first to-be-processed set and a second to-be-processed set, where the first to-be-processed set includes at least one first sample image, and the second to-be-processed set includes at least one second sample image;
the obtaining unit 901 is further configured to obtain a first image feature vector and a second image feature vector, where the first image feature vector has a first corresponding relationship with the first sample image, and the second image feature vector has a second corresponding relationship with the second sample image;
a processing unit 902, configured to obtain a distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, where the distance includes an absolute distance, a mutual neighbor distance, and a neighbor sorting distance;
The processing unit 902 is further configured to generate sample classification results corresponding to the first to-be-processed set and the second to-be-processed set if the absolute distance satisfies the first setting condition, the mutual neighbor distance satisfies the second setting condition, and the neighbor sorting distance satisfies the third setting condition.
optionally, on the basis of the foregoing embodiments corresponding to fig. 9, in an optional embodiment of the present invention, the processing unit 902 is further configured to:
Determining a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;
and determining the absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value of the cosine distance.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 9, in an optional embodiment of the present invention, the processing unit 902 is further configured to:
acquiring a first absolute distance corresponding to a first set to be processed;
sorting according to the magnitude of the first absolute distance to obtain a first nearest neighbor sequence corresponding to the first to-be-processed set;
Acquiring a second absolute distance corresponding to a second set to be processed;
Sorting according to the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second to-be-processed set;
And determining the mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence number of the first to-be-processed set in the second nearest neighbor sequence and the sequence number of the second to-be-processed set in the first nearest neighbor sequence.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 9, in an optional embodiment of the present invention, the processing unit 902 is further configured to:
and determining a neighbor ordering distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.
optionally, on the basis of the foregoing embodiments corresponding to fig. 9, in an optional embodiment of the present invention, the processing unit 902 is further configured to:
acquiring a target set, wherein the target set comprises a first to-be-processed set and a second to-be-processed set;
determining a single feature vector according to the first image feature vector and the second image feature vector;
calculating the similarity between the target sets according to the single feature vectors;
and sending the target set with the similarity smaller than the set threshold value to the terminal equipment, so that the terminal equipment displays the target set.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 9, in an optional embodiment of the present invention, the processing unit 902 is further configured to:
Selecting a sample image in the target set;
And sending the sample image to the terminal equipment so that the terminal equipment displays the sample image.
optionally, on the basis of the foregoing embodiments corresponding to fig. 9, in an optional embodiment of the present invention, the processing unit 902 is further configured to:
acquiring annotation information, wherein the annotation information has an association relationship with the target set;

and determining a secondary sample classification result corresponding to the target set according to the annotation information.
fig. 10 is a schematic diagram of a server structure provided in an embodiment of the present application, where the server 1000 may have a relatively large difference due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 1022 (e.g., one or more processors) and a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing an application 1042 or data 1044. Memory 1032 and storage medium 1030 may be, among other things, transient or persistent storage. The program stored on the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, a central processor 1022 may be disposed in communication with the storage medium 1030, and configured to execute a series of instruction operations in the storage medium 1030 on the server 1000.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input-output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so forth.
the steps performed by the server in the above embodiment may be based on the server structure shown in fig. 10.
In the embodiment of the present application, the CPU1022 is specifically configured to execute the following steps:
acquiring a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;
acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with a first sample image, and the second image feature vector has a second corresponding relation with a second sample image;
acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, wherein the distance comprises an absolute distance, a mutual neighbor distance and a neighbor sorting distance;
and if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition and the neighbor sorting distance meets a third set condition, generating sample classification results corresponding to the first to-be-processed set and the second to-be-processed set.
in the embodiment of the present application, the CPU1022 is further configured to perform the following steps:
Determining a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;
And determining the absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value of the cosine distance.
In the embodiment of the present application, the CPU1022 is further configured to perform the following steps:
determining a first nearest neighbor sequence corresponding to the first to-be-processed set and a second nearest neighbor sequence corresponding to the second to-be-processed set according to the absolute distance, wherein the first nearest neighbor sequence is a sorting queue of the second sample images according to the cosine distance, and the second nearest neighbor sequence is a sorting queue of the first sample images according to the cosine distance;
And determining the mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence number of the first absolute sample image in the second nearest neighbor sequence and the sequence number of the second absolute sample image in the first nearest neighbor sequence.
In the embodiment of the present application, the CPU1022 is further configured to perform the following steps:
and determining a neighbor sorting distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.
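The mutual neighbor distance and neighbor sorting distance above are both rank-based: each set maintains a nearest-neighbor sequence (a queue of candidates sorted by absolute distance), and the two distances are derived from positions in, and overlap between, those sequences. The patent does not fix the exact formulas, so the combinations below (symmetric max of ranks; Jaccard distance of top-k lists) are labeled assumptions.

```python
def mutual_neighbor_distance(rank_of_b_in_a, rank_of_a_in_b):
    """Mutual neighbor distance: combine B's position in A's nearest-neighbor
    sequence with A's position in B's sequence. The symmetric max used here
    is one plausible choice, not the claimed formula."""
    return max(rank_of_b_in_a, rank_of_a_in_b)

def neighbor_sorting_distance(seq_a, seq_b, k=10):
    """Neighbor sorting distance: disagreement between the two sets'
    top-k nearest-neighbor sequences, measured here as the Jaccard
    distance of the top-k lists (an illustrative choice)."""
    top_a, top_b = set(seq_a[:k]), set(seq_b[:k])
    union = top_a | top_b
    if not union:
        return 1.0  # both sequences empty: treat as maximally distant
    return 1.0 - len(top_a & top_b) / len(union)
```

Two sets would then be merged only when all three distances (absolute, mutual neighbor, neighbor sorting) satisfy their respective set conditions, as recited above.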
in the embodiment of the present application, the CPU1022 is further configured to perform the following steps:
acquiring a target set, wherein the target set comprises a first to-be-processed set and a second to-be-processed set;
determining a single feature vector according to the first image feature vector and the second image feature vector;
calculating the similarity between the target sets according to the single feature vectors;
and sending the target set with the similarity smaller than the set threshold value to the terminal equipment, so that the terminal equipment displays the target set.
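The post-merge step above fuses each merged target set's image feature vectors into a single feature vector and compares target sets by similarity, surfacing low-similarity (borderline) sets for display and manual review. A minimal sketch, assuming a normalized-mean fusion rule and cosine similarity — neither is specified by the patent:

```python
import numpy as np

def single_feature_vector(first_vectors, second_vectors):
    """Fuse the two constituent sets' feature vectors into one vector for
    the merged target set. A normalized mean is assumed here; the patent
    does not specify the fusion rule."""
    stacked = np.vstack([np.asarray(first_vectors, dtype=float),
                         np.asarray(second_vectors, dtype=float)])
    mean = stacked.mean(axis=0)
    return mean / np.linalg.norm(mean)

def cross_set_similarity(vec_a, vec_b):
    """Cosine similarity between two target sets' single feature vectors.
    Sets scoring below a set threshold would be sent to the terminal
    device for display and secondary labeling."""
    return float(np.dot(vec_a, vec_b))
```

In the workflow described next, a representative sample image from each low-similarity target set is shown on the terminal device, and the returned labeling information drives the secondary classification.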
in the embodiment of the present application, the CPU1022 is further configured to perform the following steps:
selecting a sample image in the target set;
and sending the sample image to the terminal equipment, so that the terminal equipment displays the sample image.
in the embodiment of the present application, the CPU1022 is further configured to perform the following steps:
acquiring labeling information, wherein the labeling information has an association relation with the target set;
and determining a secondary sample classification result corresponding to the target set according to the labeling information.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
in the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
in addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
the integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A method of sample classification, comprising:
acquiring a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;
acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with the first sample image, and the second image feature vector has a second corresponding relation with the second sample image;
acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, wherein the distance comprises an absolute distance, a mutual neighbor distance and a neighbor sorting distance;
and if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition and the neighbor sorting distance meets a third set condition, generating sample classification results corresponding to the first to-be-processed set and the second to-be-processed set.
2. The method according to claim 1, wherein said obtaining the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector comprises:
determining a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;
and determining the absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value among the cosine distances.
3. the method according to claim 2, wherein the absolute distance is determined from a first absolute sample image and a second absolute sample image, and wherein the obtaining the distance between the first to-be-processed set and the second to-be-processed set from the first image feature vector and the second image feature vector comprises:
acquiring a first absolute distance corresponding to the first to-be-processed set;
sorting according to the magnitude of the first absolute distance to obtain a first nearest neighbor sequence corresponding to the first to-be-processed set;
acquiring a second absolute distance corresponding to the second to-be-processed set;
sorting according to the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second to-be-processed set;
and determining the mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence number of the first absolute sample image in the second nearest neighbor sequence and the sequence number of the second absolute sample image in the first nearest neighbor sequence.
4. The method according to claim 3, wherein said obtaining the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector comprises:
determining a neighbor sorting distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.
5. The method according to claim 1, wherein after the generating of the sample classification results corresponding to the first to-be-processed set and the second to-be-processed set, the method further comprises:
acquiring a target set, wherein the target set comprises the first to-be-processed set and the second to-be-processed set;
determining a single feature vector according to the first image feature vector and the second image feature vector;
calculating the similarity between the target sets according to the single feature vectors;
and sending the target set with the similarity smaller than a set threshold value to a terminal device, so that the terminal device displays the target set.
6. The method of claim 5, wherein the sending the target set with the similarity smaller than a set threshold to a terminal device, so that the terminal device presents the target set comprises:
selecting a sample image in the target set;
and sending the sample image to a terminal device, so that the terminal device displays the sample image.
7. the method of claim 5, wherein after the presenting the target set with the similarity less than a set threshold, the method further comprises:
acquiring labeling information, wherein the labeling information has an association relation with the target set;
and determining a secondary sample classification result corresponding to the target set according to the labeling information.
8. an apparatus for sample classification, comprising:
An obtaining unit, configured to obtain a first to-be-processed set and a second to-be-processed set, where the first to-be-processed set includes at least one first sample image, and the second to-be-processed set includes at least one second sample image;
the obtaining unit is further configured to acquire a first image feature vector and a second image feature vector, wherein the first image feature vector has a first corresponding relation with the first sample image, and the second image feature vector has a second corresponding relation with the second sample image;
A processing unit, configured to obtain, according to the first image feature vector and the second image feature vector, a distance between the first to-be-processed set and the second to-be-processed set, where the distance includes an absolute distance, a mutual neighbor distance, and a neighbor sorting distance;
the processing unit is further configured to generate sample classification results corresponding to the first to-be-processed set and the second to-be-processed set if the absolute distance satisfies a first set condition, the mutual neighbor distance satisfies a second set condition, and the neighbor sorting distance satisfies a third set condition.
9. a server, comprising:
one or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
wherein the memory is a transitory memory or a persistent memory;
and the central processing unit is configured to communicate with the memory and execute the instructions in the memory on the server, so as to perform the method of any one of claims 1 to 7.
10. a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.
CN201910873761.0A 2019-09-12 2019-09-12 sample classification method and related device Pending CN110569918A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910873761.0A CN110569918A (en) 2019-09-12 2019-09-12 sample classification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910873761.0A CN110569918A (en) 2019-09-12 2019-09-12 sample classification method and related device

Publications (1)

Publication Number Publication Date
CN110569918A true CN110569918A (en) 2019-12-13

Family

ID=68780290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910873761.0A Pending CN110569918A (en) 2019-09-12 2019-09-12 sample classification method and related device

Country Status (1)

Country Link
CN (1) CN110569918A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091106A (en) * 2019-12-23 2020-05-01 浙江大华技术股份有限公司 Image clustering method and device, storage medium and electronic device
CN111091106B (en) * 2019-12-23 2023-10-10 浙江大华技术股份有限公司 Image clustering method and device, storage medium and electronic device
CN112001414A (en) * 2020-07-14 2020-11-27 浙江大华技术股份有限公司 Clustering method, device and computer storage medium
CN112001243A (en) * 2020-07-17 2020-11-27 广州紫为云科技有限公司 Pedestrian re-identification data marking method, device and equipment
CN113255841A (en) * 2021-07-02 2021-08-13 浙江大华技术股份有限公司 Clustering method, clustering device and computer readable storage medium

Similar Documents

Publication Publication Date Title
AU2022252799B2 (en) System and method for appearance search
US11386284B2 (en) System and method for improving speed of similarity based searches
Bolanos et al. Toward storytelling from visual lifelogging: An overview
US11423076B2 (en) Image similarity-based group browsing
Kurzhals et al. Visual analytics for mobile eye tracking
US9176987B1 (en) Automatic face annotation method and system
CN110569918A (en) sample classification method and related device
Song et al. Learning universal multi-view age estimator using video context
Lan et al. Retrieving actions in group contexts
CN110232331B (en) Online face clustering method and system
Cai et al. Desktop action recognition from first-person point-of-view
Elfeki et al. From third person to first person: Dataset and baselines for synthesis and retrieval
Bekhet et al. Gender recognition from unconstrained selfie images: a convolutional neural network approach
Tang et al. Fully unsupervised person re-identification via multiple pseudo labels joint training
Kera et al. Discovering objects of joint attention via first-person sensing
WO2020232697A1 (en) Online face clustering method and system
CN113111689A (en) Sample mining method, device, equipment and storage medium
Frikha et al. Semantic attributes for people’s appearance description: an appearance modality for video surveillance applications
Malik et al. A Simplified Skeleton Joints Based Approach For Human Action Recognition
Monkaresi et al. A dynamic approach for detecting naturalistic affective states from facial videos during HCI
US20240013428A1 (en) Image analysis apparatus, image analysis method, and a non-transitory storage medium
Azaza et al. Deep saliency features for video saliency prediction
Anuradha Deep Learning based Human Activity Recognition System with Open Datasets
Manhães et al. Long-Term Person Reidentification: Challenges and Outlook
Muslim et al. Exploiting spatiotemporal features for action recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination