CN112307938B - Data clustering method and device, electronic equipment and storage medium - Google Patents

Publication number
CN112307938B
Authority
CN
China
Prior art keywords
data
similarity
target
target data
reliability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011172426.7A
Other languages
Chinese (zh)
Other versions
CN112307938A (en
Inventor
蔡官熊
郑清源
唐诗翔
陈大鹏
赵瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202011172426.7A priority Critical patent/CN112307938B/en
Priority to PCT/CN2020/131241 priority patent/WO2022088331A1/en
Priority to TW109144955A priority patent/TWI767459B/en
Publication of CN112307938A publication Critical patent/CN112307938A/en
Application granted granted Critical
Publication of CN112307938B publication Critical patent/CN112307938B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The application discloses a data clustering method and a device thereof, electronic equipment and a storage medium, wherein the data clustering method comprises the following steps: acquiring a plurality of target data about a target object from a data set to be clustered, wherein the target object comprises a first part and a second part, and the target data is data corresponding to the first part; determining a first similarity between the plurality of target data and a reference factor, wherein the reference factor comprises at least one of: a second similarity between the auxiliary data corresponding to each of the plurality of target data, a reliability of the target data, and a reliability of the auxiliary data, the auxiliary data being data corresponding to a second part; and clustering the plurality of target data based on the first similarity and the reference factor, wherein the clustering result is used for determining the target objects to which the plurality of target data belong. According to the scheme, the accuracy of data clustering can be improved.

Description

Data clustering method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data clustering method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of data acquisition technology, a large amount of target data of the same or different target objects is generated every day; for example, an intelligent video monitoring system generates a large amount of face image data every day. Generally, a clustering algorithm is applied to a large feature library so that the target data of the same target object are clustered into one class and the target data of different target objects are clustered into different classes, thereby realizing data clustering. Taking the clustering of face image data in an intelligent video monitoring system as an example, face images may be affected as follows: the face may be occluded by a mask, sunglasses or another occluding object; the image may be of low resolution, such as a blurred face; the image may be affected by light intensity; or the frontal face and the side face of the same person may differ greatly. As a result, errors often occur during data clustering. In view of this, how to improve the accuracy of data clustering is an urgent problem to be solved.
Disclosure of Invention
The application at least provides a data clustering method and device, electronic equipment and a storage medium.
The application provides a data clustering method in a first aspect. The data clustering method comprises the following steps: acquiring a plurality of target data about a target object from a data set to be clustered, wherein the target object comprises a first part and a second part, and the target data is data corresponding to the first part; determining a first similarity between the plurality of target data and a reference factor, wherein the reference factor comprises at least one of: a second similarity between the auxiliary data corresponding to each of the plurality of target data, a reliability of the target data, and a reliability of the auxiliary data, the auxiliary data being data corresponding to the second portion; clustering the plurality of target data based on the first similarity and a reference factor, wherein the clustering result is used for determining the target object to which the plurality of target data belong.
Therefore, after a plurality of target data about the target object are acquired from the data set to be clustered, not only the first similarity among the plurality of target data is determined, but also reference factors such as the second similarity among auxiliary data respectively corresponding to the plurality of target data, the reliability of the auxiliary data and the like are determined, so that the similarity and the reliability can be combined, or the target data and the auxiliary data corresponding to different parts of the target object are combined to cluster the plurality of target data, so that the target object to which the plurality of target data belong is determined, and the data clustering of the target object is realized. And compared with clustering only by utilizing the similarity of the target data, the method and the device can consider the data reliability and the data of other parts by combining the reference factors, and can improve the accuracy of data clustering.
The data set to be clustered further comprises the auxiliary data, and the data set to be clustered is obtained at least through the following steps: and performing feature extraction on the target object in a first image to obtain feature data of a first part and feature data of a second part of the target object, wherein the feature data of the first part is used as the target data in the data set to be clustered, and the feature data of the second part is used as auxiliary data in the data set to be clustered.
Therefore, the data set to be clustered further includes auxiliary data, and the target data and the auxiliary data corresponding to the target data in the data set to be clustered can be obtained by performing feature extraction on different parts of the target object in the first image respectively.
Wherein the feature extraction of the target object in the first image to obtain feature data of a first part and feature data of a second part of the target object includes: acquiring a first region corresponding to the first part and a second region corresponding to the second part from the first image; and, in a case where the first region and the second region meet a preset matching condition, respectively performing feature extraction on the first region and the second region to correspondingly obtain the feature data of the first part and the feature data of the second part.
Therefore, feature extraction is performed to obtain the corresponding feature data only when the first region corresponding to the first part and the second region corresponding to the second part meet the preset matching condition, so that first images (and their data) in which the first part and the second part obviously do not belong to the same target object can be filtered out.
Wherein the preset matching condition comprises at least one of: the position relation between the first region and the second region meets a preset position relation, and the overlapping area of the first region and the second region is larger than a preset area threshold value.
Therefore, whether the first part and the second part belong to the same target object can be judged according to the position relation or the overlapping area condition between the first region and the second region, and image filtering is further achieved.
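The matching check described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Box` type, the function names, the concrete positional relation (the centre of the first region lying in the upper half of the second region, with image y coordinates growing downward) and the threshold value are all hypothetical and are not specified by the application. The sketch requires both conditions, whereas the application allows either one to be used on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in pixel coordinates (hypothetical representation)."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two regions, 0 if they do not intersect."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def first_in_upper_half(first: Box, second: Box) -> bool:
    """Assumed positional relation: the centre of the first region (e.g. face)
    lies inside the upper half of the second region (e.g. body)."""
    cx = (first.x1 + first.x2) / 2
    cy = (first.y1 + first.y2) / 2
    mid_y = (second.y1 + second.y2) / 2
    return second.x1 <= cx <= second.x2 and second.y1 <= cy <= mid_y


def regions_match(first: Box, second: Box, min_overlap: float = 0.0) -> bool:
    """Both conditions must hold in this sketch; either alone is also permitted
    by the text ("at least one of")."""
    return first_in_upper_half(first, second) and overlap_area(first, second) > min_overlap
```

A pair of regions that fails this check would simply be skipped before feature extraction.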
Before the feature extraction is performed on the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object, the method further includes: acquiring the area of each second part contained in the second image; selecting a main second portion from second portions included in the second image based on an area of the second portion; a first image including the main second portion is extracted from the second image.
Therefore, the main second part can be selected by using the area of each second part, and the first image containing the main second part is extracted from the second image, so that images in which the target object is not prominent are preliminarily filtered out and the quality of the images used for data clustering is improved.
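As an illustration of the selection step, the following sketch picks the second part with the largest area and computes the crop bounds of the first image. Choosing the largest area is an assumption (the application only states that the selection is based on area), and the function names, the `(x1, y1, x2, y2)` tuple layout and the `margin` parameter are hypothetical.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) region."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def select_main_part(boxes):
    """Return the index of the region with the largest area, or None if empty.
    Assumption: "main" means largest; other area-based rules are possible."""
    if not boxes:
        return None
    return max(range(len(boxes)), key=lambda i: box_area(boxes[i]))


def crop_bounds(box, width, height, margin=0):
    """Bounds of the first image inside a second image of the given size,
    expanded by an optional margin and clamped to the image borders."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))
```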
Wherein the feature data and the reliability are obtained by processing the first image by the same neural network model.
Therefore, by inputting the first image into the neural network model, the feature data and the reliability can be acquired simultaneously, which improves the efficiency of data clustering.
Wherein, before the obtaining of the plurality of target data from the data set to be clustered, the method further comprises: filtering out the target data in the data set to be clustered whose reliability does not meet a preset reliability condition.
Therefore, the target data can be filtered by judging whether its reliability meets the preset reliability condition, so that the reliability of the target data remaining in the data set to be clustered is higher, and the data clustering precision is further improved.
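A minimal sketch of this filtering step is given below. It assumes that the preset condition is a simple lower bound on a reliability score in the range 0 to 1; the record layout, the key name `reliability` and the default threshold are hypothetical.

```python
def filter_by_reliability(samples, min_reliability=0.5):
    """Keep only the records whose reliability meets the assumed condition
    (score >= min_reliability). Each record is a dict with a 'reliability' key."""
    return [s for s in samples if s["reliability"] >= min_reliability]
```

For example, with a threshold of 0.5, a blurred face scored 0.2 would be dropped before similarities are computed, while a sharp frontal face scored 0.9 would be kept.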
Wherein the reliability of the target data and/or the auxiliary data is determined by at least one of the sharpness, the degree of occlusion and the light intensity of the corresponding part in the first image, wherein the first image is used for obtaining the target data and/or the auxiliary data.
Therefore, the target data and/or the auxiliary data can be obtained from the first image, and factors such as sharpness, degree of occlusion and light intensity can be combined to obtain the reliability of the target data and/or the auxiliary data.
Wherein the reference factor includes the second similarity, and the clustering the plurality of target data based on the first similarity and the reference factor includes: acquiring the weights of the first similarity and the second similarity, and performing weighting processing on the first similarity and the second similarity by using the weights to obtain the fusion similarity of the target data; and clustering the plurality of target data based on the fusion similarity.
Therefore, the weights of the first similarity and the second similarity of different parts of the target object can be integrated, and the first similarity and the second similarity are weighted, so that the fusion similarity is obtained and utilized to determine whether a plurality of target data belong to the same target object, and the clustering of the plurality of target data is facilitated.
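The weighting step amounts to a weighted sum of the two similarities. The sketch below illustrates it; the function and parameter names are hypothetical, and the weights are taken as given inputs rather than derived.

```python
def fuse_similarity(first_sim, second_sim, w_first, w_second):
    """Fusion similarity as the weighted sum of the first similarity
    (e.g. face) and the second similarity (e.g. body)."""
    return w_first * first_sim + w_second * second_sim
```

With a first similarity of 0.8, a second similarity of 0.4 and weights 0.75 and 0.25, the fusion similarity is approximately 0.7.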
Wherein the reference factor further includes a reliability of the target data and a reliability of the auxiliary data, and the obtaining the weight of the first similarity and the second similarity includes: and obtaining the weights of the first similarity and the second similarity based on the first similarity, the second similarity, the reliability of the target data and the reliability of the auxiliary data.
Therefore, the weights of the first similarity and the second similarity are determined according to the first similarity, the second similarity, the reliability of the target data and the reliability of the auxiliary data, so that the determination of the weights integrates the similarity and the reliability.
Wherein the obtaining of the weights of the first similarity and the second similarity based on the first similarity, the second similarity, the reliability of the target data, and the reliability of the auxiliary data includes: obtaining a first comprehensive reliability of the plurality of target data based on the reliability of the plurality of target data, and obtaining a second comprehensive reliability of the plurality of auxiliary data based on the reliability of the auxiliary data corresponding to the target data; and obtaining the weights of the first similarity and the second similarity by using the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability.
Therefore, the first comprehensive reliability and the second comprehensive reliability can be obtained respectively based on the reliability of the target data and the auxiliary data, and the weights of the first similarity and the second similarity are obtained based on the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability in a combined mode, so that the weights corresponding to the similarities can be obtained more accurately, and the accuracy of data clustering is improved.
Wherein the obtaining a first comprehensive reliability of the plurality of target data based on the reliability of the plurality of target data comprises: taking the sum of the reliabilities of the plurality of target data as the first comprehensive reliability of the plurality of target data; and the obtaining a second comprehensive reliability of the plurality of auxiliary data based on the reliability of the plurality of auxiliary data corresponding to the plurality of target data comprises: taking the sum of the reliabilities of the plurality of auxiliary data as the second comprehensive reliability of the plurality of auxiliary data.
Therefore, the sum of the reliabilities of the plurality of target data or of the plurality of auxiliary data can be used as the comprehensive reliability of the corresponding data, which improves the accuracy of the comprehensive reliability.
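For a pair of target data being compared, the summation can be sketched as follows. The function names and the ordering of the resulting feature list are hypothetical and are chosen only for illustration.

```python
def comprehensive_reliability(reliabilities):
    """Comprehensive reliability as the sum of the individual reliabilities."""
    return sum(reliabilities)


def weight_inputs(first_sim, second_sim, target_rels, auxiliary_rels):
    """Collect the four quantities from which the weights are obtained:
    the two similarities and the two comprehensive reliabilities."""
    return [first_sim, second_sim,
            comprehensive_reliability(target_rels),
            comprehensive_reliability(auxiliary_rels)]
```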
Wherein the obtaining of the weights of the first similarity and the second similarity by using the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability comprises: processing the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability by using a weight determination model to obtain the weights of the first similarity and the second similarity. Wherein the weight determination model is trained by at least the following steps: acquiring sample target data and the reliability thereof, and acquiring corresponding sample auxiliary data and the reliability thereof; determining a third similarity between a plurality of sample target data and a fourth similarity between a plurality of sample auxiliary data, and obtaining a third comprehensive reliability of the plurality of sample target data and a fourth comprehensive reliability of the plurality of sample auxiliary data based on the reliability of the sample target data and the sample auxiliary data; processing the third similarity, the fourth similarity, the third comprehensive reliability and the fourth comprehensive reliability by using the weight determination model to obtain weights of the third similarity and the fourth similarity; and adjusting network parameters of the weight determination model based on the weights of the third similarity and the fourth similarity.
Therefore, the weights of the first similarity and the second similarity can be obtained through the weight determination model, the weights corresponding to the similarities can be efficiently and intelligently obtained, and the weight determination model can be trained by utilizing the third similarity, the fourth similarity, the third comprehensive reliability and the fourth comprehensive reliability of the samples, so that the final weight determination model is obtained.
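The application does not specify the structure of the weight determination model. Purely as an illustration, the sketch below assumes a single linear layer followed by a softmax, so that the two output weights are positive and sum to one; the architecture, the parameter layout and all names are assumptions, and the training procedure (which depends on an unspecified loss) is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_weights(features, params):
    """Map [first_sim, second_sim, first_comp_rel, second_comp_rel] to two weights.
    params = (W, b) with W a 2x4 list of lists and b a list of 2 biases
    (hypothetical layout of the network parameters)."""
    W, b = params
    logits = [sum(w * x for w, x in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)
```

With all parameters set to zero the model returns equal weights of 0.5, which corresponds to a plain average of the two similarities.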
Wherein the clustering the plurality of target data based on the fusion similarity comprises: and clustering the plurality of target data under the condition that the fusion similarity is detected to be larger than a preset similarity threshold.
Therefore, by presetting the similarity threshold, the target data with the fusion similarity not greater than the preset similarity can be filtered, and the precision of data clustering is further improved.
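One way to realize threshold-based clustering is to link every pair whose fusion similarity exceeds the threshold and to treat each connected group as one cluster. The sketch below does this with a union-find structure. Merging transitively linked pairs is an assumption, since the application only states that clustering is performed when the fusion similarity exceeds the threshold; the function name and the input layout are hypothetical.

```python
def cluster_by_threshold(n, fused, threshold):
    """Group items 0..n-1 into clusters.
    fused maps index pairs (i, j) to their fusion similarity; pairs whose
    similarity is greater than the threshold are placed in the same cluster."""
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), sim in fused.items():
        if sim > threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```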
The target data and the auxiliary data are respectively feature data corresponding to the face and the body of the target object.
Therefore, the data can be clustered in association with the feature data corresponding to the face and body of the target object.
A second aspect of the present application provides a data clustering apparatus, including: an acquisition module, configured to acquire a plurality of target data about a target object from a data set to be clustered, wherein the target object comprises a first part and a second part, and the target data is data corresponding to the first part; a first determining module, configured to determine a first similarity between the plurality of target data, and a reference factor, where the reference factor includes at least one of: a second similarity between the auxiliary data corresponding to each of the plurality of target data, a reliability of the target data, and a reliability of the auxiliary data, the auxiliary data being data corresponding to the second part; and a second determining module, configured to cluster the plurality of target data based on the first similarity and the reference factor, where a result of the clustering is used to determine the target object to which the plurality of target data belong.
A third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the data clustering method in the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium, on which program instructions are stored, and the program instructions, when executed by a processor, implement the data clustering method in the first aspect.
According to the scheme, after the plurality of target data are acquired from the data set to be clustered, not only are first similarity among the plurality of target data determined, but also reference factors such as second similarity among auxiliary data respectively corresponding to the plurality of target data, reliability of the auxiliary data and the like are determined, so that the similarity and the reliability can be combined, or the target data and the auxiliary data corresponding to different parts of the target object are combined to cluster the plurality of target data, and the target object to which the plurality of target data belong is determined. And compared with the method that the target data are clustered only by utilizing the similarity of the target data, the method and the device can consider the data reliability and the data of other parts by combining the reference factors, and can improve the accuracy of data clustering.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flow chart diagram of an embodiment of a data clustering method of the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of the data clustering method of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a data clustering method according to another embodiment of the present application;
FIG. 4 is a schematic flowchart of step S34 according to yet another embodiment of the present invention;
FIG. 5 is a block diagram of an embodiment of the data clustering device of the present application;
FIG. 6 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 7 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. Further, the term "plurality" herein means two or more. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the group consisting of A, B, and C.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a data clustering method according to an embodiment of the present application.
Specifically, the method may include the steps of:
Step S11: Acquire a plurality of target data about the target object from the data set to be clustered.
The data set to be clustered includes, but is not limited to, image data such as video and image, and the data set to be clustered may include related data of the target object, and is not limited specifically herein. The data set to be clustered may be obtained by extracting a video image, may be composed of original images, may also be obtained by performing feature extraction on the original images, and may also be obtained in other data acquisition forms, which is not specifically limited herein. The data set to be clustered may include a plurality of target data about the target object so as to cluster the plurality of target data; the data set to be clustered may further include auxiliary data corresponding to the plurality of target data, respectively, so as to cluster the plurality of target data using the auxiliary data.
The target object may be any object that needs to be clustered, such as any object like a person, an animal, a vehicle, etc. The target object includes, but is not limited to, a first portion, a second portion and other regions reflecting different features of the target object. In one embodiment, the target object is a person, the first part is a face, and the second part is a body. The data corresponding to the first portion is target data, and the data corresponding to the second portion is auxiliary data. The target data and the auxiliary data are used for representing characteristic information of the target object, but the target data and the auxiliary data respectively correspond to different parts of the target object. In an embodiment, feature extraction is performed on a target object in a first image, so as to obtain feature data of a first portion and feature data of a second portion of the target object, where the feature data of the first portion is used as target data in a data set to be clustered, and the feature data of the second portion is used as auxiliary data in the data set to be clustered.
The target data is obtained from the data set to be clustered, and when the target data is clustered, the number of the target data obtained from the data set to be clustered is not particularly limited, and is, for example, two, three, or the like. The target data in the data set to be clustered may or may not belong to the same target object.
Step S12: Determine a first similarity between the plurality of target data, and determine a reference factor.
The reference factors are used for assisting in clustering a plurality of target data, and further more accurate data clustering is achieved. The reference factors include at least one of: a second similarity between the auxiliary data corresponding to each of the plurality of target data, a reliability of the target data, and a reliability of the auxiliary data.
The auxiliary data is data of a part of the target object different from the part to which the target data corresponds, and provides feature information of other parts of the target object. In some embodiments, more comprehensive information can be obtained by combining the target data of the target object with the related auxiliary data, so as to realize joint clustering. It is understood that the auxiliary data may be any data, other than the target data, that reflects the identity information and feature information of the target object. Generally, a set of target data and its corresponding auxiliary data correspond to the same target object; for example, the target data is data corresponding to a first part of target object A, and the auxiliary data corresponding to that target data is data corresponding to a second part of target object A. The parts of the target object include, but are not limited to, a face, an upper body, or an entire body; for example, the target data and the auxiliary data are data corresponding to the face and the body of the target object, respectively. The manner of acquiring the data set to be clustered and its target data and auxiliary data is not particularly limited; for example, feature data of different parts of the target object may be extracted, based on an image including the target object, as the corresponding target data and auxiliary data. The similarity indicates the degree of similarity between data; for example, the first similarity indicates the degree of similarity between the target data, and the greater the first similarity, the smaller the difference between the target data. The second similarity between the auxiliary data is analogous and is not described again here.
The reliability of the target data and/or the auxiliary data indicates data quality; for example, the higher the reliability of the target data, the higher the quality of the target data. The reliability of the auxiliary data is analogous and is not described again here.
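The application does not name a particular similarity measure. As one common choice for feature vectors, the sketch below uses cosine similarity; this choice, the function name and the handling of zero vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length.
    Returns 0.0 if either vector has zero norm (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this assumption, the first similarity would be computed between two face feature vectors and the second similarity between the two corresponding body feature vectors.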
In order to improve the flexibility of data clustering, the first similarity between the plurality of target data may be combined with any of the reference factors. For example, in a disclosed embodiment, the first similarity between the plurality of target data and the second similarity between the auxiliary data respectively corresponding to the plurality of target data may be determined, and the plurality of target data are clustered by combining the two similarities to determine whether they belong to the same target object; that is, the similarities of different parts of the target object are combined for joint data clustering, so that the target data corresponding to the same target object are clustered into the same cluster. In a disclosed embodiment, the first similarity between the plurality of target data and the reliability of the target data may be determined, and the plurality of target data are clustered by combining the first similarity with the reliability of the target data to determine whether they belong to the same target object, so that the target data corresponding to the same target object are clustered into the same cluster. In a disclosed embodiment, the first similarity between the plurality of target data, the second similarity between the auxiliary data respectively corresponding to the plurality of target data, the reliability of the target data and the reliability of the auxiliary data may all be determined, so that when the target data are clustered, the auxiliary data are introduced for joint clustering and a reliability constraint is added, and the similarity and the reliability are combined to achieve high-accuracy data clustering.
The data clustering method can be used in any application scenario in which target objects need to be clustered, such as the identification or tracking of people or animals. Take as an example the case in which the target object is a person and the target data is face data. In one application scenario, in order to verify the authority of a person, a data clustering device is arranged at an entrance or exit of a specific place, such as a company doorway, to execute the data clustering method and thereby realize face recognition. In another application scenario, in order to record people entering and leaving a public area, a data clustering device is arranged in public places such as subway stations and railway stations to execute the data clustering method and thereby realize face recognition. When face recognition is performed using only face data, a plurality of pieces of face data with similar faces are easily determined to belong to the same person; therefore, reference factors such as the reliability of the face data, or auxiliary data obtained from body features, can be introduced to assist face recognition. When face recognition is performed based only on face similarity, the similarity between face data of the same person may be low (for example, a frontal face and a large-angle side face of the same person), or the similarity between face data of different persons may be high (for example, different persons all wearing masks or sunglasses, or all showing large-angle side faces). Therefore, reference factors of the target data may be introduced: for example, the similarity of the face and the similarity of the body may be fused, and the reliability of the face data and the reliability of the body data may also be fused, so that face recognition is achieved by combining the similarity and the reliability.
In order to improve the quality of the target data in the data set to be clustered, target data whose reliability does not meet a preset reliability condition can be filtered out of the data set to be clustered before the plurality of target data are obtained from it. By judging whether the reliability meets the preset reliability condition, low-reliability target data are removed, so that the reliability of the target data remaining in the data set to be clustered is higher and the data clustering precision is improved.
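The filtering step described above can be sketched as follows. This is a minimal illustration only: the source does not specify the form of the preset reliability condition, so a simple lower bound is assumed here, and the names Sample, filter_by_reliability and min_reliability are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One piece of target data with its reliability (hypothetical structure)."""
    sample_id: str
    reliability: float  # assumed to be a score where higher means better quality


def filter_by_reliability(samples: List[Sample], min_reliability: float = 0.5) -> List[Sample]:
    """Keep only samples whose reliability meets the assumed lower bound."""
    return [s for s in samples if s.reliability >= min_reliability]


if __name__ == "__main__":
    data = [Sample("a", 0.9), Sample("b", 0.2), Sample("c", 0.5)]
    print([s.sample_id for s in filter_by_reliability(data)])
```

Any other condition (for example, a per-factor check on occlusion or sharpness) could replace the single threshold without changing the rest of the flow.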
Step S13: clustering the plurality of target data based on the first similarity and the reference factor.
The plurality of target data are clustered based on the first similarity and the reference factor, and the clustering result is used to determine the target object to which the plurality of target data belong, that is, to determine whether the plurality of target data belong to the same target object; if so, the target data belonging to the same target object are clustered into the same cluster. That the plurality of target data belong to the same target object indicates that they belong to the same cluster. Steps S11 to S13 can be executed repeatedly, so that all the target data in the data set to be clustered are clustered into the same or different clusters, thereby realizing data clustering.
After it is determined whether the plurality of target data belong to the same target object, a connected graph of the target object may be constructed according to the target data belonging to the same target object, or according to the target data and the auxiliary data belonging to the same target object.
In an application scenario, after it is determined whether the plurality of target data belong to the same target object, target data recognition for the target object may be performed based on the target data belonging to the same target object, and the recognition may also be assisted by the auxiliary data. For example, the target data is feature data of the face of the target object and the auxiliary data is feature data of the body of the target object; face recognition of the target object may be performed using the feature data of the face of the same target object alone, or using the feature data of both the face and the body of the same target object. To facilitate target data recognition for the target object, after it is determined whether the plurality of target data belong to the same target object, the plurality of target data belonging to the same target object may be placed into a database corresponding to the target object, so that target data recognition for the target object is performed based on the database. The database includes, but is not limited to, target data, auxiliary data, and the like, which is not limited herein.
In the above scheme, after the plurality of target data are acquired from the data set to be clustered, not only the first similarity between the plurality of target data is determined, but also the second similarity between the auxiliary data corresponding to the plurality of target data, the reliability of the auxiliary data, and other reference factors are determined, so that the similarity and the reliability can be combined, or the target data and the auxiliary data corresponding to different parts of the target object are combined to cluster the plurality of target data. Compared with the method that whether the target data belong to the same target object is determined only by utilizing the similarity of the target data, the method and the device can consider the data reliability and the data of other parts by combining the reference factors, and can improve the accuracy of data clustering.
It is understood that the subject of the data clustering method of the present application may be any device with processing capability, such as, but not limited to, a target data acquisition device, a server connected to the target data acquisition device, and the like. In an application scene, the target data is an image, so that at least one image acquisition device can be used for acquiring the image about the target object and sending the image to the server, and the server takes the image acquired by the image acquisition device as the target data and executes the data clustering method to cluster the target data. In another application scenario, after the image acquisition device acquires the image about the target object, the image acquired by the image acquisition device and/or other image acquisition devices can be used as target data, and the data clustering method is executed to cluster the target data.
When the target data of the same target object differ significantly, they may be clustered into a plurality of classes, resulting in a low recall rate; when the target data of a plurality of target objects are similar, they may be clustered into one class, resulting in low clustering precision. To avoid both problems, whether the plurality of target data belong to the same target object can be determined by combining the first similarity between the plurality of target data and the second similarity between the corresponding auxiliary data, so that more accurate clustering is realized. Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another embodiment of the data clustering method of the present application. Specifically, the method may include the following steps:
Step S21: performing feature extraction on the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object, wherein the feature data of the first portion serve as target data in a data set to be clustered, and the feature data of the second portion serve as auxiliary data in the data set to be clustered.
The first image is an image containing a target object, including but not limited to a raw image, and is used to obtain target data and/or auxiliary data. In a disclosed embodiment, data clustering is performed in combination with target data and auxiliary data, feature extraction is performed on a target object in a first image, feature data of a first portion of the target object is taken as the target data in a data set to be clustered, and feature data of a second portion of the target object is taken as the auxiliary data in the data set to be clustered. The feature data is obtained by processing the first image by using a neural network model. Therefore, by extracting features of different parts of the target object in the first image, the target data and the auxiliary data corresponding to the target data in the data set to be clustered can be obtained respectively.
When feature extraction is performed on the target object in the first image to obtain the feature data of the first portion and the feature data of the second portion of the target object, a first region corresponding to the first portion and a second region corresponding to the second portion are obtained from the first image; and when the first region and the second region meet a preset matching condition, feature extraction is performed on the first region and the second region respectively, so as to obtain the feature data of the first portion and the feature data of the second portion correspondingly. The preset matching condition comprises at least one of the following: the positional relationship between the first region and the second region meets a preset positional relationship, and the overlapping area of the first region and the second region is larger than a preset area threshold. The preset positional relationship and the preset area threshold may be set by a user and are not particularly limited. For example, a critical line is determined on the second region, and the preset positional relationship is that the first region lies above the critical line of the second region. Therefore, whether the first portion and the second portion belong to the same target object can be judged from the positional relationship or the overlapping area between the first region and the second region, and the corresponding feature data are obtained through feature extraction only when the first region corresponding to the first portion and the second region corresponding to the second portion meet the preset matching condition. In this way, first images and data in which the first portion and the second portion do not belong to the same target object can be filtered out, thereby realizing image filtering.
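The matching check described above can be sketched as follows, under stated assumptions. Regions are assumed to be axis-aligned boxes (x1, y1, x2, y2) in image coordinates with y increasing downward; the critical line is assumed to sit at a fraction line_frac of the height of the second region; and both conditions are required here, although the text only requires at least one of them. The function names and default values are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def regions_match(first: Box, second: Box,
                  line_frac: float = 0.5,
                  min_overlap: float = 0.0) -> bool:
    """Check the assumed matching condition between a first and second region."""
    # Critical line placed at a fraction of the second region's height (assumption).
    critical_y = second[1] + line_frac * (second[3] - second[1])
    # "Above the critical line" means the bottom edge of the first region
    # does not extend below that line.
    first_is_above = first[3] <= critical_y
    return first_is_above and overlap_area(first, second) > min_overlap


if __name__ == "__main__":
    body = (20.0, 5.0, 80.0, 200.0)
    print(regions_match((40.0, 10.0, 60.0, 30.0), body))    # face inside upper body
    print(regions_match((40.0, 150.0, 60.0, 170.0), body))  # box too low on the body
```

Feature extraction would then be run only for region pairs for which regions_match returns True.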
In order to obtain a high-quality first image, in a disclosed embodiment, before feature extraction is performed on the target object in the first image to obtain the feature data of the first portion and the feature data of the second portion of the target object, the area of each second portion contained in a second image may be obtained, a main second portion may be selected from the second portions contained in the second image based on these areas, and finally the first image containing the main second portion is extracted from the second image, so as to realize screening of the first image. When the area of each second portion contained in the second image is obtained, the contour of each second portion may be obtained through the Mask R-CNN (Mask Region-based Convolutional Neural Network) segmentation technique, and the area of each second portion is then obtained from its contour. When the main second portion is selected based on the areas of the second portions, the second portion with the largest area may be used as the main second portion, or a second portion whose area satisfies a preset area condition may be used as the main second portion; the preset area condition is not particularly limited.
In an application scenario, the area of each second portion contained in the second image is obtained; the second portion with the largest area and the second portion with the second largest area are obtained, for example by sorting the areas of all the second portions; if the ratio of the second largest area to the largest area is smaller than a preset area value, the second portion with the largest area is taken as the main second portion; otherwise, it is judged that no main second portion exists and subsequent data clustering is not performed, so that only images in which a target object is clearly dominant are used as the first image. Therefore, the main second portion can be selected using the areas of the second portions, and the first image containing the main second portion is extracted from the second image, so that images in which no target object is dominant are preliminarily filtered out and the quality of the images used for data clustering is improved.
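The area-ratio screening above can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the function name select_main_part, the default ratio 0.5, and the treatment of an image that contains only one second portion (it is accepted as the main one) are all hypothetical choices.

```python
from typing import List, Optional


def select_main_part(areas: List[float], max_ratio: float = 0.5) -> Optional[int]:
    """Return the index of the main second portion, or None if there is none.

    areas: area of each second portion found in the second image.
    max_ratio: assumed preset value; the second-largest area divided by the
        largest area must be smaller than this for a main portion to exist.
    """
    if not areas:
        return None
    # Indices sorted by area, largest first.
    order = sorted(range(len(areas)), key=lambda i: areas[i], reverse=True)
    if len(order) == 1:
        return order[0]  # single portion: treated as main (assumption)
    largest, second = areas[order[0]], areas[order[1]]
    if largest <= 0:
        return None
    return order[0] if second / largest < max_ratio else None


if __name__ == "__main__":
    print(select_main_part([100.0, 900.0, 300.0]))  # one clearly dominant portion
    print(select_main_part([800.0, 900.0]))         # two portions of similar size
```

When the function returns None, the second image would be skipped and no first image would be extracted from it.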
Step S22: acquiring a plurality of target data and auxiliary data corresponding to the target data from the data set to be clustered.
When the data is clustered, the number of target data acquired from the data set to be clustered is not particularly limited, and may be, for example, two or three. The target data and the auxiliary data are data corresponding to different parts of the target object.
Step S23: determining a first similarity between the plurality of target data and a second similarity between the auxiliary data respectively corresponding to the plurality of target data.
The first similarity between the plurality of target data indicates the degree of similarity at the first portion of the target object, and the second similarity between the plurality of auxiliary data indicates the degree of similarity at the second portion of the target object.
Step S24: clustering the plurality of target data based on the first similarity and the second similarity.
When the plurality of target data are clustered, the weight of the first similarity and the weight of the second similarity are obtained, and the first similarity and the second similarity are weighted with these weights to obtain the fusion similarity of the plurality of target data; the plurality of target data are then clustered based on the fusion similarity. The clustering result is used to determine the target object to which the target data belong, so that whether the target data belong to the same target object can be obtained by clustering the target data based on the fusion similarity. Compared with determining whether the target data belong to the same target object only by means of the first similarity corresponding to the first portion of the target object, the embodiment of the disclosure determines this by combining the similarities of different portions of the target object, which improves the accuracy of data clustering.
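The weighting step can be illustrated with a short sketch. The function name fuse_similarity and the numeric values are hypothetical; how the weights themselves are obtained is described elsewhere in the text and is simply taken as given here.

```python
def fuse_similarity(s_first: float, s_second: float,
                    w_first: float, w_second: float) -> float:
    """Weighted sum of the first and second similarities (fusion similarity)."""
    return s_first * w_first + s_second * w_second


if __name__ == "__main__":
    # Illustrative values only: first-portion similarity 0.8, second-portion
    # similarity 0.4, with weights 0.75 and 0.25.
    print(fuse_similarity(0.8, 0.4, 0.75, 0.25))
```

With fixed weights this reduces to an ordinary weighted average; the adaptive variant replaces the constants with weights produced per pair of target data.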
After determining whether the plurality of target data belong to the same target object, a connected graph of the target object may be constructed according to the target data belonging to the same target object. In the embodiment of the disclosure, the clustering of the plurality of target data can be assisted by utilizing the similarity of the auxiliary data, and then the edges can be built by utilizing the information of different parts of the target object when the connected graph is built.
By the method, the target object is subjected to feature extraction in the first image to obtain feature data of a first part and feature data of a second part of the target object, and a data set to be clustered is formed; the method comprises the steps of obtaining a plurality of target data and auxiliary data corresponding to the target data from a data set to be clustered, determining first similarity among the target data and second similarity among the auxiliary data corresponding to the target data respectively, clustering the target data by combining the similarities of different parts, determining whether the target data belong to the same target object, and clustering the target data of the same target object into the same class to realize multi-mode combined clustering.
In order to achieve more accurate data clustering, in addition to combining the similarities of the target data and the auxiliary data, the reliabilities of the target data and the auxiliary data can be further combined. Referring to fig. 3, fig. 3 is a schematic flow chart of a data clustering method according to another embodiment of the present application. Specifically, the method may include the following steps:
Step S31: performing feature extraction on the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object, wherein the feature data of the first portion is used as target data in a data set to be clustered, and the feature data of the second portion is used as auxiliary data in the data set to be clustered.
The characteristic data of the first part is used as target data in the data set to be clustered, and the characteristic data of the second part is used as auxiliary data in the data set to be clustered. The first image is used to obtain target data and/or auxiliary data. The rest of the description about step S31 is similar to step S21 and will not be repeated herein.
Step S32: acquiring a plurality of target data and auxiliary data corresponding to the target data from the data set to be clustered.
After the target data and the auxiliary data corresponding to the target data in the data set to be clustered are obtained by utilizing a first image and feature extraction technology, the target data and the auxiliary data corresponding to the target data to be clustered are obtained from the data set to be clustered when the data are clustered.
In order to filter target data with low reliability and auxiliary data corresponding to the target data as early as possible and further improve data clustering precision, before a plurality of target data and auxiliary data corresponding to the target data are obtained from a data set to be clustered, the target data and the auxiliary data corresponding to the target data, the reliability of which does not meet a preset reliability condition, of the data set to be clustered can be filtered. That is, whether the credibility of the target data and/or the auxiliary data meets the preset credibility condition can be judged, so that when the credibility does not meet the preset credibility condition, the target data and the auxiliary data corresponding to the target data are filtered together, the credibility of the data in the data set to be clustered is higher, and the data clustering precision is further improved.
Step S33: determining a first similarity between the plurality of target data, a second similarity between the auxiliary data respectively corresponding to the plurality of target data, the reliability of the target data, and the reliability of the auxiliary data.
The reliability of the target data and/or the auxiliary data is determined by at least one of the sharpness, the degree of occlusion, and the illumination intensity of the corresponding portion in the first image. Thus the target data and/or the auxiliary data can be obtained from the first image, and their reliability can be obtained by combining factors such as sharpness, degree of occlusion, and illumination intensity.
The feature data and the credibility are obtained by processing the first image through a neural network model. In a disclosed embodiment, the feature data and the reliability are obtained by processing the first image by the same neural network model, so that the feature data and the reliability can be simultaneously obtained by inputting the first image into the neural network model, and the efficiency of data clustering is improved.
Step S34: clustering the plurality of target data based on the first similarity, the second similarity, the reliability of the target data and the reliability of the auxiliary data.
In the embodiment of the disclosure, the similarity and the reliability are combined, and the target data and the auxiliary data corresponding to different parts of the target object are combined to cluster the plurality of target data, so that whether the plurality of target data belong to the same target object is determined more accurately, the target data of the same target object are clustered into the same class, and high-precision data clustering is realized.
To clearly describe how to determine whether multiple target data belong to the same target object by using the similarity and reliability of the target data and the auxiliary data, please refer to fig. 4, where fig. 4 is a flowchart illustrating step S34 of another embodiment of the data clustering method according to the present application. Specifically, step S34 may include the steps of:
Step S341: acquiring the weights of the first similarity and the second similarity, and weighting the first similarity and the second similarity with the weights to obtain the fusion similarity of the plurality of target data.
The fusion similarity is the result of weighting the first similarity and the second similarity, and reflects the similarity of the plurality of target data.
When the weights of the first similarity and the second similarity are obtained, the weights of the first similarity and the second similarity are obtained based on the first similarity, the second similarity, the reliability of the target data and the reliability of the auxiliary data, so that the similarity and the reliability are integrated for determining the weights, and the self-adaptive weighting is realized. Specifically, a first comprehensive credibility of the plurality of target data is obtained based on the credibility of the plurality of target data, and a second comprehensive credibility of the plurality of auxiliary data is obtained based on the credibility of the plurality of auxiliary data corresponding to the plurality of target data; the weights of the first similarity and the second similarity are obtained by utilizing the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability, the weights corresponding to the similarities can be more accurately obtained, and the accuracy of data clustering is further improved.
In a disclosed embodiment, when the first comprehensive reliability of the plurality of target data is obtained based on the reliabilities of the plurality of target data, or the second comprehensive reliability of the plurality of auxiliary data is obtained based on the reliabilities of the plurality of auxiliary data corresponding to the plurality of target data, the sum of the reliabilities of the plurality of target data or of the plurality of auxiliary data is used as the comprehensive reliability of the corresponding data. That is, the sum of the reliabilities of the plurality of target data is used as the first comprehensive reliability of the plurality of target data, and the sum of the reliabilities of the plurality of auxiliary data is used as the second comprehensive reliability of the plurality of auxiliary data. Obtaining the comprehensive reliability by summation in this way improves the accuracy of the comprehensive reliability.
In a disclosed embodiment, when the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability are used to obtain the weights of the first similarity and the second similarity, the weight determination model is used to process the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability to obtain the weights of the first similarity and the second similarity. When the weight determining model obtains the weights of the first similarity and the second similarity, the similarity and the reliability are combined to learn the modal weight, and the appropriate modal weight can be locally and adaptively increased to construct the joint similarity. The weight determination model is trained by at least the following steps: acquiring sample target data and the reliability thereof, and acquiring corresponding sample auxiliary data and the reliability thereof; determining third similarity among the plurality of sample target data and fourth similarity among the plurality of sample auxiliary data, and obtaining third comprehensive credibility of the plurality of sample target data and fourth comprehensive credibility of the plurality of sample auxiliary data based on credibility of the sample target data and the sample auxiliary data; processing the third similarity, the fourth similarity, the third comprehensive reliability and the fourth comprehensive reliability by using a weight determination model to obtain weights of the third similarity and the fourth similarity; and adjusting the network parameters of the weight determination model based on the weights of the third similarity and the fourth similarity. Therefore, the weight determination model is trained by using the third similarity, the fourth similarity, the third comprehensive reliability and the fourth comprehensive reliability of the sample, so as to obtain a final weight determination model. 
In a disclosed embodiment, the weight of the third similarity may also be obtained based on the third similarity, the fourth similarity, the reliability of the sample target data, and the reliability of the sample auxiliary data, and the difference between 1 and the weight of the third similarity is used as the weight of the fourth similarity; or the weight of the fourth similarity is obtained based on the third similarity, the fourth similarity, the reliability of the sample target data, and the reliability of the sample auxiliary data, and the difference between 1 and the weight of the fourth similarity is used as the weight of the third similarity.
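A forward pass of such a weight computation can be sketched as follows. The source does not disclose the architecture of the weight determination model, so a single logistic unit over the four inputs is assumed purely for illustration; the parameter values, the function names, and the choice of a sigmoid are hypothetical. The comprehensive reliability is the sum of the individual reliabilities, and the second weight is obtained as 1 minus the first, as described above.

```python
import math
from typing import Sequence, Tuple


def comprehensive_reliability(reliabilities: Sequence[float]) -> float:
    """Comprehensive reliability taken as the sum of individual reliabilities."""
    return sum(reliabilities)


def modality_weights(s_first: float, s_second: float,
                     q_first: float, q_second: float,
                     params: Tuple[float, float, float, float] = (0.0, 0.0, 1.0, -1.0),
                     bias: float = 0.0) -> Tuple[float, float]:
    """Map similarities and comprehensive reliabilities to two weights.

    params and bias stand in for learned network parameters (hypothetical).
    """
    z = (bias + params[0] * s_first + params[1] * s_second
         + params[2] * q_first + params[3] * q_second)
    w_first = 1.0 / (1.0 + math.exp(-z))  # logistic unit (assumed form)
    return w_first, 1.0 - w_first         # second weight is the complement


if __name__ == "__main__":
    q_f = comprehensive_reliability([0.9, 0.8])  # two pieces of target data
    q_b = comprehensive_reliability([0.3, 0.4])  # two pieces of auxiliary data
    print(modality_weights(0.6, 0.7, q_f, q_b))
```

With these placeholder parameters the modality with the higher comprehensive reliability receives the larger weight; a trained model would determine the actual dependence on all four inputs.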
Step S342: clustering the plurality of target data based on the fusion similarity.
After the fusion similarity is obtained, the target data can be clustered based on the fusion similarity, and the target object to which the target data belong can be determined based on the clustering result. In a disclosed embodiment, when it is detected that the fusion similarity is greater than a preset similarity threshold, the plurality of target data are clustered together and it is determined that they belong to the same target object. Target data whose fusion similarity is not greater than the preset similarity threshold are thus filtered out, which further improves the precision of data clustering.
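One way to turn the pairwise threshold test into clusters is sketched below. It is an assumption that pairs passing the threshold are merged transitively as connected components (using a union-find structure); the text only states that target data whose fusion similarity exceeds the threshold are clustered together and that a connected graph may be built. The function name and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_by_threshold(n: int,
                         fused: Dict[Tuple[int, int], float],
                         threshold: float) -> List[List[int]]:
    """Group n items into clusters by linking pairs above the threshold.

    fused maps an index pair (i, j) to the fusion similarity of items i and j.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Find the representative of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), similarity in fused.items():
        if similarity > threshold:  # edge exists only above the preset threshold
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters: Dict[int, List[int]] = {}
    for item in range(n):
        clusters.setdefault(find(item), []).append(item)
    return sorted(clusters.values())


if __name__ == "__main__":
    pairs = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.2, (0, 3): 0.1}
    print(cluster_by_threshold(4, pairs, 0.5))
```

Each returned list holds the indices of target data assigned to the same target object.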
In an application embodiment, face clustering is performed by combining the feature data corresponding to the face and the body of the target object; the target data and the auxiliary data are the feature data corresponding to the face and the body of the target object respectively, and two pieces of target data, each with its corresponding auxiliary data, are used.
The area of each second portion contained in the second image is obtained, the second portion with the largest area is taken as the main second portion, and the first image containing the main second portion is finally extracted from the second image. Feature extraction is performed on the target object in the first image to obtain the feature data of the face and the feature data of the body of the target object, and thus the target data in the data set to be clustered and the auxiliary data corresponding to the target data; that is, the feature data of the face of a person are used as the target data in the data set to be clustered, and the feature data of the body of the person are used as the auxiliary data in the data set to be clustered.
Target data A and target data B, together with the corresponding auxiliary data A and auxiliary data B, are acquired from the data set to be clustered, and a neural network model is used to determine the first similarity $S_{fe}$ between target data A and target data B, the second similarity $S_{be}$ between auxiliary data A and auxiliary data B, the reliability $Q_{f1}$ of target data A, the reliability $Q_{f2}$ of target data B, the reliability $Q_{b1}$ of auxiliary data A, and the reliability $Q_{b2}$ of auxiliary data B. The sum of the reliability $Q_{f1}$ of target data A and the reliability $Q_{f2}$ of target data B is taken as the first comprehensive reliability $Q_{fe}$ of target data A and target data B, and the sum of the reliability $Q_{b1}$ of auxiliary data A and the reliability $Q_{b2}$ of auxiliary data B is taken as the second comprehensive reliability $Q_{be}$ of auxiliary data A and auxiliary data B.
The weight determination model is used to process the first similarity $S_{fe}$, the second similarity $S_{be}$, the first comprehensive reliability $Q_{fe}$, and the second comprehensive reliability $Q_{be}$ to obtain the weight $W_f$ of the first similarity and the weight $W_b$ of the second similarity. After the weights of the first similarity $S_{fe}$ and the second similarity $S_{be}$ are obtained, the first similarity $S_{fe}$ and the second similarity $S_{be}$ are weighted with these weights to obtain the fusion similarity $S$ of target data A and target data B, where $S = S_{fe} \cdot W_f + S_{be} \cdot W_b$, and whether the plurality of target data belong to the same target object is then determined based on the fusion similarity. The weight determination model uses the information of the two modalities, the face and the body, learns the weights of the different modalities through regression training, and performs adaptive weighting.
When face clustering is performed using the target data alone, target data with similar faces but belonging to different target objects are easily determined to belong to the same target object; therefore, auxiliary data corresponding to the body are introduced to guide the face clustering, and more comprehensive information is used to realize joint clustering. When clustering is performed based only on the similarity, the similarity between target data of the same target object may be low (for example, a frontal face and a large-angle side face of the same target object), or the similarity between target data of different target objects may be high (for example, all wearing masks or sunglasses, or all showing large-angle side faces); therefore, the reliability can be introduced into the clustering. In the embodiment of the application, the similarity of the target data corresponding to the face of the target object and the similarity of the auxiliary data corresponding to the body of the target object are fused, so that the face modality and the body modality undergo modality fusion; the reliabilities of the target data and the auxiliary data are also fused, so that the reliability of the face (or the body) quantifies its weight in the fusion similarity, and the fusion similarity is constructed by combining the similarity and the reliability.
By the method, the target object is subjected to feature extraction in the first image to obtain feature data of a first part and feature data of a second part of the target object, and a data set to be clustered is formed; the method comprises the steps of obtaining a plurality of target data and auxiliary data corresponding to the target data from a data set to be clustered, determining first similarity, second similarity, reliability of the target data and reliability of the auxiliary data, combining the similarity of the target data and the auxiliary data corresponding to different parts, and determining whether the plurality of target data belong to the same target object by integrating the reliability of the target data and the reliability of the auxiliary data, so that the target data of the same target object are clustered into the same class, and the accuracy of data clustering can be improved.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Referring to fig. 5, fig. 5 is a schematic diagram of a framework of an embodiment of a data clustering device 50 according to the present application. The data clustering device 50 comprises an obtaining module 51, a first determining module 52 and a second determining module 53. An obtaining module 51, configured to obtain a plurality of target data about a target object from a data set to be clustered, where the target object includes a first portion and a second portion, and the target data is data corresponding to the first portion; a first determining module 52, configured to determine a first similarity between a plurality of target data and a reference factor, where the reference factor includes at least one of: a second similarity between the auxiliary data corresponding to each of the plurality of target data, reliability of the target data, and reliability of the auxiliary data, the auxiliary data being data corresponding to a second part; and a second determining module 53, configured to cluster the plurality of target data based on the first similarity and the reference factor, where a result of the clustering is used to determine a target object to which the plurality of target data belong.
In a disclosed embodiment, the data set to be clustered further includes auxiliary data, and the data clustering device 50 further includes a feature extraction module 54. The feature extraction module 54 is configured to perform feature extraction on the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object, where the feature data of the first portion is used as target data in the data set to be clustered, and the corresponding feature data of the second portion is used as auxiliary data in the data set to be clustered.
In a disclosed embodiment, when performing feature extraction on the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object, the feature extraction module 54 is further configured to: obtain, from the first image, a first region corresponding to the first portion and a second region corresponding to the second portion; and, when the first region and the second region meet a preset matching condition, perform feature extraction on the first region and the second region respectively to obtain the feature data of the first portion and the feature data of the second portion.
In a disclosed embodiment, the preset matching condition includes at least one of: the positional relationship between the first region and the second region satisfies a preset positional relationship; the overlapping area of the first region and the second region is larger than a preset area threshold.
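As a non-limiting illustration, the matching condition could be checked roughly as follows. This is a minimal sketch under stated assumptions: regions are axis-aligned boxes in image coordinates (y grows downward), the "preset positional relationship" is taken to be "the face center lies in the upper half of the body box", and both conditions are required together. The names `Box`, `overlap_area`, `face_in_upper_body` and `regions_match` are hypothetical and do not come from the application, which only requires at least one of the two conditions.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned region: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not overlap)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(width, 0.0) * max(height, 0.0)


def face_in_upper_body(face: Box, body: Box) -> bool:
    """Assumed positional relationship: face center inside the upper half of the body box."""
    cx = (face.x1 + face.x2) / 2
    cy = (face.y1 + face.y2) / 2
    body_mid_y = (body.y1 + body.y2) / 2
    return body.x1 <= cx <= body.x2 and body.y1 <= cy <= body_mid_y


def regions_match(first: Box, second: Box, area_threshold: float) -> bool:
    """Pair a first region (face) with a second region (body) when both checks pass."""
    return face_in_upper_body(first, second) and overlap_area(first, second) > area_threshold
```

Only regions that pass such a check would be paired for feature extraction, so that the face feature and the body feature placed in the data set to be clustered come from the same target object.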
In a disclosed embodiment, before performing feature extraction on the target object in the first image to obtain feature data of the first portion and feature data of the second portion of the target object, the feature extraction module 54 is further configured to: obtain the area of each second portion contained in a second image; select a main second portion from the second portions contained in the second image based on the areas of the second portions; and extract, from the second image, a first image containing the main second portion.
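By way of example only, selecting the main second portion by area might look like the sketch below. It assumes that each second portion is given as a bounding box `(x1, y1, x2, y2)` and that "main" means "largest area"; the application only states that the selection is based on area. The helper names `box_area`, `select_main_part` and `crop_bounds` are hypothetical.

```python
from typing import List, Optional, Tuple

BoxT = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates


def box_area(box: BoxT) -> int:
    """Area of a bounding box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, 0) * max(y2 - y1, 0)


def select_main_part(boxes: List[BoxT]) -> Optional[BoxT]:
    """Return the box with the largest area, or None if no second portion was detected."""
    if not boxes:
        return None
    return max(boxes, key=box_area)


def crop_bounds(box: BoxT, width: int, height: int) -> BoxT:
    """Clamp the main box to the second image so it can be cropped out as the first image."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))
```

The clamped bounds would then be used to cut the first image out of the second image with whatever image library the implementation uses.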
In a disclosed embodiment, the feature data and the reliability are obtained by processing the first image with the same neural network model.
In a disclosed embodiment, before obtaining the plurality of target data from the data set to be clustered, the obtaining module 51 is further configured to filter out target data in the data set to be clustered whose reliability does not satisfy a preset reliability condition.
In a disclosed embodiment, the reliability of the target data and/or the auxiliary data is determined by at least one of the clarity, the degree of occlusion, and the light intensity of the corresponding portion in the first image, where the first image is the image from which the target data and/or the auxiliary data are obtained.
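A minimal sketch of this filtering step is given below. It assumes that the preset reliability condition is a simple lower bound and that each item carries a scalar reliability; the `Sample` type and the `min_reliability` parameter are illustrative names, not taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One item of target data: a feature vector plus its reliability score."""
    feature: List[float]
    reliability: float


def filter_by_reliability(samples: List[Sample], min_reliability: float) -> List[Sample]:
    """Keep only the target data whose reliability meets the assumed lower bound."""
    return [s for s in samples if s.reliability >= min_reliability]
```

Filtering before clustering keeps heavily occluded or blurred faces from being compared at all, which is one plausible reading of why the condition is applied first.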
In a disclosed embodiment, the reference factor includes a second similarity, and the second determining module 53 is configured to, when clustering is performed on the plurality of target data based on the first similarity and the reference factor, further obtain weights of the first similarity and the second similarity, and perform weighting processing on the first similarity and the second similarity by using the weights to obtain a fusion similarity of the plurality of target data; and clustering the plurality of target data based on the fusion similarity.
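The weighting step can be illustrated with the following sketch. It assumes cosine similarity between feature vectors and a plain weighted sum as the "weighting processing"; neither choice is fixed by the application, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(first_sim: float, second_sim: float,
                     w_first: float, w_second: float) -> float:
    """Weighted combination of the face (first) and body (second) similarities."""
    return w_first * first_sim + w_second * second_sim
```

For instance, with a first similarity of 0.8, a second similarity of 0.4 and weights 0.75 and 0.25, the fusion similarity under this assumed rule is 0.75 × 0.8 + 0.25 × 0.4 = 0.7.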
In a disclosed embodiment, the reference factor further includes a reliability of the target data and a reliability of the auxiliary data, and the second determining module 53 is further configured to obtain the weights of the first similarity and the second similarity based on the first similarity, the second similarity, the reliability of the target data, and the reliability of the auxiliary data when the weights of the first similarity and the second similarity are obtained.
In a disclosed embodiment, the second determining module 53 is configured to, when obtaining the weights of the first similarity and the second similarity based on the first similarity, the second similarity, the reliability of the target data, and the reliability of the auxiliary data, further obtain a first comprehensive reliability of the plurality of target data based on the reliability of the plurality of target data, and obtain a second comprehensive reliability of the plurality of auxiliary data based on the reliability of the plurality of auxiliary data corresponding to the plurality of target data; and obtaining the weight of the first similarity and the second similarity by using the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability.
In a disclosed embodiment, the second determining module 53 is configured to, when obtaining a first comprehensive reliability of the plurality of target data based on the reliability of the plurality of target data, further use a sum of the reliabilities of the plurality of target data as the first comprehensive reliability of the plurality of target data; the second determining module 53 is configured to, when obtaining a second comprehensive reliability of the plurality of auxiliary data based on the reliability of the plurality of auxiliary data corresponding to the plurality of target data, further use a sum of the reliability of the plurality of auxiliary data as the second comprehensive reliability of the plurality of auxiliary data.
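The summation described above is simple enough to sketch directly. The second function below is only a hypothetical stand-in: it turns the two comprehensive reliabilities into weights by normalization so that the example is complete, whereas the application obtains the weights from a weight determination model. All names are illustrative.

```python
from typing import Sequence, Tuple


def comprehensive_reliability(reliabilities: Sequence[float]) -> float:
    """Comprehensive reliability of the compared data: the sum of their reliabilities."""
    return sum(reliabilities)


def normalized_weights(first_total: float, second_total: float) -> Tuple[float, float]:
    """Hypothetical stand-in for the weight determination model.

    Splits a unit weight between the first and second similarity in proportion
    to the two comprehensive reliabilities; falls back to equal weights when
    both are zero.
    """
    total = first_total + second_total
    if total == 0:
        return 0.5, 0.5
    return first_total / total, second_total / total
```

Under this stand-in, two clear faces (reliabilities 0.9 and 0.7) paired with two poorly visible bodies (0.3 and 0.1) would give the face similarity a weight of about 0.8 and the body similarity about 0.2.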
In a disclosed embodiment, the second determining module 53 is configured to, when obtaining the weights of the first similarity and the second similarity by using the first similarity, the second similarity, the first comprehensive reliability, and the second comprehensive reliability, further use the weight determining model to process the first similarity, the second similarity, the first comprehensive reliability, and the second comprehensive reliability, so as to obtain the weights of the first similarity and the second similarity. The second determination module 53 further includes a model training unit, which is configured to obtain sample target data and reliability thereof, and obtain corresponding sample auxiliary data and reliability thereof; determining a third similarity between the plurality of sample target data and a fourth similarity between the plurality of sample auxiliary data, and obtaining a third comprehensive credibility of the plurality of sample target data and a fourth comprehensive credibility of the plurality of sample auxiliary data based on the credibility of the sample target data and the sample auxiliary data; processing the third similarity, the fourth similarity, the third comprehensive reliability and the fourth comprehensive reliability by using a weight determination model to obtain weights of the third similarity and the fourth similarity; and adjusting network parameters of the weight determination model based on the weights of the third similarity and the fourth similarity so as to train to obtain the weight determination model.
In a disclosed embodiment, when determining whether the plurality of target data belong to the same target object based on the fusion similarity, the second determining module 53 is further configured to determine that the plurality of target data belong to the same target object if the fusion similarity is detected to be greater than a preset similarity threshold.
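One way such a threshold decision could be turned into clusters is sketched below, assuming that pairwise fusion similarities are available and that pairs above the threshold are merged transitively with a union-find structure. The transitive merging and the name `cluster_by_threshold` are assumptions made for illustration; the application only specifies the threshold comparison.

```python
from typing import Dict, List, Tuple


def cluster_by_threshold(n: int,
                         pair_similarity: Dict[Tuple[int, int], float],
                         threshold: float) -> List[List[int]]:
    """Group item indices 0..n-1 so that pairs with similarity > threshold share a cluster."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), sim in pair_similarity.items():
        if sim > threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each resulting group would correspond to one target object; items that never exceed the threshold with any other item remain in clusters of their own.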
In a disclosed embodiment, the target data and the auxiliary data are feature data corresponding to the face and the body of the target object respectively.
In the above scheme, after the obtaining module 51 obtains a plurality of target data from the data set to be clustered, the first determining module 52 determines not only the first similarity between the plurality of target data, but also the second similarity between the auxiliary data corresponding to the plurality of target data, the reliability of the auxiliary data, and other reference factors, so that the second determining module 53 may combine the similarity and the reliability, or combine the target data corresponding to different parts of the target object and the auxiliary data to determine whether the plurality of target data belong to the same target object, thereby clustering the target data of the same target object into the same class, and implementing high-accuracy data clustering.
Referring to fig. 6, fig. 6 is a schematic block diagram of an embodiment of an electronic device 60 according to the present application. The electronic device 60 comprises a memory 61 and a processor 62 coupled to each other, the processor 62 being configured to execute program instructions stored in the memory 61 to implement the steps of any of the above-described embodiments of the data clustering method. In one particular implementation scenario, the electronic device 60 may include, but is not limited to, a microcomputer or a server; the electronic device 60 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 62 is configured to control itself and the memory 61 to implement the steps of any of the above-described embodiments of the data clustering method. The processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip having signal processing capabilities. The processor 62 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 62 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 7, fig. 7 is a block diagram illustrating an embodiment of a computer readable storage medium 70 of the present application. The computer readable storage medium 70 stores program instructions 701 executable by the processor, the program instructions 701 being for implementing the steps of any of the data clustering method embodiments described above.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division, and other divisions are possible in practice; units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical or in another form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

Claims (18)

1. A method for clustering data, comprising:
acquiring a plurality of target data about a target object from a data set to be clustered, wherein the target object is a target object in image data, the target object comprises a first part and a second part, and the target data is data corresponding to the first part;
determining a first similarity between the plurality of target data and a reference factor, wherein the reference factor includes a second similarity between auxiliary data corresponding to the plurality of target data respectively, and the auxiliary data is data corresponding to the second portion;
clustering the plurality of target data based on the first similarity and a reference factor, wherein the clustering result is used for determining the target object to which the plurality of target data belong;
wherein the clustering the plurality of target data based on the first similarity and a reference factor comprises:
acquiring the weights of the first similarity and the second similarity, and performing weighting processing on the first similarity and the second similarity by using the weights to obtain the fusion similarity of the target data;
and clustering the plurality of target data based on the fusion similarity.
2. The method of claim 1, wherein the reference factor further includes at least one of a reliability of the target data and a reliability of the auxiliary data.
3. The method according to claim 1, wherein the data set to be clustered further comprises the auxiliary data, the data set to be clustered being obtained by at least the following steps:
and performing feature extraction on the target object in a first image to obtain feature data of a first part and feature data of a second part of the target object, wherein the feature data of the first part is used as the target data in the data set to be clustered, and the feature data of the second part is used as auxiliary data in the data set to be clustered.
4. The method according to claim 3, wherein the performing feature extraction on the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object comprises:
acquiring a first region corresponding to the first part and a second region corresponding to the second part from the first image;
and respectively extracting the features of the first region and the second region under the condition that the first region and the second region meet preset matching conditions so as to correspondingly obtain feature data of the first part and feature data of the second part.
5. The method of claim 4, wherein the preset matching condition comprises at least one of: the position relation between the first region and the second region meets a preset position relation, and the overlapping area of the first region and the second region is larger than a preset area threshold value.
6. The method of claim 3, wherein before the feature extraction of the target object in the first image to obtain feature data of a first portion and feature data of a second portion of the target object, the method further comprises:
acquiring the area of each second part contained in a second image;
selecting a main second part from at least one second part included in the second image based on an area of the second part;
a first image including the main second portion is extracted from the second image.
7. The method of claim 3, wherein the reference factor further includes at least one of a reliability of the target data and a reliability of the auxiliary data, the feature data and the reliability being obtained by processing the first image with the same neural network model.
8. The method according to any one of claims 2 to 7, wherein prior to said obtaining a plurality of target data from a data set to be clustered, the method further comprises:
and filtering out the target data of which the reliability does not meet a preset reliability condition in the data set to be clustered.
9. The method according to any one of claims 2 to 7, wherein the reliability of the target data and/or the auxiliary data is determined by at least one of the clarity, the degree of occlusion, and the light intensity of the corresponding part in a first image, wherein the first image is used for obtaining the target data and/or the auxiliary data.
10. The method of claim 1, wherein the reference factors further include a reliability of the target data and a reliability of the auxiliary data, and wherein the obtaining the weight of the first similarity and the second similarity comprises:
and obtaining the weights of the first similarity and the second similarity based on the first similarity, the second similarity, the reliability of the target data and the reliability of the auxiliary data.
11. The method of claim 10, wherein the obtaining the weight of the first similarity and the second similarity based on the first similarity, the second similarity, the reliability of the target data, and the reliability of the auxiliary data comprises:
obtaining a first comprehensive reliability of the plurality of target data based on the reliability of the plurality of target data, and obtaining a second comprehensive reliability of the plurality of auxiliary data based on the reliability of the plurality of auxiliary data corresponding to the plurality of target data;
and obtaining the weights of the first similarity and the second similarity by using the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability.
12. The method of claim 11, wherein obtaining a first comprehensive reliability of the plurality of target data based on the reliability of the plurality of target data comprises:
taking the sum of the reliabilities of the plurality of target data as the first comprehensive reliability of the plurality of target data;
the obtaining a second comprehensive reliability of the plurality of auxiliary data based on the reliability of the plurality of auxiliary data corresponding to the plurality of target data includes:
and taking the sum of the reliabilities of the plurality of auxiliary data as the second comprehensive reliability of the plurality of auxiliary data.
13. The method according to claim 11 or 12, wherein the using the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability to obtain the weights of the first similarity and the second similarity comprises:
processing the first similarity, the second similarity, the first comprehensive reliability and the second comprehensive reliability by using a weight determination model to obtain weights of the first similarity and the second similarity;
wherein the weight determination model is trained by at least the following steps:
acquiring sample target data and the reliability thereof, and acquiring corresponding sample auxiliary data and the reliability thereof;
determining a third similarity between a plurality of sample target data and a fourth similarity between a plurality of sample auxiliary data, and obtaining a third comprehensive reliability of the plurality of sample target data and a fourth comprehensive reliability of the plurality of sample auxiliary data based on the reliability of the sample target data and the sample auxiliary data;
processing the third similarity, the fourth similarity, the third comprehensive reliability and the fourth comprehensive reliability by using the weight determination model to obtain weights of the third similarity and the fourth similarity;
adjusting network parameters of the weight determination model based on the weights of the third similarity and the fourth similarity.
14. The method according to any one of claims 10 to 12, wherein the clustering the plurality of target data based on the fusion similarity comprises:
and clustering the plurality of target data under the condition that the fusion similarity is detected to be larger than a preset similarity threshold.
15. The method according to claim 1, wherein the target data and the auxiliary data are feature data corresponding to a face and a body of the target object, respectively.
16. A data clustering apparatus, comprising:
an acquisition module, configured to acquire a plurality of target data about a target object from a data set to be clustered, wherein the target object is a target object in image data, the target object comprises a first part and a second part, and the target data is data corresponding to the first part;
a first determining module, configured to determine a first similarity between the plurality of target data and a reference factor, where the reference factor includes a second similarity between auxiliary data corresponding to the plurality of target data, and the auxiliary data is data corresponding to the second part;
a second determining module, configured to cluster the plurality of target data based on the first similarity and a reference factor, where a result of the clustering is used to determine the target object to which the plurality of target data belong;
the second determining module is configured to, when clustering is performed on a plurality of target data based on a first similarity and a reference factor, further obtain weights of the first similarity and a second similarity, and perform weighting processing on the first similarity and the second similarity by using the weights to obtain a fusion similarity of the plurality of target data; and clustering the plurality of target data based on the fusion similarity.
17. An electronic device comprising a memory and a processor coupled to each other;
the processor is configured to execute the program instructions stored in the memory to implement the data clustering method of any one of claims 1 to 15.
18. A computer readable storage medium having stored thereon program instructions, which when executed by a processor implement the data clustering method of any one of claims 1 to 15.
CN202011172426.7A 2020-10-28 2020-10-28 Data clustering method and device, electronic equipment and storage medium Active CN112307938B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011172426.7A CN112307938B (en) 2020-10-28 2020-10-28 Data clustering method and device, electronic equipment and storage medium
PCT/CN2020/131241 WO2022088331A1 (en) 2020-10-28 2020-11-24 Data clustering method, device thereof, electronic device, storage medium, and program
TW109144955A TWI767459B (en) 2020-10-28 2020-12-18 Data clustering method, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011172426.7A CN112307938B (en) 2020-10-28 2020-10-28 Data clustering method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112307938A CN112307938A (en) 2021-02-02
CN112307938B CN112307938B (en) 2022-11-11

Family

ID=74331400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011172426.7A Active CN112307938B (en) 2020-10-28 2020-10-28 Data clustering method and device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN112307938B (en)
TW (1) TWI767459B (en)
WO (1) WO2022088331A1 (en)

Also Published As

Publication number Publication date
TWI767459B (en) 2022-06-11
TW202217594A (en) 2022-05-01
CN112307938A (en) 2021-02-02
WO2022088331A1 (en) 2022-05-05

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40036823; Country of ref document: HK)
GR01 Patent grant