CN113221786A

CN113221786A - Data classification method and device, electronic equipment and storage medium

Info

Publication number: CN113221786A
Application number: CN202110556441.XA
Authority: CN
Inventors: 张丹丹; 王长春
Original assignee: Shenzhen Sensetime Technology Co Ltd
Current assignee: Shenzhen Sensetime Technology Co Ltd
Priority date: 2021-05-21
Filing date: 2021-05-21
Publication date: 2021-08-06
Also published as: WO2022242032A1

Abstract

The embodiments of the disclosure disclose a data classification method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a plurality of views to be classified, and extracting the face image contained in each of the views to obtain a plurality of face images; and clustering the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person and each face image in each image set carries an authenticity detection result indicating the authenticity of the image.

Description

Data classification method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer vision technologies, and in particular, to a data classification method and apparatus, an electronic device, and a storage medium.

Background

The Internet hosts a massive number of images and videos, from which users may need to find and classify the ones that meet their actual requirements.

At present, images and videos are generally classified as follows: a portrait image containing a specific person is used to search the Internet, by image search, for all images and videos containing that person, and the results are grouped into one class. This approach offers little intelligence in data classification and produces poor results.

Disclosure of Invention

The embodiments of the disclosure aim to provide a data classification method and apparatus, an electronic device, and a storage medium.

The technical solutions of the embodiments of the disclosure are implemented as follows:

An embodiment of the disclosure provides a data classification method, including:

acquiring a plurality of views to be classified, and extracting the face image contained in each of the views to obtain a plurality of face images; and

clustering the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person, and each face image in each image set carries an authenticity detection result indicating the authenticity of the image.

In the above method, clustering the plurality of face images to obtain at least one image set includes:

performing deepfake detection on each of the plurality of face images to obtain a plurality of authenticity detection results in one-to-one correspondence with the face images;

extracting features from each of the face images to obtain a plurality of groups of face features in one-to-one correspondence with the face images; and

dividing the face images that correspond to the same person into the same set by using the groups of face features, and attaching to each face image in each resulting set its corresponding authenticity detection result, to obtain the at least one image set.

In the above method, after clustering the face images to obtain at least one image set, the method further includes: acquiring at least one piece of class center information in one-to-one correspondence with the at least one image set; and

for each image set, performing a library collision, i.e., comparing the corresponding class center information against a preset portrait library, to determine the label information of that image set.

In the above method, for each image set, comparing the corresponding class center information against the preset portrait library to determine the corresponding label information includes:

searching the preset portrait library for a first face image that matches first class center information, where the first class center information is the class center information corresponding to a first image set, and the first image set is any one of the at least one image set; and

when the first face image is found, determining the identity information associated with the first face image in the preset portrait library as the label information of the first image set.

In the above method, after searching the preset portrait library for the first face image that matches the first class center information, the method further includes:

when the first face image is not found, determining that the label information of the first image set is an anonymous identity.

In the above method, after clustering the face images to obtain at least one image set, the method further includes:

adding, to each image set of the at least one image set, the views (among the plurality of views to be classified) to which the face images in that set belong, to obtain at least one view set.

In the above method, a publisher archive is further provided, which contains the identity information of different publishers and the views they have published. After the at least one view set is obtained, the method further includes:

searching the publisher archive for the identity information of the publisher of each view in the at least one view set; and

associating each view in the at least one view set with the identity information of its publisher.

An embodiment of the present disclosure provides a data classification apparatus, including:

a data processing module, configured to acquire a plurality of views to be classified and extract the face image contained in each of the views to obtain a plurality of face images; and

a data classification module, configured to cluster the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person, and each face image in each image set carries an authenticity detection result indicating the authenticity of the image.

In the above apparatus, the data classification module is specifically configured to: perform deepfake detection on each of the face images to obtain a plurality of authenticity detection results in one-to-one correspondence with the face images; extract features from each of the face images to obtain a plurality of groups of face features in one-to-one correspondence with the face images; and divide the face images that correspond to the same person into the same set by using the groups of face features, attaching to each face image in each resulting set its corresponding authenticity detection result, to obtain the at least one image set.

In the above apparatus, the data classification module is further configured to: obtain at least one piece of class center information in one-to-one correspondence with the at least one image set; and, for each image set, perform a library collision, i.e., compare the corresponding class center information against a preset portrait library, to determine the label information of that image set.

In the above apparatus, the data classification module is specifically configured to: search the preset portrait library for a first face image that matches first class center information, where the first class center information is the class center information corresponding to a first image set, and the first image set is any one of the at least one image set; and, when the first face image is found, determine the identity information associated with the first face image in the preset portrait library as the label information of the first image set.

In the above apparatus, the data classification module is further configured to determine that the label information of the first image set is an anonymous identity when the first face image is not found.

In the above apparatus, the data classification module is further configured to add, to each image set of the at least one image set, the views (among the plurality of views to be classified) to which the face images in that set belong, to obtain at least one view set.

In the above apparatus, a publisher archive is further provided, which contains the identity information of different publishers and the views they have published; the data classification module is further configured to search the publisher archive for the identity information of the publisher of each view in the at least one view set, and to associate each view in the at least one view set with the identity information of its publisher.

An embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a communication bus, where:

the communication bus is configured to implement connection and communication between the processor and the memory; and

the processor is configured to execute one or more programs stored in the memory to implement the above data classification method.

An embodiment of the present disclosure provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the above data classification method.

The embodiments of the disclosure provide a data classification method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a plurality of views to be classified, and extracting the face image contained in each of the views to obtain a plurality of face images; and clustering the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person and each face image in each image set carries an authenticity detection result indicating the authenticity of the image. With the technical solutions provided by the embodiments of the disclosure, images and videos are classified per person, and each face image in each resulting set carries information indicating its authenticity, which improves the intelligence and effectiveness of data classification.

Drawings

FIG. 1 is a first schematic flowchart of a data classification method according to an embodiment of the present disclosure;

FIG. 2 is a second schematic flowchart of a data classification method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of an exemplary data classification process according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a data classification apparatus according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.

The data classification method provided by the present disclosure may be executed by a data classification apparatus. For example, the method may be executed by a terminal device, a server, or another electronic device, where the terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the data classification method may be implemented by a processor invoking computer-readable instructions stored in a memory.

FIG. 1 is a first schematic flowchart of a data classification method according to an embodiment of the present disclosure. As shown in FIG. 1, the data classification method in this embodiment mainly includes the following steps:

S101: Acquire a plurality of views to be classified, and extract the face image contained in each of the views to obtain a plurality of face images.

In this embodiment of the disclosure, the data classification apparatus may acquire a plurality of views to be classified and extract the face image contained in each of them, to obtain a plurality of face images.

It should be noted that, in this embodiment, the views to be classified may be images and videos published on various Internet platforms and social media; the embodiments of the present disclosure do not limit their specific sources.

It can be understood that the data classification apparatus may perform face recognition on each view to be classified and extract the face images it contains. The apparatus may also extract other information, such as a timestamp, from each view; this is not limited in the embodiments of the present disclosure.
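The disclosure does not fix a face detector or a data model for S101. As a minimal sketch, assuming a pluggable detector `detect(frame)` that returns bounding boxes and a cropping function `crop(frame, box)` (both hypothetical placeholders, not APIs named in the source), the extraction step can keep every face linked to the view and frame it came from, which the later view-set and authenticity steps rely on:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical bounding-box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class View:
    """A view to be classified; an image is treated as a one-frame video."""
    view_id: str
    frames: List[object]               # decoded frames; pixel format left abstract
    timestamp: Optional[float] = None  # optional extra metadata extracted from the view


@dataclass
class FaceImage:
    """A face crop that remembers the view and frame it was extracted from."""
    face_id: str
    view_id: str
    frame_index: int
    box: Box
    crop: object


def extract_faces(views: Sequence[View],
                  detect: Callable[[object], List[Box]],
                  crop: Callable[[object, Box], object]) -> List[FaceImage]:
    """Run the (hypothetical) detector on every frame of every view."""
    faces: List[FaceImage] = []
    for view in views:
        for frame_index, frame in enumerate(view.frames):
            for face_index, box in enumerate(detect(frame)):
                faces.append(FaceImage(
                    face_id=f"{view.view_id}/{frame_index}/{face_index}",
                    view_id=view.view_id,
                    frame_index=frame_index,
                    box=box,
                    crop=crop(frame, box),
                ))
    return faces
```

Treating an image as a one-frame video keeps a single code path. For real videos one would usually sample frames rather than scan all of them; repeated faces of one person across frames are merged later by clustering.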

S102: Cluster the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person, and each face image in each image set carries an authenticity detection result indicating the authenticity of the image.

In this embodiment, after obtaining the face images, the data classification apparatus may cluster them so that face images of the same person are divided into the same set, obtaining at least one image set in which each face image carries an authenticity detection result indicating the authenticity of the image.

Specifically, in an embodiment of the present disclosure, the data classification apparatus clusters the face images to obtain at least one image set as follows: it performs deepfake detection on each of the face images to obtain a plurality of authenticity detection results in one-to-one correspondence with the face images; it extracts features from each face image to obtain a plurality of groups of face features in one-to-one correspondence with the face images; and it divides the face images that correspond to the same person into the same set by using the groups of face features, attaching to each face image in each resulting set its corresponding authenticity detection result, to obtain the at least one image set.

It can be understood that the data classification apparatus may perform deepfake detection on each face image with a specific deepfake detection algorithm to obtain the authenticity detection result of that face image. For any face image, if the result is fake, i.e., the face image has been deepfaked, the view to be classified to which it belongs is correspondingly forged; if the result is real, i.e., the face image has not been deepfaked, the view to which it belongs is correspondingly real.

It should be noted that the data classification apparatus may divide the face images that correspond to the same person into the same image set by using the groups of face features in one-to-one correspondence with the face images. The apparatus can extract the face features of each face image with a specific feature extraction algorithm or model and then decide whether different face images correspond to the same person by comparing the similarity of their features; this divides the images into sets and improves the intelligence of data classification. In addition, because the apparatus also obtains the authenticity detection result of each face image, each face image in each divided set can carry its corresponding result. An image set therefore contains not only face images but also their authenticity information, so a user viewing an image later can directly see whether it is authentic, which improves the effect of data classification.
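The disclosure only says that face images are grouped by comparing the similarity of their face features. One simple way to realise this, shown as an assumption rather than the patented algorithm, is to link every pair of faces whose cosine similarity reaches a threshold and take connected components with a union-find structure; each member of a resulting set keeps its own authenticity detection result. The threshold 0.8 and the "real"/"fake" labels are illustrative choices:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_faces(features: Sequence[Sequence[float]],
                  authenticity: Sequence[str],
                  threshold: float = 0.8) -> List[List[Dict]]:
    """Group face indices whose features are linked by similarity >= threshold."""
    n = len(features)
    if len(authenticity) != n:
        raise ValueError("features and authenticity results must correspond one-to-one")
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Union every pair of sufficiently similar faces (O(n^2) pairs, fine for a sketch).
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                parent[find(i)] = find(j)

    # Each connected component is one image set; every member keeps its own verdict.
    groups: Dict[int, List[Dict]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append({"face": i, "authenticity": authenticity[i]})
    return list(groups.values())
```

Connected components (single linkage) are easy to reason about but can chain two different people together through intermediate faces; the threshold also has to be tuned to whichever feature extractor is used.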

It should be noted that the data classification apparatus may further associate the authenticity detection result of each face image with the view to be classified to which that face image belongs.

It can be understood that, because the apparatus associates the authenticity detection results of the face images contained in each view to be classified with that view, a user viewing any view later can directly tell whether the view is real.
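One reading of the rule described for deepfake detection is that a view is marked forged as soon as any face image extracted from it is detected as deepfaked, and real only if all of its faces are real. A small sketch of that propagation, with the "real"/"fake" strings as assumed labels:

```python
from typing import Dict, Iterable, Tuple


def view_authenticity(face_results: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map view_id -> "real"/"fake" from (view_id, face_result) pairs, one per face."""
    verdict: Dict[str, str] = {}
    for view_id, result in face_results:
        if result == "fake":
            verdict[view_id] = "fake"        # one forged face marks the whole view as forged
        else:
            verdict.setdefault(view_id, "real")  # never overwrite an earlier "fake"
    return verdict
```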

FIG. 2 is a second schematic flowchart of a data classification method according to an embodiment of the present disclosure. As shown in FIG. 2, after clustering the face images to obtain at least one image set, i.e., after performing step S102, the data classification apparatus may further perform the following steps:

S201: Acquire at least one piece of class center information in one-to-one correspondence with the at least one image set.

In this embodiment, the data classification apparatus may acquire at least one piece of class center information in one-to-one correspondence with the at least one image set.

It should be noted that, to obtain the class center information of each image set, the apparatus may compute a specific feature from the face images in the set, or select one face image from the set according to a specific rule to serve as its class center information. For example, the sharpest face image, or a frontal face image, may be selected from each set as the corresponding class center information. The specific class center information may be set according to actual needs and application scenarios; this is not limited in the embodiments of the present disclosure.
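The two options mentioned for class center information, an aggregate feature of the set or one representative face chosen by a rule, might look as follows. This sketch assumes the aggregate is the L2-normalised mean feature and that a per-face quality score (for example sharpness or frontalness) has already been computed upstream; neither choice is prescribed by the source:

```python
import math
from typing import List, Sequence


def mean_center(features: Sequence[Sequence[float]]) -> List[float]:
    """Class center as the L2-normalised mean of the set's face features."""
    if not features:
        raise ValueError("an image set must contain at least one face")
    dim = len(features[0])
    mean = [sum(f[k] for f in features) / len(features) for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


def representative_index(quality_scores: Sequence[float]) -> int:
    """Class center as the index of the face with the highest (assumed) quality score."""
    if not quality_scores:
        raise ValueError("an image set must contain at least one face")
    return max(range(len(quality_scores)), key=lambda i: quality_scores[i])
```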

S202: For each image set, perform a library collision, i.e., compare the corresponding class center information against a preset portrait library, to determine the label information of that image set.

In this embodiment, the image sets correspond one-to-one to persons, so the data classification apparatus can compare the class center information of each image set against the preset portrait library to determine its label information, i.e., the identity information of that person.

Specifically, in an embodiment of the present disclosure, determining the label information of each image set by comparing its class center information against the preset portrait library includes: searching the preset portrait library for a first face image that matches first class center information, where the first class center information is the class center information corresponding to a first image set, and the first image set is any one of the at least one image set; and, when the first face image is found, determining the identity information associated with the first face image in the preset portrait library as the label information of the first image set.

Specifically, after searching the preset portrait library for the first face image that matches the first class center information, the data classification apparatus may further perform the following step: when the first face image is not found, determining that the label information of the first image set is an anonymous identity.

It should be noted that the preset portrait library stores a large number of face images together with the identity information corresponding to each face image.

For example, for any one of the image sets, i.e., the first image set, the data classification apparatus may select one face image from the first image set as the first class center information and compare it one by one with the face images in the preset portrait library to find a matching first face image. If no first face image is found, the preset portrait library does not contain a face image of the person to whom the first image set corresponds, i.e., that person's identity cannot be obtained, so the label information of the first image set is determined to be an anonymous identity. If the first face image is found, its identity information can be obtained directly and used as the label information of the first image set.
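A library collision can be sketched as a nearest-neighbour search over the preset portrait library that falls back to an anonymous identity when no entry is similar enough. Here the library is assumed to store one feature vector per enrolled face image together with its identity, the similarity is cosine similarity, and the 0.75 threshold is a placeholder; the source specifies none of these:

```python
import math
from typing import List, Optional, Sequence, Tuple

ANONYMOUS = "anonymous identity"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def collide(center: Sequence[float],
            portrait_library: List[Tuple[str, Sequence[float]]],
            threshold: float = 0.75) -> str:
    """Return the identity of the best match at or above threshold, else ANONYMOUS."""
    best_identity: Optional[str] = None
    best_score = threshold
    for identity, feature in portrait_library:  # linear scan over enrolled faces
        score = cosine(center, feature)
        if score >= best_score:
            best_identity, best_score = identity, score
    return best_identity if best_identity is not None else ANONYMOUS
```

A linear scan is fine for illustration; a large portrait library would normally be indexed with an approximate nearest-neighbour structure.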

It can be understood that, because the data classification apparatus determines label information for each image set, a user viewing any image set can directly learn from that label information the identity of the person to whom all the face images in the set correspond.

In this embodiment, after clustering the face images to obtain at least one image set, i.e., after step S102, the data classification apparatus may further perform the following step: adding, to each image set of the at least one image set, the views (among the views to be classified) to which the face images in that set belong, to obtain at least one view set.

It can be understood that, after obtaining the image sets, the apparatus may add to each image set the views to which its face images belong, obtaining at least one view set and thereby classifying the views to be classified.

It can be understood that the face images in the image sets were extracted from the views to be classified. Therefore, for each image set, the apparatus may put the views to which its face images belong into the set, obtaining a view set. Views in the same view set correspond to the same person, and views in different view sets correspond to different persons. A view set thus contains not only face images of a person but also the videos and images that include that person.

For example, suppose the image sets include an image set A containing face images A1, A2, A3, and A4. The data classification apparatus may add to image set A the views a1, a2, a3, and a4 (among the views to be classified) to which face images A1, A2, A3, and A4 respectively belong, and the augmented image set A can be determined as view set A.

It can be understood that some of the views to be classified may contain multiple persons, i.e., multiple faces. When the apparatus adds views to the image sets, such a multi-person view is added to each of the image sets to which its different face images belong.
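Building view sets from image sets only requires each face image to remember its source view. In this sketch (all identifiers are hypothetical), a view containing faces of several people lands in several view sets, and a view that contributes several faces of the same person, such as consecutive video frames, is listed only once per set:

```python
from typing import Dict, List, Sequence


def build_view_sets(image_sets: Sequence[Sequence[str]],
                    face_to_view: Dict[str, str]) -> List[Dict[str, List[str]]]:
    """Attach to each image set the distinct views its face images came from."""
    view_sets: List[Dict[str, List[str]]] = []
    for faces in image_sets:
        views: List[str] = []
        for face_id in faces:
            view_id = face_to_view[face_id]
            if view_id not in views:  # keep each source view once, in first-seen order
                views.append(view_id)
        view_sets.append({"faces": list(faces), "views": views})
    return view_sets
```

For instance, a group photo `g` that contains both person A's face `A5` and person B's face `B1` appears in both view sets.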

In an embodiment of the present disclosure, a publisher archive is further provided, which contains the identity information of different publishers and the views they have published. After obtaining the at least one view set, the data classification apparatus may further perform the following steps: searching the publisher archive for the identity information of the publisher of each view in the at least one view set; and associating each view in the at least one view set with the identity information of its publisher.

It can be understood that the data classification apparatus can look up, in the publisher archive, the identity information of the publisher of each video and image and associate that identity with the video or image, which facilitates analyzing and tracing the images and videos.
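Because the publisher archive is organised by publisher (identity plus published views), finding the publisher of a given view is easiest after inverting the archive into a view-to-publisher index. The archive layout below (`identity`, `published_views`) is an assumed structure for illustration, and views not found in the archive map to `None`:

```python
from typing import Dict, List, Optional


def index_archive(archive: Dict[str, Dict]) -> Dict[str, Dict]:
    """Invert publisher_id -> record into view_id -> publisher identity."""
    view_to_publisher: Dict[str, Dict] = {}
    for record in archive.values():
        for view_id in record["published_views"]:
            view_to_publisher[view_id] = record["identity"]
    return view_to_publisher


def associate_publishers(view_set: List[str],
                         archive: Dict[str, Dict]) -> Dict[str, Optional[Dict]]:
    """Associate every view in a view set with its publisher identity, if known."""
    index = index_archive(archive)
    return {view_id: index.get(view_id) for view_id in view_set}
```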

FIG. 3 is a schematic diagram of an exemplary data classification process according to an embodiment of the present disclosure. As shown in FIG. 3, after acquiring the views to be classified, the data classification apparatus first performs face recognition on each view to extract the face images, and then performs deepfake detection on them. It next extracts features from each face image and clusters the face images by these features; each face image in each resulting set carries its authenticity detection result, yielding at least one image set. To each image set, the apparatus then adds the views (among the views to be classified) to which its face images belong, obtaining at least one view set. Finally, it selects a face image from each view set as class center information and compares it against the preset portrait library to obtain the label information of that view set. It should be noted that the apparatus may instead, as soon as the image sets are obtained, select a face image from each image set as class center information and perform the library collision to determine the label information of that image set; the label information of an image set is the same as that of the view set built from it. In addition, the apparatus can search the publisher archive for the identity information of the publishers of the videos and images in each view set and associate them. The videos and images in a view set can also be associated with the authenticity detection results of their face images, indicating whether those faces are real or fake.
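The FIG. 3 flow can be summarised as an orchestration function in which every stage is injected as a callable, so that any detector, deepfake classifier, feature extractor, clustering routine, or library-collision routine can be plugged in. This is a minimal sketch: all parameter names are hypothetical, and the publisher-archive association is left out for brevity:

```python
from typing import Callable, Dict, List, Sequence, Tuple


def classify_views(
    views: Dict[str, object],                                     # view_id -> raw view
    extract: Callable[[object], List[object]],                    # view -> face crops
    is_fake: Callable[[object], bool],                            # face crop -> deepfake verdict
    embed: Callable[[object], Sequence[float]],                   # face crop -> feature vector
    cluster: Callable[[List[Sequence[float]]], List[List[int]]],  # features -> index groups
    label: Callable[[List[Sequence[float]]], str],                # set features -> identity
) -> List[Dict]:
    # Face recognition / extraction, remembering the source view of every face.
    faces: List[Tuple[str, object]] = [
        (view_id, crop) for view_id, view in views.items() for crop in extract(view)
    ]
    # Deepfake detection: one authenticity result per face image.
    verdicts = ["fake" if is_fake(crop) else "real" for _, crop in faces]
    # Feature extraction: one feature vector per face image.
    feats = [embed(crop) for _, crop in faces]
    results: List[Dict] = []
    for group in cluster(feats):
        # Image set: faces of one person, each carrying its own authenticity result.
        image_set = [{"face": i, "view": faces[i][0], "authenticity": verdicts[i]}
                     for i in group]
        # View set: distinct source views of those faces, in first-seen order.
        view_set = list(dict.fromkeys(m["view"] for m in image_set))
        # Library collision on the set's features yields the label information.
        results.append({"label": label([feats[i] for i in group]),
                        "images": image_set,
                        "views": view_set})
    return results
```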

An embodiment of the disclosure provides a data classification method, which includes: acquiring a plurality of views to be classified, and extracting the face image contained in each of the views to obtain a plurality of face images; and clustering the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person and each face image in each image set carries an authenticity detection result indicating the authenticity of the image. With the data classification method provided by the embodiments of the disclosure, images and videos are classified per person, and each face image in each resulting set carries information indicating its authenticity, which improves the intelligence and effectiveness of data classification.

An embodiment of the disclosure provides a data classification apparatus. FIG. 4 is a schematic structural diagram of a data classification apparatus according to an embodiment of the present disclosure. As shown in FIG. 4, the data classification apparatus includes:

a data processing module 401, configured to acquire a plurality of views to be classified and extract the face image contained in each of the views to obtain a plurality of face images; and

a data classification module 402, configured to cluster the plurality of face images to obtain at least one image set, where the face images in each image set correspond to the same person, and each face image in each image set carries an authenticity detection result indicating the authenticity of the image.

In an embodiment of the present disclosure, the data classification module 402 is specifically configured to: perform deepfake detection on each of the face images to obtain a plurality of authenticity detection results in one-to-one correspondence with the face images; extract features from each face image to obtain a plurality of groups of face features in one-to-one correspondence with the face images; and divide the face images that correspond to the same person into the same set by using the groups of face features, attaching to each face image in each resulting set its corresponding authenticity detection result, to obtain the at least one image set.

In an embodiment of the present disclosure, the data classification module 402 is further configured to: obtain at least one piece of class center information in one-to-one correspondence with the at least one image set; and, for each image set, perform a library collision, i.e., compare the corresponding class center information against a preset portrait library, to determine the label information of that image set.

In an embodiment of the present disclosure, the data classification module 402 is specifically configured to: search the preset portrait library for a first face image that matches first class center information, where the first class center information is the class center information corresponding to a first image set, and the first image set is any one of the at least one image set; and, when the first face image is found, determine the identity information associated with the first face image in the preset portrait library as the label information of the first image set.

In an embodiment of the present disclosure, the data classification module 402 is further configured to determine that the label information of the first image set is an anonymous identity when the first face image is not found.

In an embodiment of the present disclosure, the data classification module 402 is further configured to add, to each image set of the at least one image set, the views (among the views to be classified) to which the face images in that set belong, to obtain at least one view set.

In an embodiment of the present disclosure, a publisher archive is further provided, which contains the identity information of different publishers and the views they have published; the data classification module 402 is further configured to search the publisher archive for the identity information of the publisher of each view in the at least one view set, and to associate each view in the at least one view set with the identity information of its publisher.

The embodiments of the present disclosure provide a data classification device configured to acquire a plurality of views to be classified, extract the face image contained in each of the views to obtain a plurality of face images, and cluster the plurality of face images to obtain at least one image set. The face images in each image set correspond to the same person, and each face image in each image set carries an authenticity detection result representing the authenticity of the image. Because the device classifies images and videos by person, and each face image in each resulting set carries information representing its authenticity, the intelligence and effectiveness of data classification are improved.
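This section does not specify the clustering algorithm. The sketch below swaps in single-pass leader clustering over cosine similarity purely for illustration. It shows that each face image keeps its source view and authenticity detection result as it is grouped. The `Face` record, `leader_cluster` and the 0.7 threshold are hypothetical. Face features are assumed to come from an unspecified recognition model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Face:
    view_id: str          # view the face image was extracted from
    feature: List[float]  # face feature vector (from an unspecified recognition model)
    is_real: bool         # authenticity detection result carried by the face image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(faces: List[Face], threshold: float = 0.7) -> List[List[Face]]:
    """Each face joins the most similar cluster whose leader (first member) clears
    the threshold; otherwise it starts a new cluster and becomes its leader."""
    clusters: List[List[Face]] = []
    for face in faces:
        best_index, best_score = -1, threshold
        for i, cluster in enumerate(clusters):
            score = cosine_similarity(face.feature, cluster[0].feature)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index == -1:
            clusters.append([face])
        else:
            clusters[best_index].append(face)
    return clusters
```

Leader clustering is order-dependent and is used here only because it is short. A production system would more likely use a density-based or graph-based method.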

The embodiments of the present disclosure provide an electronic device. Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in Fig. 5, the electronic device includes a processor 501, a memory 502, and a communication bus 503, wherein:

the communication bus 503 is configured to implement connection and communication between the processor 501 and the memory 502; and

the processor 501 is configured to execute one or more programs stored in the memory 502 to implement the data classification method.

Embodiments of the present disclosure also provide a computer-readable storage medium storing one or more programs, which may be executed by one or more processors to implement the data classification method described above. The computer-readable storage medium may be a volatile memory, such as a random-access memory (RAM); a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a device that includes one or any combination of the above memories, such as a mobile phone, computer, tablet device, or personal digital assistant.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable signal processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable signal processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable signal processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable signal processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description covers only preferred embodiments of the present disclosure and is not intended to limit the scope of protection of the present disclosure.

Claims

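1. A method of data classification, the method comprising:

acquiring a plurality of views to be classified, and extracting a face image contained in each view of the plurality of views to be classified to obtain a plurality of face images; and

clustering the plurality of face images to obtain at least one image set, wherein the face images in each image set correspond to the same person, and each face image in each image set carries an authenticity detection result representing the authenticity of the image.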

2. The method of claim 1, wherein clustering the plurality of facial images into at least one image set comprises:

3. The method according to claim 1 or 2, wherein after clustering the plurality of facial images to obtain at least one image set, the method further comprises:

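acquiring class center information corresponding one to one to the at least one image set; and

for each image set of the at least one image set, determining corresponding label information by colliding the corresponding class center information with a preset portrait library.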

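4. The method of claim 3, wherein determining, for each image set of the at least one image set, the corresponding label information by colliding the corresponding class center information with the preset portrait library comprises:

searching the preset portrait library for a first face image matching first class center information, wherein the first class center information is the class center information corresponding to a first image set, and the first image set is any one of the at least one image set; and

when the first face image is found, determining the identity information corresponding to the first face image in the preset portrait library as the label information corresponding to the first image set.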

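5. The method according to claim 4, wherein after searching the preset portrait library for the first face image matching the first class center information, the method further comprises:

determining that the label information corresponding to the first image set is an anonymous identity when the first face image is not found.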

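6. The method of claim 1, wherein after clustering the plurality of face images to obtain the at least one image set, the method further comprises:

adding, to each image set of the at least one image set, the view among the plurality of views to be classified from which each face image in that image set was extracted, so as to obtain at least one view set.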

7. The method of claim 6, wherein a publisher archive is provided, the publisher archive including identity information and published views of different publishers, and after the at least one view set is obtained, the method further comprises:

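searching the publisher archive for the identity information of the publisher of each view in the at least one view set; and

associating each view in the at least one view set with the identity information of its publisher.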

8. A data classification apparatus, comprising:

9. An electronic device, comprising: a processor, a memory, and a communication bus; wherein:

the communication bus is configured to implement connection and communication between the processor and the memory; and

the processor, configured to execute one or more programs stored in the memory to implement the data classification method of any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the data classification method of any one of claims 1-7.