CN113032610A - Archive management method, device, equipment and computer readable storage medium

Info

Publication number: CN113032610A
Application number: CN201911356230.0A
Authority: CN
Inventors: 戴世稳
Original assignee: Shenzhen Intellifusion Technologies Co Ltd
Current assignee: Shenzhen Intellifusion Technologies Co Ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2021-06-25
Anticipated expiration: 2039-12-25
Also published as: CN113032610B

Abstract

The invention provides an archive management method, apparatus and device, and a computer-readable storage medium. The method comprises the following steps: taking every two archive data among a plurality of archive data as an archive data group, calculating a similarity value between the archive cover images in the archive data group, and taking the similarity value as a first similarity of the archive data group, wherein each archive data comprises an archive cover image and archive feature events; determining target archive data groups from the plurality of archive data groups according to the obtained first similarities; and determining similar archive data among the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data groups. The invention can improve the accuracy of searching for similar archives.

Description

Archive management method, device, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of archive management technologies, and in particular, to an archive management method, apparatus, device, and computer-readable storage medium.

Background

With the progress of society, the movement of people has become more common, which makes personnel management more difficult. For this reason, some departments or systems manage personnel by establishing personnel archives. However, during archive creation and archiving, event data (that is, the feature value of captured face data) is compared only with the archive cover image (each archive has a cover feature value that represents the person), either 1:1 against a single cover or 1:N against a plurality of covers. Because of the angle and lighting of the captured face and whether accessories are worn, the event data may fail to be filed into the existing archive, so that the same person is archived repeatedly and one person has a plurality of archives, which increases the workload of managing archive data. To reduce the number of archives per person and facilitate the querying of archive data, similar archives among a plurality of archive data need to be found when the archive data is managed. At present, however, similar archives are generally found only through the similarity between cover images, and the accuracy of the search is low because of factors such as the capture angle, the lighting and worn accessories.

Disclosure of Invention

The invention provides a method, a device and equipment for managing files and a computer readable storage medium, and aims to solve the problem of low accuracy rate of searching similar files.

In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a file management method, including:

respectively taking every two archival data in a plurality of archival data as an archival data group, calculating a similarity value between archival cover images in the archival data group, and taking the similarity value as a first similarity of the archival data group; each of the archive data comprises an archive cover image and an archive characteristic event;

determining a target archive data set from a plurality of archive data sets according to the acquired first similarity;

and determining similar archival data in the archival data according to the determined archival characteristic events of the archival data contained in the target archival data group.

In a second aspect, an embodiment of the present invention further provides a file management apparatus, including:

the acquisition module is used for respectively taking every two archival data in the archival data as an archival data group, calculating a similarity value between archival cover images in the archival data group, and taking the similarity value as a first similarity of the archival data group; each of the archive data comprises an archive cover image and an archive characteristic event;

the first determining module is used for determining a target archive data set from the plurality of archive data sets according to the acquired first similarity;

and the second determining module is used for determining similar archival data in the archival data according to the determined archival characteristic event of the archival data contained in the target archival data group.

In a third aspect, an embodiment of the present invention further provides an archive management device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the archive management method when executing the computer program.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the archive management method described above.

The scheme of the invention has at least the following beneficial effects:

In the embodiment of the invention, every two archive data among a plurality of archive data are taken as an archive data group, the similarity value between the archive cover images in each archive data group is calculated and taken as the first similarity of that archive data group, and target archive data groups are then determined from the plurality of archive data groups according to the obtained first similarities. Finally, similar archive data among the plurality of archive data are determined according to the archive feature events of the archive data contained in the target archive data groups. In other words, similar archives are searched for through both the archive cover images and the archive feature events in the archive data, which can greatly improve the accuracy of the search compared with searching only through the similarity between archive cover images.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a file management method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of distributed parallel computing according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the third similarities of archive feature events according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the third similarity values in FIG. 3 after being sorted according to an embodiment of the present invention;

FIG. 5 is a schematic illustration of a prompt message in an example of an embodiment of the invention;

FIG. 6 is a schematic structural diagram of a file management apparatus according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating an embodiment of an archive management device.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

As shown in fig. 1, an embodiment of the present invention provides a file management method, including:

Step 11: take every two archive data among the plurality of archive data as an archive data group, calculate a similarity value between the archive cover images in the archive data group, and take the similarity value as a first similarity of the archive data group.

The archive data is the archive data of a person and comprises an archive cover image and archive feature events. The archive cover image may be a face image of the person (the face in the image may be lowered, raised, turned to the side, and so on), and the archive feature events may include face images of the person in a plurality of different states, such as a face image with glasses, a face image without glasses, a face image with a hat, a face image without a hat, a face image with the head raised, a face image with the head lowered, a side-face image and a face image when laughing. It can be understood that, to improve the efficiency of obtaining the first similarity, the first similarity may be obtained by calculating the similarity between the feature values of the cover images. That is, for the two archive data in each archive data group, the feature values of their archive cover images are extracted, the similarity between the two extracted feature values is calculated, and that similarity is taken as the first similarity of the archive data group.

Specifically, the feature value of the archive cover image of an archive data may be extracted by an image feature extraction algorithm, and the feature value may specifically be a face feature value.

As a preferred example, to further improve the efficiency of obtaining the first similarity, the feature value may be a high-dimensional feature map, such as a 512 × 2-dimensional feature map. The similarity between the feature values can then be obtained quickly by a commonly used similarity calculation formula.
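The pairing and first-similarity calculation of step 11 can be sketched as follows. This is only an illustration: the text does not name the similarity formula, so cosine similarity is assumed here, and the names `cosine_similarity`, `first_similarities` and the toy three-dimensional vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (assumed formula)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_similarities(cover_features):
    """Map every unordered pair of archive IDs to its cover-image similarity.

    cover_features: dict of archive ID -> feature vector of the cover image.
    Each pair of IDs plays the role of one "archive data group".
    """
    return {
        (id_a, id_b): cosine_similarity(cover_features[id_a], cover_features[id_b])
        for id_a, id_b in combinations(sorted(cover_features), 2)
    }


# Toy cover features; real feature values would be high-dimensional.
covers = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
sims = first_similarities(covers)
```

With m archive data this enumerates m(m-1)/2 groups, which is why the embodiment goes on to distribute the calculation.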

It should be noted that, to improve the efficiency of obtaining the first similarity and thus the efficiency of searching for similar archives, in an embodiment of the present invention a plurality of similarity calculation units are used to take every two archive data among the plurality of archive data as an archive data group, calculate the similarity value between the archive cover images in each archive data group, and take the similarity value as the first similarity of the archive data group. Specifically, the first similarities of the plurality of archive data groups may be calculated in a distributed, parallel manner by the plurality of similarity calculation units. A similarity calculation unit may specifically be a Graphics Processing Unit (GPU). Distributed parallel computing is described here with reference to fig. 2. Assuming that there are n computing devices, the plurality of archive data may be divided equally into n parts and loaded in parallel into the GPUs of the n computing devices (i.e., one part of the archive data is loaded into each GPU), while the plurality of archive data in the archive data pool 21 are loaded sequentially into the buffer queue 22. Then, one archive data is taken from the buffer queue 22 at a time and sent to each of the n computing devices, where its archive cover image is compared for similarity with the archive cover images of the archive data held in the GPU of that computing device.
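The data flow of fig. 2 can be imitated on a single machine as a rough sketch. Threads stand in for the n GPU-equipped computing devices, cosine similarity is an assumed formula, and skipping self-pairs and mirrored pairs with `query_id < other_id` is an assumption added to avoid duplicate groups; none of these details come from the text.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


def cosine(u, v):
    """Assumed similarity formula for two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compare_with_shard(query_id, query_vec, shard):
    """Work done by one simulated computing device for one queued archive."""
    return {
        (query_id, other_id): cosine(query_vec, other_vec)
        for other_id, other_vec in shard.items()
        if query_id < other_id  # assumption: skip self-pairs and mirrored pairs
    }


def parallel_first_similarities(cover_features, n_devices):
    ids = sorted(cover_features)
    # Divide the archive data into n roughly equal parts, one per device.
    shards = [
        {i: cover_features[i] for i in ids[k::n_devices]} for k in range(n_devices)
    ]
    # Load every archive from the pool into the buffer queue.
    buffer_queue = Queue()
    for archive_id in ids:
        buffer_queue.put(archive_id)

    results = {}
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        while not buffer_queue.empty():
            query_id = buffer_queue.get()
            # Send the queued archive to all devices at once.
            futures = [
                pool.submit(compare_with_shard, query_id, cover_features[query_id], shard)
                for shard in shards
            ]
            for future in futures:
                results.update(future.result())
    return results


covers = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0], "d": [0.6, 0.8]}
parallel_sims = parallel_first_similarities(covers, n_devices=2)
```

In a real deployment each shard would stay resident in GPU memory so that only the queued archive has to be transferred for every comparison round.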

Step 12: determine target archive data groups from the plurality of archive data groups according to the obtained first similarities.

In the embodiment of the present invention, step 12 is performed to use two archive data preliminarily regarded as similar archive data as a target archive data set, so as to subsequently determine whether the two are similar archive data.

Specifically, in the embodiment of the present invention, step 12 may be implemented as follows: it is judged whether the obtained first similarity is greater than a first preset threshold, and when it is, the archive data group corresponding to the first similarity is taken as a target archive data group.

It should be noted that when the obtained first similarity is greater than the first preset threshold, it is considered that the two archive data corresponding to the first similarity may be similar archive data, otherwise, it is considered that the two archive data corresponding to the first similarity may not be similar archive data. It is understood that, in the embodiment of the present invention, the first preset threshold may be set according to specific situations, such as 0.7. It should be understood that, in order to improve the determination efficiency of the target archive data set, step 12 may also be implemented in a distributed manner, that is, after the first similarity is obtained through calculation in step 11, each similarity calculation unit directly determines the first similarity, and if the first similarity obtained through calculation is greater than a first preset threshold, the archive data set corresponding to the first similarity is used as the target archive data set.
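A minimal sketch of the threshold check in step 12 follows. The value 0.7 is the example threshold mentioned above; the function name `select_target_groups` and the sample similarities are hypothetical.

```python
FIRST_THRESHOLD = 0.7  # example value from the text; tune per deployment


def select_target_groups(first_sims, threshold=FIRST_THRESHOLD):
    """Keep the archive data groups whose first similarity exceeds the threshold."""
    return [pair for pair, sim in first_sims.items() if sim > threshold]


# Hypothetical first similarities for three archive data groups.
first_sims = {("a", "b"): 0.82, ("a", "c"): 0.70, ("b", "c"): 0.31}
targets = select_target_groups(first_sims)
```

The comparison is strict, matching "greater than" in the text, so a group sitting exactly at the threshold is not selected.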

Step 13: determine similar archive data among the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data groups.

In the embodiment of the present invention, after a target archive data set is determined from a plurality of archive data sets, for each target archive data set, according to archive feature events of two archive data included in the target archive data set, a similarity between the two archive data included in the target archive data set is determined, so as to determine similar archive data in the plurality of archive data.

It is worth mentioning that in the embodiment of the present invention, similar files are searched through the file cover images and the file characteristic events in each file data, the similarity between different file data can be finely calculated, and similar file determination is performed on the file data according to the calculated similarity, which greatly improves the accuracy of searching similar files compared with the mode of searching similar files only through the similarity between file cover images.

Next, a specific implementation of step 13 will be described.

Specifically, the specific implementation manner of step 13 includes the following steps:

step one, aiming at each determined target archive data set, acquiring second similarity of two archive data contained in the target archive data set according to archive characteristic events of the two archive data contained in the target archive data set to obtain a plurality of second similarity. That is, the second similarity of the two archival data included in each target archival data set is obtained according to the archival feature events of the two archival data included in the target archival data set.

In an embodiment of the present invention, a specific implementation manner of obtaining the second similarity between two archival data included in each target archival data set in the first step (i.e. obtaining the second similarity between two archival data included in the target archival data set according to archival feature events of the two archival data included in the target archival data set) includes the following steps:

the first step is to extract a plurality of archive characteristic events of preset categories from the archive characteristic events of the two archive data contained in the target archive data set. That is, a plurality of archive feature events of a preset category are extracted from the archive feature events of each archive data of the target archive data set.

It should be noted that, in order to improve the accuracy of searching for similar files, the file feature events of the multiple preset categories are representative high-quality events, such as a face image when glasses are worn, a face image when glasses are not worn, a face image when a hat is not worn, a face image when a head is raised, a face image when a head is lowered, and a face image when a face is on the side. And it can be understood that the number and form of the preset categories can be set according to actual conditions.

And secondly, respectively aiming at each preset type of file characteristic event, acquiring a characteristic value of the preset type of file characteristic event extracted from the two file data, calculating the similarity of the two acquired characteristic values, and taking the similarity as a third similarity of the preset type of file characteristic event in the two file data to obtain a plurality of third similarities.

It is understood that, in order to improve the efficiency of calculating the third similarity, the third similarity may be obtained by calculating the similarity of the feature values of the profile feature events. That is, for each preset category of file feature event, the feature value of the preset category of file feature event extracted from the two pieces of file data may be obtained, then the similarity between the two obtained feature values is calculated, and finally the similarity is used as the third similarity of the preset category of file feature event in the two pieces of file data.
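The per-category comparison of the first and second steps can be sketched as below. The category labels, the dictionary layout (category label mapped to feature vector) and the cosine formula are illustrative assumptions; skipping a category that is missing from either archive is also an assumption, since the text does not say how that case is handled.

```python
import math


def cosine(u, v):
    """Assumed similarity formula for two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def third_similarities(events_a, events_b, preset_categories):
    """Compare same-category feature events of two archives.

    events_a / events_b: dict of category label -> feature vector of that event.
    Returns a dict of category label -> third similarity.
    """
    sims = {}
    for category in preset_categories:
        # Assumption: a category absent from either archive is skipped.
        if category in events_a and category in events_b:
            sims[category] = cosine(events_a[category], events_b[category])
    return sims


events_a = {"glasses": [1.0, 0.0], "side_face": [0.6, 0.8], "laughing": [0.0, 1.0]}
events_b = {"glasses": [1.0, 0.0], "side_face": [0.8, 0.6]}
category_sims = third_similarities(events_a, events_b, ["glasses", "side_face", "hat"])
```

Here "laughing" is ignored because it is not a preset category, and "hat" is skipped because neither archive contains such an event.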

Specifically, the feature value of an archive feature event may be extracted from the face image of that event by an image feature extraction algorithm, and the feature value may specifically be a face feature value.

As a preferred example, to further improve the efficiency of obtaining the third similarity, the feature value may be a high-dimensional feature map, such as a 512 × 2-dimensional feature map. The similarity between the feature values can then be obtained quickly by a commonly used similarity calculation formula.

And thirdly, selecting a preset number of third similarity degrees from the plurality of third similarity degrees in descending order. That is, a preset number of third similarity degrees are selected from the plurality of third similarity degrees in order of the similarity numerical values from large to small. It should be noted that the preset number is smaller than or equal to the number of the preset categories. It should be understood that, in the embodiment of the present invention, the specific numerical value of the preset number is not limited, and may be set according to the actual situation.

And fourthly, calculating the average value of the selected third similarity of the preset number, and taking the average value as the second similarity of the two archival data contained in the target archival data group.

Here, for ease of understanding, the first to fourth steps are explained with a specific example. In this example, it is assumed that the number of preset categories is 7, namely a face image with glasses, a face image without glasses, a face image with a hat, a face image without a hat, a face image with the head raised, a face image with the head lowered, and a side-face image (note that, in a specific software implementation, the category of an archive feature event may be marked by a field so that the archive feature events of the preset categories can be extracted). The archive feature events of the 7 categories extracted from one archive data in the target archive data group are a1 (face image with glasses), a2 (face image without glasses), a3 (face image with a hat), a4 (face image without a hat), a5 (face image with the head raised), a6 (face image with the head lowered) and a7 (side-face image). The archive feature events of the 7 categories extracted from the other archive data in the target archive data group are b1 (face image with glasses), b2 (face image without glasses), b3 (face image with a hat), b4 (face image without a hat), b5 (face image with the head raised), b6 (face image with the head lowered) and b7 (side-face image). Then, as shown in fig. 3, the third similarity of the archive feature events of each category is calculated to obtain 7 third similarities: the third similarity of a1 and b1 is 0.94, that of a2 and b2 is 0.95, that of a3 and b3 is 0.91, that of a4 and b4 is 0.85, that of a5 and b5 is 0.90, that of a6 and b6 is 0.96, and that of a7 and b7 is 0.91. Next, as shown in fig. 4, the 7 third similarities are sorted from large to small, and the 3 largest third similarities (i.e., 0.96, 0.95 and 0.94) are selected. Finally, the average of these three third similarities is calculated, and the resulting average of 0.95 is used as the second similarity of the two archive data contained in the target archive data group.
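The third and fourth steps of the worked example can be reproduced with a short sketch. The function name `second_similarity`, the parameter `top_k` (the "preset number") and the category labels are hypothetical; the similarity values are the ones given in the example.

```python
def second_similarity(third_sims, top_k):
    """Average the top_k largest third similarities.

    third_sims: dict of category label -> third similarity.
    top_k: the preset number; must not exceed the number of categories.
    """
    if not 1 <= top_k <= len(third_sims):
        raise ValueError("top_k must be between 1 and the number of categories")
    largest = sorted(third_sims.values(), reverse=True)[:top_k]
    return sum(largest) / top_k


# Third similarities of a1..a7 against b1..b7 from the example.
third = {
    "glasses": 0.94,
    "no_glasses": 0.95,
    "hat": 0.91,
    "no_hat": 0.85,
    "head_up": 0.90,
    "head_down": 0.96,
    "side_face": 0.91,
}
score = second_similarity(third, top_k=3)
print(round(score, 2))  # 0.95
```

Averaging only the best-matching categories keeps one poor capture (for example the 0.85 for the image without a hat) from dragging down the overall score.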

It can be understood that, in order to improve the efficiency of obtaining the plurality of second similarities, the first step may be implemented in a distributed manner, that is, the plurality of target archive data sets are uniformly distributed to different device work processes (workers), so as to calculate the second similarities of the two archive data in each target archive data set in a distributed manner.
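One simple way to spread the target archive data groups evenly over workers is a round-robin assignment, sketched below. The text only says the groups are distributed uniformly, so the round-robin policy and the name `distribute_groups` are assumptions.

```python
def distribute_groups(target_groups, n_workers):
    """Assign target archive data groups to workers in round-robin order."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    buckets = [[] for _ in range(n_workers)]
    for index, group in enumerate(target_groups):
        buckets[index % n_workers].append(group)
    return buckets


groups = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")]
buckets = distribute_groups(groups, n_workers=2)
```

Each worker would then calculate the second similarity for the groups in its own bucket, and the bucket sizes differ by at most one.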

And step two, determining similar archival data in the archival data according to the second similarities.

Specifically, in the embodiment of the present invention, step two is implemented as follows: first, the archive data corresponding to the plurality of second similarities are taken as target archive data; then, for each target archive data, the archive data among all the target archive data whose second similarity with that target archive data is greater than a second preset threshold is taken as the similar archive data of that target archive data. In this way, the similar archive data among the plurality of archive data can be obtained. The second preset threshold may be set according to actual conditions, for example to 0.85.
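Step two can be sketched as a lookup built from the second similarities. The threshold 0.85 is the example value from the text; the function name `find_similar_archives`, the symmetric bookkeeping and the sample values are illustrative assumptions.

```python
from collections import defaultdict

SECOND_THRESHOLD = 0.85  # example value from the text


def find_similar_archives(second_sims, threshold=SECOND_THRESHOLD):
    """Map each target archive ID to its similar archives and their scores.

    second_sims: dict of (id_a, id_b) -> second similarity of that group.
    """
    similar = defaultdict(dict)
    for (id_a, id_b), sim in second_sims.items():
        if sim > threshold:
            # Similarity is treated as symmetric, so record both directions.
            similar[id_a][id_b] = sim
            similar[id_b][id_a] = sim
    return dict(similar)


# Hypothetical second similarities of three target archive data groups.
second_sims = {("a", "c"): 0.95, ("a", "d"): 0.90, ("c", "d"): 0.80}
similar = find_similar_archives(second_sims)
```

Keying the result by the identification information of each archive makes the later traversal by identifier a plain dictionary lookup.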

It should be noted that, in order to distinguish and identify each archive data, when each archive data is archived, each archive data is configured with a corresponding identification information (e.g., a number), that is, each archive data in the archive data pool has a corresponding identification information. Therefore, when the similar file data of each target file data is determined, file data with the second similarity greater than a second preset threshold with the target file data can be searched in a traversing mode according to the identification information of each target file data.

In addition, in the embodiment of the present invention, after the similar archive data among the plurality of archive data are determined, the archive management method further includes the following step: for each target archive data, displaying prompt information indicating that similar archive data exist for that target archive data, so that the user can process the similar archive data (for example by merging or checking them). The prompt information includes the identification information of the similar archive data of the target archive data. The prompt information is described here with a specific example. In this example, assume that the target archive data is archive data a and that its similar archive data are archive data c, archive data d, archive data e and archive data b, where a, b, c, d and e all represent identification information of archive data. The prompt information may then be as shown in fig. 5, in which the second similarity between archive data a and archive data c is 0.95, that between archive data a and archive data d is 0.90, that between archive data a and archive data e is 0.90, and that between archive data a and archive data b is 0.85.
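A prompt of this kind could be assembled as in the following sketch. The actual layout of fig. 5 is not reproduced in the text, so the wording and format of the message, as well as the name `build_prompt`, are assumptions; only the identifiers and similarity values come from the example.

```python
def build_prompt(target_id, similar):
    """Format a prompt listing the similar archives of one target archive.

    similar: dict of archive ID -> second similarity with the target archive.
    """
    # Highest similarity first; ties keep their original order (stable sort).
    ranked = sorted(similar.items(), key=lambda item: item[1], reverse=True)
    lines = [f"Archive {target_id} has {len(ranked)} similar archive(s):"]
    lines += [f"  {other_id}: {sim:.2f}" for other_id, sim in ranked]
    return "\n".join(lines)


message = build_prompt("a", {"c": 0.95, "d": 0.90, "e": 0.90, "b": 0.85})
print(message)
```

Listing the identification information next to each score lets the user decide which archives to merge or check first.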

As shown in fig. 6, an embodiment of the present invention further provides a file management apparatus, including: an acquisition module 61, a first determination module 62 and a second determination module 63.

The acquiring module 61 is configured to take every two archive data among a plurality of archive data as an archive data group, calculate a similarity value between the archive cover images in the archive data group, and take the similarity value as a first similarity of the archive data group; each of the archive data comprises an archive cover image and archive feature events;

a first determining module 62, configured to determine a target archive data set from the plurality of archive data sets according to the obtained first similarity;

the second determining module 63 is configured to determine similar archival data in the plurality of archival data according to the determined archival feature event of the archival data included in the target archival data set.

In the embodiment of the present invention, the file management device 60 is a device corresponding to the file management method, and can improve the accuracy of searching for similar files.

It should be noted that the file management apparatus 60 includes all modules or units for implementing the file management method, and in order to avoid too many repetitions, the modules or units of the file management apparatus 60 are not described herein again.

As shown in fig. 7, the embodiment of the present invention further provides an archive management device, which includes a memory 71, a processor 72, and a computer program 73 stored in the memory 71 and operable on the processor 72, wherein the processor 72 executes the computer program 73 to implement the steps of the archive management method described above.

Specifically, the processor 72 of the archive management device 70 implements the following steps when executing the computer program 73: respectively taking every two archival data in a plurality of archival data as an archival data group, calculating a similarity value between archival cover images in the archival data group, and taking the similarity value as a first similarity of the archival data group; each of the archive data comprises an archive cover image and an archive characteristic event; determining a target archive data set from a plurality of archive data sets according to the acquired first similarity; and determining similar archival data in the archival data according to the determined archival characteristic events of the archival data contained in the target archival data group.

Optionally, when the processor 72 of the archive management device 70 executes the computer program 73, the following steps are further implemented: judging whether the acquired first similarity is greater than a first preset threshold value or not; and when the acquired first similarity is larger than the first preset threshold, taking the file data group corresponding to the first similarity as a target file data group.

Optionally, when the processor 72 of the archive management device 70 executes the computer program 73, the following steps are further implemented: respectively aiming at each determined target archive data set, acquiring second similarity of two archive data contained in the target archive data set according to archive characteristic events of the two archive data contained in the target archive data set to obtain a plurality of second similarity; and determining similar archival data in the archival data according to the second similarities.

Optionally, when the processor 72 of the archive management device 70 executes the computer program 73, the following steps are further implemented: extracting a plurality of archive characteristic events of preset categories from archive characteristic events of two archive data contained in the target archive data set respectively; respectively aiming at the archive characteristic events of each preset type, calculating a third similarity of the archive characteristic events of the preset type extracted from the two archive data to obtain a plurality of third similarities; selecting a preset number of third similarity degrees from the plurality of third similarity degrees in descending order; wherein the preset number is less than or equal to the number of the preset categories; and calculating the average value of the selected third similarity of the preset number, and taking the average value as the second similarity of the two archival data contained in the target archival data group.

Optionally, when the processor 72 of the archive management device 70 executes the computer program 73, the following steps are further implemented: taking the archival data corresponding to the plurality of second similarities as target archival data; and respectively aiming at each target file data, taking the file data of which the second similarity with the target file data is greater than a second preset threshold value in each target file data as the similar file data of the target file data.

Optionally, when the processor 72 of the archive management device 70 executes the computer program 73, the following steps are further implemented: for each target archive data, displaying prompt information indicating that similar archive data exist for that target archive data; wherein the prompt information includes identification information of the similar archive data of the target archive data.

Optionally, when the processor 72 of the archive management device 70 executes the computer program 73, the following steps are further implemented: through a plurality of similarity calculation units, respectively using every two file data in the plurality of file data as a file data group, calculating the similarity value between file cover images in each file data group, and using the similarity value as the first similarity of the file data group.

That is, in the embodiment of the present invention, the processor 72 of the archive management device 70, when executing the computer program 73, implements the steps of the archive management method described above, so as to improve the accuracy of searching for similar archives.

Illustratively, the above-described computer program 73 may be partitioned into one or more modules/units, which are stored in the memory 71 and executed by the processor 72 to implement the present invention. And the one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 73 in the archive management device 70.

The archive management device 70 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The archive management device 70 may include, but is not limited to, a processor 72 and a memory 71. Those skilled in the art will understand that the illustrated diagram is merely an example of the archive management device 70 and does not limit it; the archive management device 70 may include more or fewer components than shown, combine some components, or use different components. For example, the archive management device 70 may also include input/output devices, network access devices, buses, and the like.

The processor 72 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 72 may be any conventional processor. The processor 72 is the control center of the archive management device 70 and connects the various components of the entire archive management device 70 through various interfaces and lines.

The memory 71 may be used to store the computer program 73 and/or modules, and the processor 72 implements the various functions of the archive management device 70 by running or executing the computer program 73 and/or modules stored in the memory 71 and by calling data stored in the memory 71. Specifically, the memory 71 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the archive management device 70 (such as audio data, a phonebook, etc.), and the like. Further, the memory 71 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.

It should be noted that, since the processor 72 of the archive management device 70 executes the computer program 73 to implement the steps of the archive management method, all the embodiments of the archive management method can be applied to the archive management device 70, and the same or similar beneficial effects can be achieved.

Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, which stores a computer program, and the computer program realizes the steps of the above-mentioned archive management method when being executed by a processor.

That is, in the embodiment of the present invention, when the computer program of the computer-readable storage medium is executed by the processor, the steps of the above-mentioned archive management method are implemented, so as to improve the accuracy of searching for similar archives.

Illustratively, the computer program of the computer-readable storage medium comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.

It should be noted that, since the computer program of the computer readable storage medium is executed by the processor to implement the steps of the above-mentioned archive management method, all the embodiments of the above-mentioned archive management method can be applied to the computer readable storage medium, and can achieve the same or similar advantages.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A method of archive management, comprising:

2. The method according to claim 1, wherein the step of determining a target archival data set from a plurality of archival data sets according to the obtained first similarity comprises:

judging whether the acquired first similarity is greater than a first preset threshold value or not;

and when the acquired first similarity is greater than the first preset threshold, taking the archival data set corresponding to the first similarity as a target archival data set.
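The threshold test of claim 2 can be illustrated with a short sketch. The function name `select_target_sets`, the pair-keyed dictionary, and the sample similarity values are assumptions made for illustration only.

```python
def select_target_sets(first_sims, first_threshold):
    """Keep the archival data sets whose first similarity exceeds the threshold.

    first_sims maps a pair of archive ids to the first similarity of
    their cover images (hypothetical layout).
    """
    return [pair for pair, sim in first_sims.items() if sim > first_threshold]


# Hypothetical first similarities for three archival data sets.
first_sims = {("A", "B"): 0.92, ("A", "C"): 0.41, ("B", "C"): 0.80}
targets = select_target_sets(first_sims, first_threshold=0.80)
```

Because the comparison is strictly "greater than", the set scoring exactly 0.80 is not selected in this example.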

3. The method according to claim 1, wherein the step of determining similar archival data among the plurality of archival data according to the archive characteristic events of the archival data contained in the determined target archival data set comprises:

for each determined target archival data set, acquiring a second similarity of the two pieces of archival data contained in the target archival data set according to the archive characteristic events of those two pieces of archival data, so as to obtain a plurality of second similarities;

and determining similar archival data in the archival data according to the second similarities.

4. The method according to claim 3, wherein the step of obtaining the second similarity of the two archival data included in the target archival data set according to the archival characteristic events of the two archival data included in the target archival data set comprises:

extracting a plurality of archive characteristic events of preset categories from archive characteristic events of two archive data contained in the target archive data set respectively;

for each preset category of archive characteristic event, acquiring the characteristic values of the archive characteristic events of that preset category extracted from the two pieces of archival data, calculating the similarity of the two acquired characteristic values, and taking the similarity as a third similarity of the archive characteristic event of that preset category in the two pieces of archival data, so as to obtain a plurality of third similarities;

selecting a preset number of third similarities from the plurality of third similarities in descending order; wherein the preset number is less than or equal to the number of the preset categories;

and calculating the average value of the selected preset number of third similarities, and taking the average value as the second similarity of the two pieces of archival data contained in the target archival data set.
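The steps of claim 4 can be sketched as below, under stated assumptions. The category names, the scalar characteristic values, and the `scalar_closeness` measure are hypothetical stand-ins, since the claim does not fix how two characteristic values are compared. Skipping categories absent from either archive, and returning 0.0 when no category is shared, are likewise choices made here.

```python
def second_similarity(events_a, events_b, categories, top_n, similarity):
    """Mean of the top_n largest per-category (third) similarities."""
    third = [
        similarity(events_a[c], events_b[c])
        for c in categories
        # Assumption: a category missing from either archive is skipped.
        if c in events_a and c in events_b
    ]
    if not third:
        return 0.0  # Assumption: no shared category, no evidence of similarity.
    # Descending order, then keep the preset number of largest values.
    selected = sorted(third, reverse=True)[:top_n]
    return sum(selected) / len(selected)


def scalar_closeness(x, y):
    """Hypothetical similarity for scalar characteristic values in [0, 1]."""
    return 1.0 - abs(x - y)


# Hypothetical preset categories and characteristic values.
categories = ["front_face", "side_face", "with_glasses"]
events_a = {"front_face": 0.9, "side_face": 0.2, "with_glasses": 0.5}
events_b = {"front_face": 0.8, "side_face": 0.9, "with_glasses": 0.5}
score = second_similarity(events_a, events_b, categories,
                          top_n=2, similarity=scalar_closeness)
```

Here the third similarities are about 0.9, 0.3 and 1.0; the two largest average to about 0.95. Averaging only the best-matching categories keeps one poorly matched category (for example a side-face snapshot) from pulling the second similarity down.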

5. The method of claim 3, wherein the step of determining similar archival data among the plurality of archival data according to the plurality of second similarities comprises:

taking the archival data corresponding to the plurality of second similarities as target archival data;

and, for each piece of target archival data, taking the archival data among the target archival data whose second similarity with that target archival data is greater than a second preset threshold as the similar archival data of that target archival data.

6. The method of claim 5, wherein after the step of determining similar archival data among the plurality of archival data according to the archive characteristic events of the archival data contained in the determined target archival data set, the method further comprises:

for each piece of target archival data, displaying a prompt message indicating that similar archival data exists for that target archival data; wherein the prompt message includes identification information of the similar archival data of the target archival data.

7. The method of claim 1, wherein the step of taking every two pieces of archival data among the plurality of archival data as an archival data set, calculating a similarity value between the archive cover images in the archival data set, and taking the similarity value as a first similarity of the archival data set comprises:

using a plurality of similarity calculation units, taking every two pieces of archival data among the plurality of archival data as an archival data set, calculating the similarity value between the archive cover images in each archival data set, and taking the similarity value as the first similarity of that archival data set.

8. An archive management apparatus characterized by comprising:

9. An archive management device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the archive management method according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the archive management method according to any one of claims 1 to 7.