CN113032610B

CN113032610B - File management method, device, equipment and computer readable storage medium

Info

Publication number: CN113032610B
Application number: CN201911356230.0A
Authority: CN
Inventors: 戴世稳
Original assignee: Shenzhen Intellifusion Technologies Co Ltd
Current assignee: Shenzhen Intellifusion Technologies Co Ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2024-05-07
Anticipated expiration: 2039-12-25
Also published as: CN113032610A

Abstract

The invention provides a file management method, a file management device, file management equipment and a computer readable storage medium, wherein the file management method comprises the following steps: taking each two archive data in a plurality of archive data as an archive data set, calculating similarity values between archive cover images in the archive data set, and taking the similarity values as first similarity of the archive data set; each archive data comprises an archive cover image and an archive feature event; determining a target archive data set from the plurality of archive data sets according to the acquired first similarity; and determining similar archive data in the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data set. The invention can improve the accuracy rate of searching similar files.

Description

File management method, device, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of archive management technologies, and in particular, to an archive management method, apparatus, device, and computer readable storage medium.

Background

With the progress of society, the movement of people has become more common, which makes personnel management more difficult. For this reason, some departments or systems manage personnel by creating personnel archives. During archiving, however, event data (i.e., the feature value of captured face data) is merely compared with an archive cover image (each archive has a cover feature value that represents the person) in a 1:1 or 1:N manner. Because of the angle, lighting, worn accessories and the like of the captured face, archiving of the event data may fail, so that the same person is archived repeatedly. One person then has multiple archives, which increases the workload of managing archive data. To reduce cases of one person having multiple archives and to facilitate querying of archive data, similar archives need to be searched for when archive data is managed. At present, however, similar archives are generally searched for only through the similarity between cover images, and the accuracy of the search is low because of the angle, lighting, worn accessories and the like of the captured faces.

Disclosure of Invention

The invention provides a file management method, a file management device, file management equipment and a computer readable storage medium, and aims to solve the problem of low accuracy in searching similar files.

To achieve the above object, in a first aspect, an embodiment of the present invention provides a file management method, including:

Taking each two archive data in a plurality of archive data as an archive data set, calculating similarity values between archive cover images in the archive data set, and taking the similarity values as first similarity of the archive data set; each archive data comprises an archive cover image and an archive feature event;

Determining a target archive data set from the plurality of archive data sets according to the acquired first similarity;

And determining similar archive data in the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data set.

In a second aspect, an embodiment of the present invention further provides an archive management device, including:

The file cover image acquisition module is used for respectively taking every two file data in a plurality of file data as a file data group, calculating similarity values between file cover images in the file data group, and taking the similarity values as first similarity of the file data group; each archive data comprises an archive cover image and an archive feature event;

the first determining module is used for determining a target archive data set from the archive data sets according to the acquired first similarity;

And the second determining module is used for determining similar archive data in the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data set.

In a third aspect, an embodiment of the present invention further provides an archive management device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the archive management method described above when executing the computer program.

In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the archive management method described above.

The scheme of the invention has at least the following beneficial effects:

In the embodiment of the invention, every two archive data in a plurality of archive data are taken as one archive data group, a similarity value between the archive cover images in each archive data group is calculated and taken as the first similarity of that group, and a target archive data group is then determined from the plurality of archive data groups according to the acquired first similarity. Finally, similar archive data in the plurality of archive data are determined according to the archive feature events of the archive data contained in the target archive data group. That is, the search for similar archives is completed through both the archive cover image and the archive feature events in each archive data, which greatly improves the accuracy of searching for similar archives compared with searching only through the similarity between archive cover images.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a file management method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of distributed parallel computing according to an embodiment of the invention;

FIG. 3 is a diagram illustrating the third similarities of archive feature events in an example of an embodiment of the present invention;

FIG. 4 is a schematic diagram of the third similarity ordering of FIG. 3 according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a hint in an example of an embodiment of the present invention;

FIG. 6 is a schematic diagram of a file management apparatus according to an embodiment of the invention;

FIG. 7 is a schematic structural diagram of an archive management device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless explicitly defined otherwise.

As shown in fig. 1, an embodiment of the present invention provides a file management method, which includes:

Step 11, respectively taking every two archive data in a plurality of archive data as an archive data set, calculating similarity values between archive cover images in the archive data set, and taking the similarity values as first similarity of the archive data set.

The archive data is the archive data of a person and comprises an archive cover image and archive feature events. The archive cover image may be a face image of the person (with the head lowered, the head raised, the face in profile, etc.), and the archive feature events may include face images of the person in a plurality of different states, such as a face image when wearing glasses, when not wearing glasses, when not wearing a hat, with the head raised, in profile, when smiling, and the like. It is understood that, to improve the efficiency of obtaining the first similarity, the first similarity may be obtained by calculating the similarity of the feature values of the archive cover images. That is, for the two archive data in each archive data set, the feature values of the archive cover images of the two archive data are extracted, the similarity of the two extracted feature values is calculated, and this similarity is used as the first similarity of the archive data set.

The feature value of the file cover image of the file data can be extracted by an image feature extraction algorithm, and the feature value can be a face feature value.

As a preferred example, to further improve the efficiency of obtaining the first similarity, the feature value may be a high-dimensional feature map, such as a feature map of dimensions 512×2×2. The similarity between such feature values can then be obtained quickly with a general similarity calculation formula.
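As a minimal sketch of how a first similarity might be computed from two cover feature values: the text only refers to a "general similarity calculation formula", so the use of cosine similarity here is an assumption, and the function name and the toy four-element vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate feature value: treat as not similar
    return dot / (norm_u * norm_v)


# Hypothetical, tiny feature values. A 512x2x2 feature map would first be
# flattened into a 2048-element vector before being passed to this function.
cover_a = [0.2, 0.7, 0.1, 0.4]
cover_b = [0.2, 0.7, 0.1, 0.4]
cover_c = [0.9, -0.1, 0.3, 0.0]

first_similarity_ab = cosine_similarity(cover_a, cover_b)
first_similarity_ac = cosine_similarity(cover_a, cover_c)
print(first_similarity_ab, first_similarity_ac)
```

Identical cover feature values score 1.0, while dissimilar ones score lower; any other similarity measure with the same interface could be substituted.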

It should be noted that the number of archive data is generally huge. To improve the efficiency of obtaining the first similarity, and thus the efficiency of searching for similar archives, in the embodiment of the present invention a plurality of similarity calculation units may be used to take every two archive data in the plurality of archive data as one archive data set, calculate a similarity value between the archive cover images in each archive data set, and take the similarity value as the first similarity of the archive data set. Specifically, the first similarities of the plurality of archive data sets may be calculated in a distributed, parallel manner by the plurality of similarity calculation units. A similarity calculation unit may specifically be a graphics processing unit (GPU). The distributed parallel computing is described with reference to FIG. 2. Assuming that there are n computing devices, the plurality of archive data may be divided equally into n parts and loaded in parallel into the GPUs of the n computing devices (i.e., one part of the archive data is loaded into each GPU); at the same time, the plurality of archive data in the archive data pool 21 are loaded in sequence into the buffer queue 22. Then, one archive data at a time is taken from the buffer queue 22 and loaded into each of the n computing devices, and the similarity between its archive cover image and the archive cover images of the archive data held in the GPU of each of the n computing devices is calculated.
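The sharding and buffer-queue scheme described above can be sketched on a single machine as follows. This is only an illustration under stated assumptions: worker threads stand in for the n GPU-equipped computing devices, cosine similarity stands in for the unspecified similarity formula, and all function names and the toy two-element feature values are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (an assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_into_shards(archives, n):
    """Divide the archive list into n roughly equal parts, one per device."""
    return [archives[i::n] for i in range(n)]


def compare_with_shard(query, shard):
    """Work done on one device: compare the query cover with each cover in the shard."""
    query_id, query_cover = query
    return {
        (query_id, other_id): cosine_similarity(query_cover, other_cover)
        for other_id, other_cover in shard
        if query_id < other_id  # keep each unordered pair once, skip self-comparison
    }


def first_similarities(archives, n_devices):
    shards = split_into_shards(archives, n_devices)
    buffer_queue = Queue()
    for archive in archives:  # the archive data pool feeds the buffer queue in order
        buffer_queue.put(archive)

    results = {}
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        while not buffer_queue.empty():
            query = buffer_queue.get()
            # The same query is sent to every device; the devices work in parallel.
            for partial in pool.map(compare_with_shard, [query] * n_devices, shards):
                results.update(partial)
    return results


# (identification, cover feature value) pairs; values are made up.
archives = [("a", [1.0, 0.0]), ("b", [1.0, 0.0]), ("c", [0.0, 1.0]), ("d", [1.0, 1.0])]
pair_scores = first_similarities(archives, n_devices=2)
print(pair_scores)
```

Filtering on `query_id < other_id` is a design choice of this sketch so that each archive data set (pair) is scored once; the text does not say how duplicate comparisons are avoided.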

Step 12, determining a target archive data set from the plurality of archive data sets according to the acquired first similarity.

In this embodiment of the present invention, step 12 takes two archive data that are preliminarily considered to be similar archive data as one target archive data set, so that whether the two archive data are in fact similar archive data can be further determined.

Specifically, in the embodiment of the present invention, the specific implementation manner of the step 12 may be: judging whether the acquired first similarity is larger than a first preset threshold value, and taking the archival data set corresponding to the first similarity as a target archival data set when the acquired first similarity is larger than the first preset threshold value.

It should be noted that, when the acquired first similarity is greater than the first preset threshold, the two archive data corresponding to the first similarity are considered possibly to be similar archive data; otherwise, they are considered unlikely to be similar archive data. It is to be understood that, in the embodiment of the present invention, the first preset threshold may be set according to the specific situation, for example to 0.7. It should also be understood that, to improve the efficiency of determining the target archive data set, step 12 may be implemented in a distributed manner: after calculating a first similarity in step 11, each similarity calculation unit directly evaluates that first similarity and, if it is greater than the first preset threshold, takes the corresponding archive data set as a target archive data set.
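The threshold test of step 12 can be sketched as a simple filter. The threshold 0.7 is the example value given in the text; the pair identifiers and similarity values below are made up for illustration, and the function name is hypothetical.

```python
FIRST_PRESET_THRESHOLD = 0.7  # example value given in the text

# Hypothetical first similarities, keyed by archive data set (a pair of IDs).
first_similarity = {
    ("a", "b"): 0.91,
    ("a", "c"): 0.42,
    ("b", "c"): 0.70,
    ("c", "d"): 0.78,
}


def select_target_groups(similarities, threshold):
    """Keep the archive data sets whose first similarity exceeds the threshold."""
    return [pair for pair, value in similarities.items() if value > threshold]


target_groups = select_target_groups(first_similarity, FIRST_PRESET_THRESHOLD)
print(target_groups)
```

Because the comparison is strictly "greater than", a pair whose first similarity equals the threshold is not selected.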

Step 13, determining similar archive data in the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data set.

In the embodiment of the present invention, after determining a target archive data set from a plurality of archive data sets, for each target archive data set, similarity of two archive data included in the target archive data set is determined according to archive feature events of the two archive data included in the target archive data set, so as to determine similar archive data in the plurality of archive data sets.

It should be noted that in the embodiment of the present invention, similar files are searched through the file cover images and file feature events in each file data, so that the similarity between different file data can be calculated finely, and the similar files of the file data can be determined according to the calculated similarity, which greatly improves the accuracy of searching similar files compared with the mode of searching similar files only through the similarity between the file cover images.

Next, a specific implementation manner of the step 13 will be described.

Specifically, the specific implementation manner of the step 13 includes the following steps:

Step one, respectively aiming at each determined target archive data set, acquiring second similarity of two archive data contained in the target archive data set according to archive feature events of the two archive data contained in the target archive data set, and obtaining a plurality of second similarity. That is, the second similarity of the two archive data included in each target archive data set is obtained by the archive feature event of the two archive data included in the target archive data set.

In an embodiment of the present invention, the specific implementation manner of obtaining the second similarity of the two archive data included in each target archive data set in the above step one (i.e. obtaining the second similarity of the two archive data included in the target archive data set according to the archive feature event of the two archive data included in the target archive data set) includes the following steps:

First, a plurality of preset types of archive feature events are extracted from archive feature events of two archive data contained in the target archive data set respectively. That is, a plurality of preset categories of archive feature events are extracted from archive feature events of each archive data of the target archive data set.

In order to improve the accuracy of searching similar files, the file feature events of the preset categories are representative high-quality events, such as a face image when wearing glasses, a face image when not wearing glasses, a face image when wearing caps, a face image when not wearing caps, a face image when leaning on the head, a face image when lowering the head, a face image when sideways, and the like. And it is understood that the number and form of the preset categories can be set according to actual situations.
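A minimal sketch of extracting archive feature events of the preset categories, assuming each event carries a category field. The field names, category labels and the rule of keeping the first event found per category are all assumptions of this sketch; the text does not specify how a representative event is chosen.

```python
# Hypothetical labels for the preset categories listed in the text.
PRESET_CATEGORIES = [
    "glasses", "no_glasses", "hat", "no_hat",
    "head_raised", "head_lowered", "profile",
]


def extract_preset_events(events, categories):
    """Return {category: event}, keeping the first event seen for each preset category."""
    selected = {}
    for event in events:
        category = event.get("category")
        if category in categories and category not in selected:
            selected[category] = event
    return selected


# Made-up archive feature events of one archive data.
events = [
    {"event_id": 1, "category": "glasses", "feature": [0.1, 0.9]},
    {"event_id": 2, "category": "smiling", "feature": [0.3, 0.3]},
    {"event_id": 3, "category": "profile", "feature": [0.8, 0.2]},
    {"event_id": 4, "category": "glasses", "feature": [0.2, 0.8]},
]

selected = extract_preset_events(events, PRESET_CATEGORIES)
print(sorted(selected))
```

Events outside the preset categories (here, "smiling") are ignored, and a quality score could replace the first-seen rule if one is available.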

Second, for each preset category of archive feature event, the feature values of the archive feature events of that preset category extracted from the two archive data are obtained, the similarity of the two obtained feature values is calculated, and this similarity is taken as the third similarity of the archive feature events of that preset category in the two archive data, so that a plurality of third similarities are obtained.

It is understood that, to improve the efficiency of calculating the third similarity, the third similarity may be obtained by calculating the similarity of the feature values of the archive feature events. That is, for each preset category of archive feature event, the feature values of the archive feature events of that preset category extracted from the two archive data are obtained, the similarity of the two obtained feature values is calculated, and this similarity is used as the third similarity of the archive feature events of that preset category in the two archive data.

The feature value of an archive feature event may be obtained in the same way as the feature value of the archive cover image of the archive data, namely by an image feature extraction algorithm, and the feature value may be a face feature value.

As a preferred example, to further improve the efficiency of obtaining the third similarity, the feature value may be a high-dimensional map, such as a feature map with 512×2×2 dimensions. Of course, the similarity between the feature values can be obtained quickly by the general similarity calculation formula.
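The per-category comparison that yields the third similarities might look like the sketch below. Cosine similarity is again an assumed stand-in for the unspecified formula, the function names and feature values are hypothetical, and skipping categories that are missing from either archive data is an assumption that the text does not address.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (an assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def third_similarities(features_a, features_b):
    """Compare two archive data category by category.

    Each argument maps a preset category name to the feature value of the
    event extracted for that category. Categories missing from either side
    are skipped (an assumption of this sketch).
    """
    return {
        category: cosine_similarity(features_a[category], features_b[category])
        for category in features_a
        if category in features_b
    }


# Made-up two-element feature values for two archive data.
archive_a = {"glasses": [1.0, 0.0], "hat": [0.6, 0.8], "profile": [0.0, 1.0]}
archive_b = {"glasses": [1.0, 0.0], "hat": [0.8, 0.6]}

scores = third_similarities(archive_a, archive_b)
print(scores)
```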

Third, a preset number of third similarities are selected from the plurality of third similarities in descending order. That is, the preset number of third similarities with the largest similarity values are selected. It should be noted that the preset number is less than or equal to the number of preset categories. It will be appreciated that in the embodiments of the present invention the preset number is not limited to a specific value and may be set according to actual situations.

Fourth, the average value of the selected preset number of third similarities is calculated and taken as the second similarity of the two archive data contained in the target archive data set.

Here, the first to fourth steps are explained with a specific example for ease of understanding. In this example, the number of preset categories is assumed to be 7, the preset categories being: a face image when wearing glasses, a face image when not wearing glasses, a face image when wearing a hat, a face image when not wearing a hat, a face image with the head raised, a face image with the head lowered, and a face image in profile. (It should be noted that, in a specific software implementation, the category of each archive feature event can be annotated through a field so that archive feature events of the preset categories can be extracted.) The 7 categories of archive feature events extracted from one archive data contained in the target archive data set are a1 (wearing glasses), a2 (not wearing glasses), a3 (wearing a hat), a4 (not wearing a hat), a5 (head raised), a6 (head lowered) and a7 (profile), and the 7 categories of archive feature events extracted from the other archive data contained in the target archive data set are b1 (wearing glasses), b2 (not wearing glasses), b3 (wearing a hat), b4 (not wearing a hat), b5 (head raised), b6 (head lowered) and b7 (profile). Then, as shown in FIG. 3, the third similarity of the archive feature events of each category is calculated, giving 7 third similarities: the third similarity of a1 and b1 is 0.94, that of a2 and b2 is 0.95, that of a3 and b3 is 0.91, that of a4 and b4 is 0.85, that of a5 and b5 is 0.90, that of a6 and b6 is 0.96, and that of a7 and b7 is 0.91. Next, as shown in FIG. 4, the 7 third similarities are sorted in descending order, and the 3 third similarities with the largest values (i.e., 0.96, 0.95 and 0.94) are selected. Finally, the average value of these three third similarities is calculated, and the calculated average value, 0.95, is taken as the second similarity of the two archive data contained in the target archive data set.
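The selection and averaging in this example can be reproduced with a few lines of code. The seven third similarities and the preset number 3 are taken from the example; the category labels and the function name are hypothetical.

```python
# Third similarities from the worked example (a1/b1 ... a7/b7).
third = {
    "glasses": 0.94,
    "no_glasses": 0.95,
    "hat": 0.91,
    "no_hat": 0.85,
    "head_raised": 0.90,
    "head_lowered": 0.96,
    "profile": 0.91,
}


def second_similarity(third_similarities, preset_number):
    """Average the preset_number largest third similarities."""
    values = sorted(third_similarities.values(), reverse=True)
    if not 1 <= preset_number <= len(values):
        raise ValueError("preset number must be between 1 and the number of categories")
    top = values[:preset_number]
    return sum(top) / len(top)


score = second_similarity(third, preset_number=3)
print(round(score, 2))  # the example in the text gives 0.95
```

Averaging only the largest values means that a few poorly matching categories (for instance, an unusual angle) do not pull the second similarity down.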

It will be understood, of course, that to improve the efficiency of obtaining the second similarities, step one above may be implemented in a distributed manner; that is, the plurality of target archive data sets are distributed evenly among the working processes (workers) of different devices, which calculate the second similarities of the two archive data in each target archive data set in a distributed, parallel manner.

Step two, determining similar archive data in the plurality of archive data according to the plurality of second similarities.

Specifically, in the embodiment of the present invention, step two is implemented as follows: first, the archive data corresponding to the plurality of second similarities obtained in step one are taken as target archive data; then, for each target archive data, all archive data whose second similarity with that target archive data is greater than a second preset threshold are taken as the similar archive data of that target archive data. In this way, the similar archive data in the plurality of archive data can be obtained. The second preset threshold may be set according to actual situations, for example to 0.85.

It should be noted that, in order to distinguish and identify each archive data, when each archive data is archived, each archive data is configured with a corresponding identification information (e.g. a number), that is, each archive data in the archive data pool has a corresponding identification information. Therefore, when determining similar archival data of each target archival data, the archival data with the second similarity greater than the second preset threshold value can be searched for according to the identification information of each target archival data.
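A minimal sketch of looking up similar archive data by identification information. The threshold 0.85 is the example value from the text; the pair scores and the function name are made up, and treating similarity as symmetric (recording each pair under both identifiers) is an assumption of this sketch.

```python
from collections import defaultdict

SECOND_PRESET_THRESHOLD = 0.85  # example value given in the text

# Hypothetical second similarities, keyed by the IDs of the two archive data.
second_similarity = {
    ("a", "b"): 0.80,
    ("a", "c"): 0.95,
    ("a", "d"): 0.90,
    ("b", "e"): 0.88,
    ("c", "d"): 0.60,
}


def find_similar_archives(similarities, threshold):
    """Map each archive ID to its similar archives, most similar first."""
    similar = defaultdict(list)
    for (left, right), value in similarities.items():
        if value > threshold:
            similar[left].append((right, value))
            similar[right].append((left, value))
    for entries in similar.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(similar)


similar = find_similar_archives(second_similarity, SECOND_PRESET_THRESHOLD)
print(similar["a"])
```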

In addition, in an embodiment of the present invention, after the similar archive data in the plurality of archive data are determined, the archive management method further includes the following step: for each target archive data, displaying prompt information indicating that the target archive data has similar archive data, so that the user can process (e.g., merge, view, etc.) the similar archive data. The prompt information comprises the identification information of the similar archive data of the target archive data. Here, the prompt information is described with a specific example. In this example, assume that the target archive data is archive data a and that the similar archive data of archive data a are archive data c, archive data d, archive data e and archive data b, where a, b, c, d and e each represent the identification information of the corresponding archive data. The prompt information may then be as shown in fig. 5, in which the second similarity between archive data a and archive data c is 0.95, the second similarity between archive data a and archive data d is 0.90, the second similarity between archive data a and archive data e is 0.90, and the second similarity between archive data a and archive data b is 0.85.
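The prompt information of this example could be rendered as plain text as sketched below. The identifiers and second similarities are those of the example; the wording and layout of the message, and the function name, are assumptions and are not meant to reproduce FIG. 5.

```python
def build_prompt(target_id, similar):
    """Format a text prompt listing the similar archive data of one target archive data.

    `similar` is a list of (identification, second similarity) tuples.
    """
    lines = [f"Archive {target_id} has {len(similar)} similar archive(s):"]
    for other_id, score in similar:
        lines.append(f"  {other_id} (second similarity {score:.2f})")
    return "\n".join(lines)


# Values from the example in the text.
prompt = build_prompt("a", [("c", 0.95), ("d", 0.90), ("e", 0.90), ("b", 0.85)])
print(prompt)
```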

As shown in fig. 6, an embodiment of the present invention further provides an archive management device, which includes: an acquisition module 61, a first determination module 62 and a second determination module 63.

The acquisition module 61 is configured to respectively take every two archive data in the plurality of archive data as an archive data set, calculate a similarity value between the archive cover images in the archive data set, and take the similarity value as the first similarity of the archive data set; each archive data comprises an archive cover image and an archive feature event;

A first determining module 62, configured to determine a target archive data set from the plurality of archive data sets according to the acquired first similarity;

The second determining module 63 is configured to determine similar archive data in the plurality of archive data according to the archive feature event of the archive data included in the determined target archive data set.

In the embodiment of the present invention, the file management apparatus 60 is an apparatus corresponding to the above file management method, which can improve the accuracy of searching similar files.

It should be noted that, the archive management device 60 includes all modules or units for implementing the archive management method, and in order to avoid excessive repetition, each module or unit of the archive management device 60 is not described herein.

As shown in fig. 7, an embodiment of the present invention further provides an archive management device, including a memory 71, a processor 72, and a computer program 73 stored in the memory 71 and executable on the processor 72, wherein the processor 72 implements the steps of the archive management method described above when executing the computer program 73.

Specifically, the processor 72 of the archive management device 70 executes the computer program 73 to implement the following steps: taking each two archive data in a plurality of archive data as an archive data set, calculating similarity values between archive cover images in the archive data set, and taking the similarity values as first similarity of the archive data set; each archive data comprises an archive cover image and an archive feature event; determining a target archive data set from the plurality of archive data sets according to the acquired first similarity; and determining similar archive data in the plurality of archive data according to the archive feature events of the archive data contained in the determined target archive data set.

Optionally, the processor 72 of the archive management device 70, when executing the computer program 73, further performs the following steps: judging whether the acquired first similarity is larger than a first preset threshold value or not; and when the acquired first similarity is larger than the first preset threshold value, taking the archival data set corresponding to the first similarity as a target archival data set.

Optionally, the processor 72 of the archive management device 70, when executing the computer program 73, further performs the following steps: respectively aiming at each determined target archive data set, acquiring second similarity of two archive data contained in the target archive data set according to archive feature events of the two archive data contained in the target archive data set, and acquiring a plurality of second similarity; and determining similar archival data in the plurality of archival data according to the plurality of second similarities.

Optionally, the processor 72 of the archive management device 70, when executing the computer program 73, further performs the following steps: extracting archive feature events of a plurality of preset categories from the archive feature events of each of the two archive data contained in the target archive data set; for each preset category of archive feature event, calculating a third similarity of the archive feature events of that preset category extracted from the two archive data, to obtain a plurality of third similarities; selecting a preset number of third similarities from the plurality of third similarities in descending order, wherein the preset number is less than or equal to the number of preset categories; and calculating the average value of the selected preset number of third similarities, and taking the average value as the second similarity of the two archive data contained in the target archive data set.

Optionally, the processor 72 of the archive management device 70, when executing the computer program 73, further performs the following steps: taking the archive data corresponding to the second similarity as target archive data; and respectively aiming at each target file data, taking the file data with the second similarity larger than a second preset threshold value in each target file data as the similar file data of the target file data.

Optionally, the processor 72 of the archive management device 70, when executing the computer program 73, further performs the following steps: displaying a prompt message for prompting that similar archive data exist in the target archive data according to each target archive data respectively; the prompt information comprises identification information of similar archive data of the target archive data.

Optionally, the processor 72 of the archive management device 70, when executing the computer program 73, further performs the following steps: and respectively taking every two archive data in the plurality of archive data as an archive data set by a plurality of similarity calculation units, calculating a similarity value between the archive cover images in each archive data set, and taking the similarity value as the first similarity of the archive data set.

That is, in the embodiment of the present invention, the steps of the file management method described above are implemented when the processor 72 of the file management apparatus 70 executes the computer program 73, so that the accuracy of searching for similar files can be improved.

By way of example, the above-described computer program 73 may be divided into one or more modules/units that are stored in the memory 71 and executed by the processor 72 to complete the present invention. And the one or more modules/units may be a series of computer program instruction segments capable of performing particular functions for describing the execution of computer program 73 in archive management device 70.

The archive management device 70 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The archive management device 70 may include, but is not limited to, a processor 72 and a memory 71. Those skilled in the art will appreciate that the schematic diagram is merely an example of the archive management device 70 and does not limit it; the archive management device 70 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the archive management device 70 may also include input and output devices, network access devices, buses, and the like.

The processor 72 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 72 is the control center of the archive management device 70 and connects the various parts of the entire archive management device 70 through various interfaces and lines.

The memory 71 may be used to store the computer program 73 and/or modules, and the processor 72 performs the various functions of the archive management device 70 by running or executing the computer program 73 and/or modules stored in the memory 71 and by invoking data stored in the memory 71. Specifically, the memory 71 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the device (such as audio data or a phonebook), and the like. In addition, the memory 71 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.

It should be noted that, since the steps of the above-mentioned archive management method are implemented when the processor 72 of the archive management device 70 executes the computer program 73, all the embodiments of the archive management method described above are applicable to the archive management device 70, and the same or similar advantages can be achieved.

Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that implements the steps of the archive management method described above when executed by a processor.

That is, in the embodiment of the present invention, the steps of the archive management method described above are implemented when the computer program of the computer readable storage medium is executed by the processor, so that the accuracy of searching for similar archives can be improved.

The computer program of the computer readable storage medium may include, for example, computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.

It should be noted that, since the steps of the archive management method described above are implemented when the computer program of the computer readable storage medium is executed by the processor, all embodiments of the archive management method described above are applicable to the computer readable storage medium, and the same or similar beneficial effects can be achieved.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A method for archive management, comprising:

respectively extracting archive feature events of a plurality of preset categories from the archive feature events of the two archive data contained in each determined target archive data set;

for the archive feature events of each preset category, obtaining the feature values of the archive feature events of that preset category extracted from the two archive data, calculating the similarity between the two obtained feature values, and taking the similarity as the third similarity of the archive feature events of that preset category in the two archive data, so as to obtain a plurality of third similarities;

selecting a preset number of third similarities from the plurality of third similarities in descending order, wherein the preset number is less than or equal to the number of preset categories;

calculating the average value of the selected preset number of third similarities, and taking the average value as the second similarity of the two archive data contained in the target archive data set, so as to obtain a plurality of second similarities; and

determining similar archive data in the plurality of archive data according to the plurality of second similarities.

2. The method of claim 1, wherein the step of determining the target archive data set from the plurality of archive data sets based on the acquired first similarity comprises:

determining whether the acquired first similarity is greater than a first preset threshold; and

when the acquired first similarity is greater than the first preset threshold, taking the archive data set corresponding to the first similarity as a target archive data set.

3. The method of claim 1, wherein the step of determining similar archive data in the plurality of archive data based on the plurality of second similarities comprises:

taking the archive data corresponding to each second similarity as target archive data; and

for each target archive data, taking the archive data whose second similarity with that target archive data is greater than a second preset threshold as the similar archive data of the target archive data.

4. The method of claim 3, wherein after the step of determining similar archive data in the plurality of archive data based on the archive feature events of the archive data contained in the determined target archive data set, the method further comprises:

for each target archive data, displaying prompt information indicating that similar archive data exists for the target archive data; the prompt information comprises identification information of the similar archive data of the target archive data.

5. The method of claim 1, wherein the step of taking every two archive data in the plurality of archive data as an archive data set, calculating a similarity value between the archive cover images in the archive data set, and taking the similarity value as the first similarity of the archive data set comprises:

using a plurality of similarity calculation units, taking every two archive data in the plurality of archive data as an archive data set, calculating a similarity value between the archive cover images in each archive data set, and taking the similarity value as the first similarity of the archive data set.

6. An archive management device, comprising:

a second determining module, configured to: respectively extract archive feature events of a plurality of preset categories from the archive feature events of the two archive data contained in each determined target archive data set; for the archive feature events of each preset category, obtain the feature values of the archive feature events of that preset category extracted from the two archive data, calculate the similarity between the two obtained feature values, and take the similarity as the third similarity of the archive feature events of that preset category in the two archive data, so as to obtain a plurality of third similarities; select a preset number of third similarities from the plurality of third similarities in descending order, wherein the preset number is less than or equal to the number of preset categories; calculate the average value of the selected preset number of third similarities, and take the average value as the second similarity of the two archive data contained in the target archive data set, so as to obtain a plurality of second similarities; and determine similar archive data in the plurality of archive data according to the plurality of second similarities.

7. An archive management device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the archive management method of any one of claims 1 to 5 when executing the computer program.

8. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the archive management method of any one of claims 1 to 5.
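As a non-limiting illustration, the computation of the second similarity recited in claim 1 (sorting the third similarities in descending order, keeping a preset number of them, and averaging) might be sketched as follows. The function name `second_similarity`, the category names, and the sample values are assumptions made for this example only.

```python
def second_similarity(third_similarities, preset_number):
    """Average the `preset_number` largest per-category (third) similarities.

    third_similarities: dict mapping a preset category name to the third
    similarity of that category for one pair of archives (hypothetical layout).
    """
    if not 1 <= preset_number <= len(third_similarities):
        raise ValueError(
            "preset_number must be between 1 and the number of preset categories"
        )
    # Sort in descending order and keep only the largest values.
    top = sorted(third_similarities.values(), reverse=True)[:preset_number]
    return sum(top) / len(top)


# Hypothetical categories and scores for one target archive data set.
third = {"front_face": 0.95, "side_face": 0.80, "with_glasses": 0.60, "low_light": 0.90}
print(second_similarity(third, preset_number=3))
```

Keeping only the largest third similarities means that one poorly matching category (for example, a capture degraded by angle or lighting) does not pull down the second similarity of the pair.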