CN113408557B - File merging method and device and electronic equipment - Google Patents
- Publication number: CN113408557B (application CN202010184910.5A)
- Authority
- CN
- China
- Prior art keywords
- files
- file
- activity rule
- combined
- similarity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/105—Human resources
Abstract
An embodiment of the invention provides a file merging method, a file merging device, and electronic equipment. The method comprises the following steps: detecting the activity rule identity between every two files to be merged among a plurality of files to be merged, wherein each file to be merged comprises activity rule features for calculating the activity rule identity and file features for calculating the file similarity; if the activity rule identity between two of the files to be merged is detected to reach a preset identity threshold, taking the two files as target files and calculating the file similarity between the two target files; and merging the target files whose file similarity reaches a preset similarity threshold. Applying two threshold conditions, one on the activity rule identity and one on the file similarity between two files to be merged, addresses the unsatisfactory results of merging files on file similarity alone, and can improve both the efficiency and the accuracy of file merging.
Description
Technical Field
The present invention relates to the field of file management technologies, and in particular, to a method and an apparatus for merging files, and an electronic device.
Background
As society develops, people move around more, which makes managing people at city scale more difficult. Currently, some departments or systems establish a personnel file for each person to make management easier. One approach is to capture images of people with cameras and build a personnel file from the captured images. To manage people more effectively, a separate personnel file is generally built for each person captured by a camera, and if the same person is captured by the same camera again, the newly captured image is added to the existing personnel file. However, because of occlusion, viewing angle, lighting, and similar problems in real deployments, the same person can end up with several files when pictures are clustered and archived. When many people each have several files, the volume of personnel file data grows, and the number of files handled at city level becomes very large, for example in the billions. This makes city-level file management inconvenient, so the multiple files of the same person need to be merged to reduce the number of personnel files.
However, merging the multiple files of the same person based only on the similarity between files does not work well: when the file similarity threshold is set high, few of the duplicate files are merged, and when it is set low, files are merged in error. The existing file merging methods therefore suffer from low merging efficiency and low merging accuracy.
Disclosure of Invention
The embodiment of the invention provides a file merging method, which can improve the efficiency and accuracy of file merging.
In a first aspect, an embodiment of the present invention provides a method for merging files, where the method includes the steps of:
detecting the activity rule identity between every two files to be combined in a plurality of files to be combined, wherein the files to be combined comprise activity rule features for calculating the activity rule identity and file features for calculating file similarity;
if detecting that the activity rule identity between two files to be combined in the plurality of files to be combined reaches a preset identity threshold, taking the two files to be combined as target files, and calculating the file similarity between the two target files;
and merging the target files with file similarity reaching a preset similarity threshold.
Optionally, the step of detecting the activity rule identity between each two files to be merged in the plurality of files to be merged includes:
extracting activity rule characteristics of each file to be combined;
calculating the activity rule feature identity between every two activity rule features;
and determining the activity rule identity between every two files to be combined in the plurality of files to be combined based on the activity rule feature identity between every two activity rule features.
Optionally, the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the step of calculating the activity rule feature identity between every two activity rule features includes:
calculating the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features;
and determining the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in every two activity rule features.
Optionally, the step of calculating the file similarity between the two target files includes:
extracting file characteristics of two target files;
calculating file feature similarity between two file features;
and determining the file similarity of the two target files based on the file feature similarity between the two file features.
Optionally, the file feature comprises a plurality of file sub-features, and the step of calculating the file feature similarity between two file features includes:
calculating the file sub-feature similarity between each pair of file sub-features in the two file features according to a Cartesian product strategy;
and determining the file feature similarity between the two file features based on the calculated file sub-feature similarities.
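As a rough illustration of the Cartesian product strategy, the sketch below scores every pairing of sub-features from two files and then reduces the scores to one value. The text above does not say which similarity measure or which reduction is used, so the cosine measure, the default `max` reduction, and the names `cosine` and `feature_similarity` are all assumptions made for this example.

```python
import math
from itertools import product

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def feature_similarity(subs_a, subs_b, aggregate=max):
    """Score every (sub-feature of A, sub-feature of B) pair, then reduce.

    The reduction is not specified in the source; max is only a default.
    """
    if not subs_a or not subs_b:
        raise ValueError("each file needs at least one sub-feature")
    scores = [cosine(u, v) for u, v in product(subs_a, subs_b)]
    return aggregate(scores)

# Two toy sub-features for one file, one for the other.
print(feature_similarity([[1, 0], [0, 1]], [[1, 0]]))  # 1.0
```

Passing a different `aggregate`, for example a mean, changes how strict the resulting file feature similarity is.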
In a second aspect, an embodiment of the present invention further provides a file merging device, where the device includes:
the detection module is used for detecting the activity rule identity between every two files to be combined in the files to be combined, wherein the files to be combined comprise activity rule features for calculating the activity rule identity and file features for calculating the file similarity;
the computing module is used for taking the two files to be combined as target files and computing the file similarity between the two target files if it is detected that the activity rule identity between two files to be combined in the plurality of files to be combined reaches a preset identity threshold;
and the merging module is used for merging the target files with file similarity reaching a preset similarity threshold.
Optionally, the detection module includes:
the first extraction unit is used for extracting the activity rule characteristics of each file to be combined;
the first calculation unit is used for calculating the activity rule characteristic identity between every two activity rule characteristics;
the first determining unit is used for determining the activity rule identity between every two files to be combined in the plurality of files to be combined based on the activity rule feature identity between every two activity rule features.
Optionally, the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the first calculation unit includes:
the first calculating subunit is used for calculating the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features;
the first determining subunit is used for determining the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in every two activity rule features.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the file merging method provided by the embodiment when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the file merging method provided in the foregoing embodiment.
In the embodiment of the invention, the activity rule identity between every two files to be merged among a plurality of files to be merged is detected, where each file to be merged comprises activity rule features for calculating the activity rule identity and file features for calculating the file similarity; if the activity rule identity between two of the files to be merged is detected to reach a preset identity threshold, the two files are taken as target files and the file similarity between the two target files is calculated; and the target files whose file similarity reaches a preset similarity threshold are merged. Applying two threshold conditions, one on the activity rule identity and one on the file similarity between two files to be merged, addresses the unsatisfactory results of merging files on file similarity alone. Because the same person's activity rule is basically the same from day to day, two files that belong to the same person show basically the same activity rule. Calculating the file similarity only for pairs of files whose activity rule identity meets the preset identity threshold can therefore improve both the efficiency and the accuracy of file merging.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a file merging method according to an embodiment of the present invention;
FIG. 2 is a flow chart of one method provided by step 101 in the embodiment of FIG. 1;
FIG. 3 is a flow chart of one method provided by step 202 in the embodiment of FIG. 2;
FIG. 4 is a flow chart of one method provided by step 102 in the embodiment of FIG. 1;
FIG. 5 is a flow chart of one method provided by step 402 in the embodiment of FIG. 4;
FIG. 6 is a schematic diagram of a file merging device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a configuration provided by the detection module of the embodiment of FIG. 6;
FIG. 8 is a schematic diagram of a configuration provided by the first computing unit of the embodiment of FIG. 7;
FIG. 9 is a schematic diagram of a configuration provided by the computing module of the embodiment of FIG. 6;
FIG. 10 is a schematic diagram of a configuration provided by the second computing unit in the embodiment of FIG. 9;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a file merging method according to an embodiment of the present invention, where the method includes the steps of:
Step 101, detecting the activity rule identity between every two files to be merged among the plurality of files to be merged.
The files to be combined comprise activity rule features for calculating activity rule identity and file features for calculating file similarity.
The files to be merged may be a plurality of files of the same person. Each file corresponds to one person. Of course, if files of several people need to be merged, the files to be merged are files of several people. A file to be merged can be a picture file, that is, a file built by archiving pictures; it can also be a text file, an attribute file, or the like. This embodiment is described mainly in terms of picture files. A picture may be an image, a photograph, or the like.
The activity rule is the regular pattern of a person's movement track reflected in each file to be merged. Ordering the pictures in a file by time stamp reveals that pattern. Each file to be merged has its own activity rule, and when two files to be merged belong to the same person, the activity rules reflected in them are basically consistent, because the same person's activity rule is basically the same from day to day.
The activity rule identity is the degree to which the activity rules of two files to be merged agree. The activity rule features are features that can represent the activity rule, for example, the places of activity along the movement track. The activity rule identity can be calculated from the activity rule features.
The file similarity is the similarity between two files to be merged. The file features are features that can represent a file to be merged. One example is the cover feature of the file cover in a picture file: when the file cover is a face, the file feature is a face feature, and the file similarity is the similarity between two face features. Another example is the attribute feature of each file attribute in an attribute file. The file attributes may be a person's fingerprint, DNA, and so on, or height, weight, body shape, and so on; the attribute features then include the corresponding fingerprint features, DNA features, and so on.
Specifically, once a plurality of files to be merged is obtained, the identity between the activity rules of every two files among them needs to be calculated.
It should be noted that the files to be merged may be stored in a file database, in a file cloud space, or in a file system, and the amount of stored data naturally grows as the number of files increases. The files to be merged may have been created by different systems or different departments. For example, a camera in one district management system captures a face image of a person, and the system builds a file for that person from the captured image and stores it in its file database; a camera in another district management system also captures a face image of the same person, builds a corresponding file, and stores it in its own file database. The person then has a file in the file database of each of the two district management systems, that is, two files that belong to the same person.
Alternatively, a camera captures a face image of a person and an image file is established for that person; when the camera captures the person's face again, occlusion, viewing angle, lighting, or similar problems in the real environment may cause the image to be treated as a different person, so that a new image file is established. The person then also has two files stored in the corresponding file database.
Thus, when the personnel files of several district management systems need to be managed together, the files are taken out of the file databases of those systems, and the same person may have several files. Because each district management system establishes files for many people, managing the files of several district management systems means that many people each correspond to several files. The files to be merged may therefore be the files provided by the district management systems in this example, and may be part or all of the file data those systems provide. The files to be merged may also have been created by different filing terminals of the same system or the same department.
Step 102, if it is detected that the activity rule identity between two files to be merged among the plurality of files to be merged reaches the preset identity threshold, the two files to be merged are used as target files, and the file similarity between the two target files is calculated.
The preset identity threshold is an identity threshold set in advance. It is the standard for judging whether the activity rules in two files to be merged are consistent, and it can be set as required. The greater the activity rule identity of two files to be merged, the more consistent their activity rules. Once the threshold is set, two files whose activity rule identity meets it are considered to have basically consistent activity rules, and the two files are target files.
Taking two files to be merged as target files means that the file similarity needs to be calculated for those two files.
Specifically, after the activity rule identity between every two files to be merged is detected, each detected activity rule identity is compared with the preset identity threshold. There may be one or more activity rule identities, and each is checked against the threshold. If an activity rule identity is greater than the preset identity threshold, it meets the threshold, that is, the activity rules of the two corresponding files to be merged are basically consistent; the two files are retained as target files, and their file similarity is calculated.
Of course, if an activity rule identity is smaller than or equal to the preset identity threshold, it does not meet the threshold, that is, the activity rules of the two corresponding files to be merged are inconsistent; the two files are not retained as target files, and their file similarity does not need to be calculated.
It should be noted that when several of the calculated activity rule identities meet the preset identity threshold, several groups of target files are obtained. The file similarity is calculated for each group separately, so there are also several file similarities.
Step 103, merging the two target files with file similarity reaching a preset similarity threshold.
The preset similarity threshold is a similarity threshold set in advance. It is the condition for determining whether two files to be merged belong to the same person, and it can be set as required. For example, with the similarity threshold set to 0.90, two files to be merged whose file similarity is 0.95 exceed the threshold and are considered files of the same person. Merging means combining the two files to be merged, which belong to the same person, into one file.
Specifically, after the file similarity between two target files is calculated, it is compared with the preset similarity threshold. If the file similarity is greater than the preset similarity threshold, it meets the threshold, that is, the two target files are files of the same person, and they are merged into one file. In this way the files belonging to the same person are merged into a single file.
Of course, if the file similarity is smaller than or equal to the preset similarity threshold, it does not meet the threshold, that is, the two target files are not files of the same person, and they are not merged.
Furthermore, when one person has more than two files among the files to be merged, all of that person's files that meet both the activity rule identity condition and the file similarity condition can be merged together into one file.
In the embodiment of the invention, the activity rule identity between every two files to be merged among the plurality of files to be merged is detected, where each file to be merged comprises activity rule features for calculating the activity rule identity and file features for calculating the file similarity; if the activity rule identity between two of the files to be merged is detected to reach a preset identity threshold, the two files are taken as target files and the file similarity between the two target files is calculated; and the target files whose file similarity reaches a preset similarity threshold are merged. Applying two threshold conditions, one on the activity rule identity and one on the file similarity between two files to be merged, addresses the unsatisfactory results of merging files on file similarity alone. Because the same person's activity rule is basically the same from day to day, two files that belong to the same person show basically the same activity rule. Calculating the file similarity only for pairs of files whose activity rule identity meets the preset identity threshold can therefore improve both the efficiency and the accuracy of file merging.
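Steps 101 to 103 can be sketched as a two-stage filter over all pairs of files. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `select_merges`, the two scoring callbacks, the toy scores, and the identity threshold of 3.0 are invented for illustration; only the 0.90 similarity threshold and the "greater than" comparisons come from the description above.

```python
from itertools import combinations

def select_merges(file_ids, rule_identity, file_similarity,
                  identity_threshold, similarity_threshold):
    """Return the pairs of files that pass both threshold checks.

    rule_identity and file_similarity are caller-supplied scoring
    functions (hypothetical here); each takes two file ids.
    """
    merges = []
    for a, b in combinations(file_ids, 2):
        # Stage 1: screen on the activity rule identity.
        if rule_identity(a, b) <= identity_threshold:
            continue
        # Stage 2: file similarity is computed for target files only.
        if file_similarity(a, b) > similarity_threshold:
            merges.append((a, b))
    return merges

# Toy scores, invented for illustration.
identity_scores = {("A", "A1"): 3.4, ("A", "B"): 1.2, ("A1", "B"): 1.9}
similarity_scores = {("A", "A1"): 0.95}
similarity_calls = []

def toy_rule_identity(a, b):
    return identity_scores[(a, b)]

def toy_file_similarity(a, b):
    similarity_calls.append((a, b))  # record which pairs reach stage 2
    return similarity_scores[(a, b)]

result = select_merges(["A", "A1", "B"], toy_rule_identity, toy_file_similarity,
                       identity_threshold=3.0, similarity_threshold=0.90)
print(result)            # [('A', 'A1')]
print(similarity_calls)  # [('A', 'A1')]
```

In this toy run the file similarity is evaluated for one pair out of three, which is where the efficiency gain described above would come from.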
Referring to fig. 2, fig. 2 is a flowchart of a method provided by step 101 in the embodiment of fig. 1, step 101 including the steps of:
Step 201, extracting activity rule characteristics of each file to be merged.
Step 202, calculating the activity rule feature identity between every two activity rule features.
Step 203, obtaining the activity rule identity between every two files to be combined in the plurality of files to be combined based on the activity rule feature identity between every two activity rule features.
The activity rule feature identity is the identity between the activity rule features of the activity rules in two files to be merged. It is the basis for judging whether two activity rules are consistent.
Specifically, the activity rule features of each file to be merged are first extracted. The identity between the corresponding activity rule features of every two files to be merged is then calculated. The activity rule feature identity of two activity rule features is taken as the identity between the two activity rules, which gives the activity rule identity between the two files to be merged.
Further, referring to fig. 3, fig. 3 is a flowchart of a method provided in step 202 in the embodiment of fig. 2, where the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and step 202 includes:
Step 301, calculating the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features.
Step 302, obtaining the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in every two activity rule features.
The time can be specified by year, month, day, and hour. The same time may be the same year, the same month, the same day, the same hour, and so on. An activity rule feature value is the value of the activity rule feature at a given time. There may be one or more activity rule feature values. Correspondingly, there may be one or more activity rule feature value identities: when two activity rule features both have feature values at several of the same times, there are several activity rule feature value identities.
Specifically, take as an example the activity rule feature values of files A, A1, B, and C on each date, where each activity rule feature value is a 24-dimensional integer array whose entries correspond to the 24 hours of the day. The activity rule feature values of files A, A1, B, and C are as follows:
A (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A (No. 2): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A (No. 3): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A (No. 4): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A (No. 5): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].
A1 (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A1 (No. 3): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A1 (No. 4): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A1 (No. 5): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
A1 (No. 7): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].
B (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
B (No. 4): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
B (No. 6): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
B (No. 7): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
B (No. 11): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].
C (No. 3): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
C (No. 5): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
C (No. 8): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
C (No. 12): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];
C (No. 23): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].
Here A (No. 1), A (No. 2), and so on denote the activity rule features of file A on dates No. 1 and No. 2. Of course, the number of dates can be determined according to the actual situation. A (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ] denotes the activity rule feature values of file A for the 24 hours of date No. 1. In this embodiment, the values in [1,2,3,4,5,6,7,8,9,10,11,12,13, … ] stand for the activity rule feature values of the 24 hours; they are merely exemplary, and the actual values depend on the actual activity rule feature values. When calculating the identity between activity rule feature values, the values can be converted into binary numbers and the calculation carried out on the binary form. For example, assume that the activity rule feature values of A (No. 1) are: [157 (1001 1101), 78 (0100 1110), 52 (0011 0100), 51 (0011 0011), … ].
It should be noted that, in this embodiment, the activity rule in each file to be merged is built on a camera mapping. Specifically, camera numbers and a camera code for each camera number are set. The camera numbers are: camera8, camera7, camera6, camera5, camera4, camera3, camera2, camera1; the camera codes are: 1, 1, 1, 1, 1, 1, 1, 1.
This can be represented by the integer 255, because 255 in binary is 11111111, where each bit represents one camera. In an implementation, the camera code is 1 when the person appears in the camera and 0 when the person does not. For example, when an activity rule feature value in file A is 157 (1001 1101), the person in file A appears in camera1, camera3, camera4, camera5, and camera8.
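The camera mapping is an ordinary bitmask, with camera1 as the lowest bit and camera8 as the highest. The sketch below packs and unpacks such a value; the function names `encode_cameras` and `decode_cameras` are hypothetical and chosen only for this example.

```python
NUM_CAMERAS = 8  # camera1 .. camera8, as in the example above

def encode_cameras(camera_numbers):
    """Pack 1-based camera numbers into one integer; bit k-1 is camera k."""
    value = 0
    for n in camera_numbers:
        if not 1 <= n <= NUM_CAMERAS:
            raise ValueError(f"camera number out of range: {n}")
        value |= 1 << (n - 1)
    return value

def decode_cameras(value):
    """Return the sorted camera numbers whose bit is set in value."""
    return [n for n in range(1, NUM_CAMERAS + 1) if value & (1 << (n - 1))]

print(decode_cameras(157))                             # [1, 3, 4, 5, 8]
print(format(encode_cameras([1, 3, 4, 5, 8]), "08b"))  # 10011101
```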
Further, after the activity rule feature values of each of the files A, A1, B, and C at each time are obtained, the files are intersected pairwise by date to obtain the activity rule features on the dates they have in common. Taking the intersection of file A and file A1 by date as an example:
A∩A1:
A (No. 1), A (No. 2), A (No. 3), A (No. 4), A (No. 5)
∩
A1 (No. 1), A1 (No. 3), A1 (No. 4), A1 (No. 5), A1 (No. 7)
= A (No. 1), A (No. 3), A (No. 4), A (No. 5) and A1 (No. 1), A1 (No. 3), A1 (No. 4), A1 (No. 5).
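The intersection by date can be sketched in Python as follows. The dictionary layout (date number mapped to a list of hourly feature values) and the function name `common_dates` are assumptions made for illustration, and the hourly lists are placeholders truncated to four entries.

```python
def common_dates(features_a: dict, features_b: dict) -> list:
    """Return the sorted date numbers present in both files."""
    return sorted(features_a.keys() & features_b.keys())


# Hypothetical layout: date number -> hourly activity rule feature values.
placeholder = [157, 78, 52, 51]
file_a = {date: placeholder for date in (1, 2, 3, 4, 5)}
file_a1 = {date: placeholder for date in (1, 3, 4, 5, 7)}

print(common_dates(file_a, file_a1))  # [1, 3, 4, 5]
```

Only the features on the common dates are compared in the later steps; dates that appear in just one of the two files (No. 2 and No. 7 here) are left out.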
Still further, assume that the feature values of A (No. 1) are: [157 (1001 1101), 78 (0100 1110), 52 (0011 0100), 51 (0011 0011), … ]
and that the feature values of A1 (No. 1) are: [156 (1001 1100), 76 (0100 1100), 52 (0011 0100), 48 (0011 0000), … ].
Then A (No. 1)_A1 (No. 1) = sum over the hours of (number of bit positions with the same value)/(total number of bits)
= 7/8 (157 (1001 1101) and 156 (1001 1100) agree in 7 bits)
+ 7/8 (78 (0100 1110) and 76 (0100 1100) agree in 7 bits)
+ 8/8 (52 (0011 0100) and 52 (0011 0100) agree in 8 bits)
+ 6/8 (51 (0011 0011) and 48 (0011 0000) agree in 6 bits)
+ …
= 7/8 + 7/8 + 8/8 + 6/8 + … (24 terms in total)
= 19.8.
Finally, 19.8/24 = 0.825, so the activity rule feature value identity of A (No. 1)_A1 (No. 1) obtained by the final calculation is 0.825.
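The bitwise comparison can be sketched in Python as follows. The function names `matching_bits` and `day_identity` are hypothetical. Only the four hourly values listed above are used, written as binary literals, so the resulting average (0.875) is not the 0.825 obtained over all 24 hours in the text, whose remaining values are not given.

```python
def matching_bits(a: int, b: int, width: int = 8) -> int:
    """Count the bit positions (out of `width`) where a and b agree."""
    mask = (1 << width) - 1
    # XOR marks differing bits; subtract their count from the width.
    return width - bin((a ^ b) & mask).count("1")


def day_identity(hours_a: list[int], hours_b: list[int], width: int = 8) -> float:
    """Average per-hour fraction of matching bits for one date."""
    if len(hours_a) != len(hours_b):
        raise ValueError("both lists must cover the same hours")
    total = sum(matching_bits(a, b, width) / width for a, b in zip(hours_a, hours_b))
    return total / len(hours_a)


a_day = [0b10011101, 0b01001110, 0b00110100, 0b00110011]
a1_day = [0b10011100, 0b01001100, 0b00110100, 0b00110000]

print([matching_bits(a, b) for a, b in zip(a_day, a1_day)])  # [7, 7, 8, 6]
print(day_identity(a_day, a1_day))  # 0.875
```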
In the same way, the following are obtained:
A (No. 3)_A1 (No. 3) = 0.86;
A (No. 4)_A1 (No. 4) = 0.88;
A (No. 5)_A1 (No. 5) = 0.90.
Finally, the activity rule identity of the file pair A_A1 is obtained as: 0.825 + 0.86 + 0.88 + 0.90 = 3.465.
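As a quick check of this arithmetic, the per-date values can be summed in Python. The dictionary `per_date_identity` is a hypothetical container for the four values stated in the text.

```python
# Per-date identities for the pair A_A1 on the common dates No. 1, 3, 4, 5.
per_date_identity = {1: 0.825, 3: 0.86, 4: 0.88, 5: 0.90}

pair_identity = sum(per_date_identity.values())
print(round(pair_identity, 3))  # 3.465
```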
Similarly, the activity rule identities of the file pairs A_B, A_C, A1_B, A1_C, and B_C are obtained. The activity rule identities of all file pairs are as follows:
A_A1 = 3.465;
A_B = 1.25;
A_C = 0.105;
A1_B = 1.9;
A1_C = 1.2;
B_C = 2.8.
The calculated activity rule feature value identity is taken as the activity rule feature identity between the two activity rule features, from which the activity rule identity between the two activity rules, and in turn between the two files to be merged, is obtained.
If the preset identity threshold is 2, then among the four files A, A1, B, and C, the activity rule identities of the file pairs A_A1 and B_C satisfy the preset identity threshold, which indicates that the activity rules of files A and A1 are basically consistent, and that the activity rules of files B and C are basically consistent. The pairs A_B = 1.25, A_C = 0.105, A1_B = 1.9, and A1_C = 1.2 do not satisfy the preset identity threshold, so the activity rules of the two files in each of these pairs are inconsistent.
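The threshold check can be sketched in Python as follows. The names `pair_identity` and `IDENTITY_THRESHOLD` are hypothetical, and "satisfies the threshold" is interpreted here as greater than or equal to it, which is an assumption; with the example values, a strict comparison gives the same result.

```python
# Activity rule identity of each file pair, using the example values.
pair_identity = {
    ("A", "A1"): 3.465,
    ("A", "B"): 1.25,
    ("A", "C"): 0.105,
    ("A1", "B"): 1.9,
    ("A1", "C"): 1.2,
    ("B", "C"): 2.8,
}
IDENTITY_THRESHOLD = 2  # preset identity threshold from the example

# Pairs whose activity rules are basically consistent become target files.
target_pairs = [pair for pair, value in pair_identity.items()
                if value >= IDENTITY_THRESHOLD]
print(target_pairs)  # [('A', 'A1'), ('B', 'C')]
```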
In this embodiment, the identity between the activity rules of the persons corresponding to two files to be merged can be calculated from the activity rule features in the two files and the activity rule feature values at the corresponding times, so as to determine whether the activity rules of the two persons are consistent. If they are consistent, the two files to be merged may be files of the same person, and the subsequent steps are executed. If they are not consistent, the two files to be merged are not necessarily files of the same person, and the subsequent steps need not be executed.
Referring to fig. 4, fig. 4 is a flow chart of one method provided by step 102 in the embodiment of fig. 1.
Step 102 comprises the steps of:
Step 401, extracting the file features of the two target files.
Step 402, calculating the file feature similarity between the two file features.
Step 403, obtaining the file similarity of the two target files based on the file feature similarity between the two file features.
The file feature similarity is the similarity between the file features of the two target files. It is a condition for determining whether the two target files are files of the same person.
Specifically, when the activity rule identity between two files to be merged is determined to meet the preset identity threshold, the two files to be merged are taken as target files, and the file similarity between them needs to be calculated. To this end, the file features of the two target files are extracted and the file feature similarity between the two file features is calculated. Because the file features represent the target files, the file feature similarity is taken as the file similarity between the two target files. For example, if the file features are file cover features, the file cover features of the two target files are extracted, and the similarity between the two file cover features is calculated to obtain the file similarity of the two target files.
Further, referring to fig. 5, fig. 5 is a flow chart of a method provided by step 402 in the embodiment of fig. 4. The file feature comprises a plurality of file sub-features, and step 402 comprises the following steps:
Step 501, calculating the file sub-feature similarity between each pair of file sub-features in the two file features according to the Cartesian product strategy.
Step 502, obtaining the file feature similarity between the two file features based on the calculated file sub-feature similarities.
In mathematics, the Cartesian product of two sets X and Y, also known as the direct product and denoted X × Y, is the set of all possible ordered pairs whose first element is a member of X and whose second element is a member of Y. For example, if set A = {a, b} and set B = {0, 1, 2}, the Cartesian product of the two sets is {(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}. The Cartesian product strategy is a method of calculating the similarity between file sub-features by pairing them according to the Cartesian product.
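The example sets can be reproduced with `itertools.product` from the Python standard library. Lists are used instead of sets here only to keep the output order deterministic.

```python
from itertools import product

set_a = ["a", "b"]
set_b = [0, 1, 2]

# Every ordered pair (x, y) with x from set_a and y from set_b.
pairs = list(product(set_a, set_b))
print(pairs)
# [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]
```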
The file sub-features are the individual features that make up a file feature. For example, when the file feature is a file cover feature and the file cover of a file to be merged carries a plurality of representative cover pictures, each file sub-feature is the feature corresponding to one representative cover picture.
The file sub-feature similarity is the similarity between two file sub-features, and is a condition for judging whether the two file sub-features are the same. There may be one or more file sub-feature similarities. When each file to be merged has only one file sub-feature, only one file sub-feature similarity is obtained; when the files to be merged have a plurality of file sub-features, a plurality of file sub-feature similarities are obtained.
Specifically, the file sub-feature similarities between the file sub-features of the target files are calculated according to the Cartesian product strategy. The average of these file sub-feature similarities can then be calculated to obtain the file similarity between the two target files.
For example, suppose file E has file sub-features E1, E2, and E3, and file F has file sub-features F1, F2, and F3. The sub-feature similarities between file E and file F can then be calculated according to the Cartesian product strategy, for example:
E1_F1 = 0.91, E1_F2 = 0.94, E1_F3 = 0.93;
E2_F1 = 0.93, E2_F2 = 0.91, E2_F3 = 0.90;
E3_F1 = 0.95, E3_F2 = 0.90, E3_F3 = 0.96.
Their average value is (0.91 + 0.94 + 0.93 + 0.93 + 0.91 + 0.90 + 0.95 + 0.90 + 0.96)/9 ≈ 0.926.
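The averaging over the Cartesian product can be sketched in Python as follows. The lookup table `sub_similarity` stands in for whatever sub-feature comparison is actually used (the text does not specify one), and the function name `file_feature_similarity` is hypothetical.

```python
from itertools import product
from statistics import mean

# Example sub-feature similarities between file E and file F.
sub_similarity = {
    ("E1", "F1"): 0.91, ("E1", "F2"): 0.94, ("E1", "F3"): 0.93,
    ("E2", "F1"): 0.93, ("E2", "F2"): 0.91, ("E2", "F3"): 0.90,
    ("E3", "F1"): 0.95, ("E3", "F2"): 0.90, ("E3", "F3"): 0.96,
}


def file_feature_similarity(subs_e, subs_f, similarity) -> float:
    """Average the similarity over every (e, f) pair of sub-features."""
    return mean(similarity[(e, f)] for e, f in product(subs_e, subs_f))


result = file_feature_similarity(["E1", "E2", "E3"], ["F1", "F2", "F3"], sub_similarity)
print(round(result, 3))  # 0.926
```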
Specifically, suppose the file similarities of the file pairs A_A1 and B_C obtained by the Cartesian product strategy are A_A1: 0.818 and B_C: 0.77. If the preset similarity threshold is set to 0.816, then A_A1: 0.818 is greater than the preset similarity threshold, so files A and A1 are files of the same person. By contrast, B_C: 0.77 is below the preset similarity threshold, so files B and C are not files of the same person.
In this embodiment, the file feature similarity between two file features is obtained by calculating the file sub-feature similarities between the file sub-features of the two target files, and the file similarity between the two target files is obtained from it. Whether two files to be merged are files of the same person is judged by calculating both the activity rule identity and the file similarity between them. If they are files of the same person, the two files to be merged are merged into one file, thereby improving the efficiency and accuracy of file merging.
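The two-stage decision described in this embodiment can be summarized in a minimal Python sketch. The function `find_merge_pairs` and its parameters are hypothetical; the lookup tables reuse the example values from the text, and "reaches the threshold" is taken to mean greater than or equal to it, which is an assumption.

```python
from itertools import combinations


def find_merge_pairs(file_ids, activity_identity, file_similarity,
                     identity_threshold, similarity_threshold):
    """Return the pairs that pass both the activity rule and similarity checks."""
    to_merge = []
    for a, b in combinations(file_ids, 2):
        # Stage 1: skip pairs whose activity rules are inconsistent.
        if activity_identity(a, b) < identity_threshold:
            continue
        # Stage 2: file similarity is computed only for target files.
        if file_similarity(a, b) >= similarity_threshold:
            to_merge.append((a, b))
    return to_merge


identity = {("A", "A1"): 3.465, ("A", "B"): 1.25, ("A", "C"): 0.105,
            ("A1", "B"): 1.9, ("A1", "C"): 1.2, ("B", "C"): 2.8}
similarity = {("A", "A1"): 0.818, ("B", "C"): 0.77}

merged = find_merge_pairs(
    ["A", "A1", "B", "C"],
    lambda a, b: identity[(a, b)],
    lambda a, b: similarity[(a, b)],
    identity_threshold=2,
    similarity_threshold=0.816,
)
print(merged)  # [('A', 'A1')]
```

Because the similarity table holds entries only for A_A1 and B_C, the sketch also shows that the second stage is never evaluated for pairs that fail the activity rule check.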
Referring to fig. 6, fig. 6 is a schematic structural diagram of a file merging device according to an embodiment of the present invention. The file merging device 600 includes:
The detection module 601 is configured to detect the activity rule identity between every two files to be merged among the plurality of files to be merged, where each file to be merged includes activity rule features for calculating the activity rule identity and file features for calculating file similarity.
The calculating module 602 is configured to, if it is detected that the activity rule identity between two files to be merged among the plurality of files to be merged reaches the preset identity threshold, take the two files to be merged as target files and calculate a file similarity between the two target files.
The merging module 603 is configured to merge the target files whose file similarity reaches a preset similarity threshold.
Referring to fig. 7, fig. 7 is a schematic structural diagram provided by the detection module in the embodiment of fig. 6, where the detection module 601 includes:
The first extracting unit 6011 is configured to extract the activity rule features of each file to be merged.
The first calculating unit 6012 is configured to calculate the activity rule feature identity between every two activity rule features.
The first determining unit 6013 is configured to determine the activity rule identity between every two files to be merged among the plurality of files to be merged based on the activity rule feature identity between every two activity rule features.
Referring to fig. 8, fig. 8 is a schematic structural diagram provided by the first computing unit in the embodiment of fig. 7, where the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the first computing unit 6012 includes:
The first calculating subunit 60121 is configured to calculate the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features.
The first determining subunit 60122 is configured to determine the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in the two activity rule features.
Referring to fig. 9, fig. 9 is a schematic structural diagram provided by the computing module in the embodiment of fig. 6, where the computing module 602 includes:
The second extraction unit 6021 is configured to extract the file features of the two target files.
The second computing unit 6022 is configured to calculate the file feature similarity between the two file features.
The second determining unit 6023 is configured to determine the file similarity of the two target files based on the file feature similarity between the two file features.
Referring to fig. 10, fig. 10 is a schematic diagram of a structure provided by the second computing unit in the embodiment of fig. 9, where the archive feature includes a plurality of archive sub-features; the second calculation unit 6022 includes:
The second computing subunit 60221 is configured to calculate the file sub-feature similarity between each pair of file sub-features in the two file features according to the Cartesian product strategy.
The second determining subunit 60222 is configured to determine the file feature similarity between the two file features based on the calculated file sub-feature similarities.
The archive combining device 600 provided in the embodiment of the present invention can implement each implementation manner and corresponding beneficial effects in the above method embodiments, and in order to avoid repetition, the description is omitted here.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present invention. The electronic device 700 includes a memory 702, a processor 701, and a computer program stored in the memory 702 and capable of running on the processor 701. When the processor 701 executes the computer program, the steps of the file merging method provided in the above embodiments are implemented, and the processor 701 performs the following steps:
Detecting the activity rule identity between every two files to be merged among the plurality of files to be merged, where each file to be merged includes activity rule features for calculating the activity rule identity and file features for calculating file similarity.
If it is detected that the activity rule identity between two files to be merged among the plurality of files to be merged reaches the preset identity threshold, taking the two files to be merged as target files and calculating the file similarity between the two target files.
Merging the target files whose file similarity reaches a preset similarity threshold.
Optionally, the step of detecting the activity rule identity between each two files to be merged in the plurality of files to be merged, performed by the processor 701, includes:
Extracting the activity rule features of each file to be merged.
Calculating the activity rule feature identity between every two activity rule features.
Determining the activity rule identity between every two files to be merged among the plurality of files to be merged based on the activity rule feature identity between every two activity rule features.
Optionally, the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the step of calculating the activity rule feature identity between every two activity rule features performed by the processor 701 includes:
Calculating the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features.
Determining the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in the two activity rule features.
Optionally, the step of calculating the file similarity between the two target files performed by the processor 701 includes:
Extracting the file features of the two target files.
Calculating the file feature similarity between the two file features.
Determining the file similarity of the two target files based on the file feature similarity between the two file features.
Optionally, the archive feature comprises a plurality of archive sub-features; the step of calculating profile feature similarity between two profile features performed by the processor 701 includes:
Calculating the file sub-feature similarity between each pair of file sub-features in the two file features according to the Cartesian product strategy.
Determining the file feature similarity between the two file features based on the calculated file sub-feature similarities.
It should be noted that, the electronic device 700 may be a smart terminal, a mobile phone, a tablet computer, etc. of an archive related department.
The electronic device 700 provided in the embodiment of the present invention can implement each implementation manner and corresponding beneficial effects in the foregoing method embodiment, and in order to avoid repetition, details are not repeated here.
The embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, where the computer program when executed by the processor 701 implements each process of the archive combining method provided by the embodiment of the present invention, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (8)
1. A file merging method, the method comprising the steps of:
detecting the activity rule identity between every two files to be merged among a plurality of files to be merged, wherein each file to be merged comprises a plurality of activity rule features for calculating the activity rule identity and file features for calculating file similarity, one file to be merged corresponds to one person, the files to be merged are obtained by archiving in different systems or different departments, and there are a plurality of activity rule identities between every two files to be merged;
if each of the activity rule identities between two files to be merged is greater than a preset identity threshold, determining that the activity rule identity between the two files to be merged among the plurality of files to be merged reaches the preset identity threshold, taking the two files to be merged as target files, and calculating the file similarity between the two target files;
merging the target files whose file similarity reaches a preset similarity threshold;
wherein the step of detecting the activity rule identity between every two files to be merged among the plurality of files to be merged comprises:
extracting the activity rule features of each file to be merged;
calculating the activity rule feature identity between every two activity rule features;
and determining the activity rule identity between every two files to be merged among the plurality of files to be merged based on the activity rule feature identity between every two activity rule features.
2. The file merging method of claim 1, wherein the activity rule features comprise a plurality of activity rule feature values corresponding to a plurality of times, and the step of calculating the activity rule feature identity between every two activity rule features comprises:
calculating the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features;
and determining the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in the two activity rule features.
3. The file merging method of claim 1, wherein the step of calculating the file similarity between the two target files comprises:
extracting the file features of the two target files;
calculating the file feature similarity between the two file features;
and determining the file similarity of the two target files based on the file feature similarity between the two file features.
4. The file merging method of claim 3, wherein the file feature comprises a plurality of file sub-features, and the step of calculating the file feature similarity between the two file features comprises:
calculating the file sub-feature similarity between each pair of file sub-features in the two file features according to a Cartesian product strategy;
and determining the file feature similarity between the two file features based on the calculated file sub-feature similarities.
5. A file merging device, the device comprising:
a detection module, configured to detect the activity rule identity between every two files to be merged among a plurality of files to be merged, wherein each file to be merged comprises a plurality of activity rule features for calculating the activity rule identity and file features for calculating file similarity, one file to be merged corresponds to one person, the files to be merged are obtained by archiving in different systems or different departments, and there are a plurality of activity rule identities between every two files to be merged;
a computing module, configured to, if each of the activity rule identities between two files to be merged is greater than a preset identity threshold, determine that the activity rule identity between the two files to be merged among the plurality of files to be merged reaches the preset identity threshold, take the two files to be merged as target files, and calculate the file similarity between the two target files;
a merging module, configured to merge the target files whose file similarity reaches a preset similarity threshold;
wherein the detection module comprises:
a first extraction unit, configured to extract the activity rule features of each file to be merged;
a first calculation unit, configured to calculate the activity rule feature identity between every two activity rule features;
a first determining unit, configured to determine the activity rule identity between every two files to be merged among the plurality of files to be merged based on the activity rule feature identity between every two activity rule features.
6. The file merging device of claim 5, wherein the activity rule feature comprises a plurality of activity rule feature values corresponding to a plurality of times, and the first calculation unit comprises:
a first calculating subunit, configured to calculate the activity rule feature value identity between the two activity rule feature values at the same time in every two activity rule features;
a first determining subunit, configured to determine the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in the two activity rule features.
7. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the file merging method of any one of claims 1 to 4 when executing the computer program.
8. A computer readable storage medium, wherein the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the file merging method of any one of claims 1 to 4.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010184910.5A CN113408557B (en) | 2020-03-17 | 2020-03-17 | File merging method and device and electronic equipment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113408557A CN113408557A (en) | 2021-09-17 |
| CN113408557B true CN113408557B (en) | 2023-10-13 |
Family
ID=77677055
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010184910.5A Active CN113408557B (en) | 2020-03-17 | 2020-03-17 | File merging method and device and electronic equipment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113408557B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115168425B (en) * | 2022-07-21 | 2026-01-02 | 广州云从人工智能技术有限公司 | Methods, apparatus, computer equipment and computer-readable storage media for merging archives |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108563675A (en) * | 2018-02-28 | 2018-09-21 | 北京图铭视界科技有限公司 | Electronic record automatic generation method and device based on target body characteristics |
| CN109241378A (en) * | 2018-08-29 | 2019-01-18 | 北京旷视科技有限公司 | Archives method for building up, device, equipment and storage medium |
| CN109710780A (en) * | 2018-12-28 | 2019-05-03 | 上海依图网络科技有限公司 | A kind of archiving method and device |
| CN109800674A (en) * | 2018-12-28 | 2019-05-24 | 上海依图网络科技有限公司 | A kind of archiving method and device |
| CN109800668A (en) * | 2018-12-28 | 2019-05-24 | 上海依图网络科技有限公司 | A kind of archiving method and device |
| CN109815829A (en) * | 2018-12-28 | 2019-05-28 | 上海依图网络科技有限公司 | A kind of method and device of determining passerby track |
| CN110334232A (en) * | 2019-06-28 | 2019-10-15 | 深圳市商汤科技有限公司 | File application method and device, storage medium |
| CN110765134A (en) * | 2019-10-25 | 2020-02-07 | 四川东方网力科技有限公司 | File establishing method, equipment and storage medium |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015073920A1 (en) * | 2013-11-15 | 2015-05-21 | Kofax, Inc. | Systems and methods for generating composite images of long documents using mobile video data |
| JP6561504B2 (en) * | 2015-03-11 | 2019-08-21 | 富士通株式会社 | Data arrangement program, data arrangement method, and data arrangement apparatus |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113408557A (en) | 2021-09-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||