CN111444307B - Similarity value-based entity encoding method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111444307B
Authority
CN
China
Prior art keywords
entity
conflict
similarity value
similarity
directory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010526992.7A
Other languages
Chinese (zh)
Other versions
CN111444307A (en
Inventor
崔德冠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202010526992.7A priority Critical patent/CN111444307B/en
Publication of CN111444307A publication Critical patent/CN111444307A/en
Application granted granted Critical
Publication of CN111444307B publication Critical patent/CN111444307B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of big data, is applicable to intelligent government affairs, and discloses a similarity value-based entity encoding method, device, equipment and storage medium. The method comprises the following steps: acquiring an entity directory, traversing the entity directory, and performing deduplication on it to obtain a basic entity directory in which each entity is unique; performing conflict detection on the entities in the basic entity directory and determining the type of any entity conflict to obtain an entity conflict type; then obtaining a target entity according to the entity conflict type, and renaming and encoding the entity corresponding to the entity conflict according to the target entity. The invention also relates to blockchain technology, in that the result data may be stored in a blockchain. By identifying entity encoding errors, the invention improves entity encoding accuracy.

Description

Similarity value-based entity encoding method, device, equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a similarity value-based entity encoding method, apparatus, device, and storage medium.
Background
An Entity (Entity) is an abstraction and generalization of a real-world object, and an Entity can be a person, a commodity, an enterprise, a project, and the like. In order to effectively manage and apply entity information, enterprises often assign a uniform code to entities (for example, management of personal information, each person has a unique identification number for identification). In some group companies or organizations with multiple sub-departments cooperating with each other, because of their complex internal structures, each sub-department (sub-organization) often adopts its own independent coding mode according to the business needs, so that the relationship between the entity and the coding of these group companies or organizations generally has the following characteristics: (1) each code identification is unique to one entity; (2) different entities may have the same name, but each entity has only one code; (3) even the same entity, there may be multiple different names; (4) multiple sets of coding schemes may exist for different systems, etc. This results in that in the actual management process of the entity information of the mass data, a conflict between the entity and the code occurs, that is, each entity does not correspond to a unique code.
In order to solve the problem that entity conflict occurs in the entity information management process of mass data, namely, the entity name and the entity code lack a unique corresponding relationship, the current solution for the occurrence of entity conflict is to extract the characteristic information and the position information of the occurrence of entity conflict, and match the characteristic information and the position information with a preset error coding table, so as to obtain the entity name and the entity code of the occurrence of entity conflict; however, for mass data, entity information comes from different channels, which causes difficulty in obtaining characteristic information and position information of an entity, thereby causing a decrease in accuracy of entity encoding. There is a need for a method capable of identifying entity coding errors and improving entity coding accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide an entity encoding method based on similarity values, which can identify entity encoding errors, so as to improve the accuracy of entity encoding.
In order to solve the foregoing technical problem, an embodiment of the present application provides an entity encoding method based on a similarity value, including:
acquiring an entity directory, wherein the entity directory comprises a plurality of entities, and the entities comprise entity names and entity codes corresponding to the entity names;
traversing the entity directory, detecting the uniqueness of each entity in the entity directory, if an entity without uniqueness exists, executing deduplication processing to obtain a basic entity directory with uniqueness of each entity, wherein the uniqueness is determined by the corresponding relation between the entity name and the entity code;
performing conflict detection on the entities in the basic entity directory to obtain a conflict detection result, and if the conflict detection result is that entity conflict exists, judging the conflict type of the entity conflict to obtain an entity conflict type;
and obtaining a target entity according to the entity conflict type, and renaming and coding the entity corresponding to the entity conflict according to the target entity.
Further, before the obtaining the entity directory, the method further includes:
acquiring data to be coded, and preprocessing the data to be coded to obtain result data;
and performing regular matching on the result data in a regular matching mode, and generating the entity directory according to the successfully matched data.
Further, the obtaining a target entity according to the entity conflict type, and renaming and encoding the entity corresponding to the entity conflict according to the target entity includes:
if the entity conflict type is that the entity name lacks a corresponding entity code, taking the entity with the entity conflict as a first conflict entity, and taking the entity without the entity conflict as a first basic entity;
counting the similarity value of the first conflict entity and each first basic entity to obtain a first similarity value set;
and obtaining a similarity value with the maximum numerical value in the first similarity set as a target similarity value, judging whether the target similarity value is higher than a preset threshold value, if so, judging that two entities corresponding to the target similarity value are the same entity, and coding the first conflict entity through a first basic entity corresponding to the target similarity value.
Further, the obtaining a target entity according to the entity conflict type, and renaming and encoding the entity corresponding to the entity conflict according to the target entity further includes:
if the entity conflict types are that the same entity name corresponds to different entity codes, taking the entity with the entity conflict as a second conflict entity, and combining the second conflict entities in pairs to obtain a second conflict entity combination;
counting the similarity values of the two entities in each second conflict entity combination to obtain a second similarity set;
and judging whether a similarity value higher than a preset threshold value exists in the second similarity set, if so, judging that two entities in a second conflict combination corresponding to the similarity value higher than the preset threshold value are the same entity, and recoding the second conflict entity according to a preset mode.
Further, the obtaining a target entity according to the entity conflict type, and renaming and encoding the entity corresponding to the entity conflict according to the target entity further includes:
if the entity conflict type is that different entity names correspond to the same entity code, taking an entity with entity conflict as a third conflict entity, and combining the third conflict entity in pairs to obtain a third conflict entity combination;
counting the similarity values of the two entities in each third conflict entity combination to obtain a third similarity set;
and judging whether a similarity value lower than a preset threshold value exists in the third similarity set, if so, judging that two entities in a third conflict entity combination corresponding to the similarity value lower than the preset threshold value are different entities, and recoding the third conflict entity according to a preset mode.
Further, the obtaining a target entity according to the entity conflict type, and renaming and encoding the entity corresponding to the entity conflict according to the target entity further includes:
if the entity conflict type is that the entity code does not have the corresponding entity name, taking the entity with the entity conflict as a fourth conflict entity, and taking the entity without the entity conflict as a second basic entity;
counting the similarity value of the fourth conflict entity and each second basic entity to obtain a fourth similarity value set;
and obtaining the similarity value with the maximum numerical value in the fourth similarity set as a contrast similarity value, judging whether the contrast similarity value is higher than a preset threshold value, if so, judging that two entities corresponding to the contrast similarity value are the same entity, and naming the fourth conflict entity by the entity name corresponding to the contrast similarity value.
Further, the similarity value of the first conflict entity to an entity for which no entity conflict has occurred is β,
$$\beta = \sum_{k=1}^{K} a_k s_k$$
wherein $\beta \in [0,1]$; $K$ is the number of characteristic factors used to calculate the similarity value, $K \ge 1$; $a_k$ is the weight of the $k$-th characteristic factor, with $a_k > 0$ and $\sum_{k=1}^{K} a_k = 1$; and $s_k$ is the similarity value of the $k$-th factor.
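As an illustration, the following minimal Python sketch computes a similarity value under the assumption that β is the weighted sum of the per-factor similarity values, which is what the definitions of the weights and factors suggest. The function name `weighted_similarity` and the sample weights are hypothetical and not taken from the patent.

```python
from typing import Sequence


def weighted_similarity(weights: Sequence[float],
                        factor_similarities: Sequence[float]) -> float:
    """Return the weighted sum of per-factor similarities (assumed form of beta)."""
    if not weights or len(weights) != len(factor_similarities):
        raise ValueError("need K >= 1 weights and exactly one similarity per weight")
    if any(a <= 0 for a in weights):
        raise ValueError("every weight must be strictly positive")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= s <= 1.0 for s in factor_similarities):
        raise ValueError("each factor similarity must lie in [0, 1]")
    return sum(a * s for a, s in zip(weights, factor_similarities))


# Hypothetical example with three factors (for instance name, address, category).
beta = weighted_similarity([0.5, 0.3, 0.2], [1.0, 0.5, 0.0])
```

Because the weights are positive and sum to 1, and each factor similarity lies in [0, 1], the result also lies in [0, 1].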
In order to solve the technical problems, the invention adopts a technical scheme that: there is provided a similarity value-based entity encoding apparatus including:
the entity directory acquisition module is used for acquiring an entity directory, wherein the entity directory comprises a plurality of entities, and each entity comprises an entity name and an entity code corresponding to the entity name;
the entity directory detection module is used for traversing the entity directory, detecting the uniqueness of each entity in the entity directory, and if an entity without uniqueness exists, executing deduplication processing to obtain a basic entity directory with each entity having uniqueness, wherein the uniqueness is determined by the corresponding relation between the entity name and the entity code;
an entity conflict detection module, configured to perform conflict detection on the entities in the basic entity directory to obtain a conflict detection result, and if the conflict detection result indicates that an entity conflict exists, determine a conflict type of the entity conflict to obtain an entity conflict type;
and the entity conflict coding module is used for obtaining a target entity according to the entity conflict type and renaming and coding the entity corresponding to the entity conflict according to the target entity.
In order to solve the above technical problems, the invention adopts a further technical scheme: a computer device is provided that includes one or more processors and a memory for storing one or more programs which, when executed, cause the one or more processors to implement the similarity value-based entity encoding method described in any one of the above.
In order to solve the above technical problems, the invention adopts a further technical scheme: a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the similarity value-based entity encoding method described in any one of the above.
In the entity encoding method based on the similarity value in the scheme, the obtained entity directory is subjected to uniqueness detection and duplication elimination to obtain a basic entity directory, then the basic entity directory is subjected to conflict detection, the conflict type of the conflict detection is judged, a target entity is obtained according to different conflict types, and the entity corresponding to the entity conflict is renamed and encoded according to the target entity, so that the conflict of the basic entity directory is solved, the efficiency of solving the entity conflict of the entity directory of mass data is improved, and the accuracy of entity encoding is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a schematic application environment diagram of an entity encoding method based on similarity values according to an embodiment of the present application;
Fig. 2 is a flowchart of an implementation of a similarity value-based entity encoding method according to an embodiment of the present application;
Fig. 3 is a flowchart of an implementation of the similarity value-based entity encoding method according to the embodiment of the present application before step S1;
Fig. 4 is a flowchart illustrating an implementation of step S4 in the method for entity encoding based on similarity values according to the embodiment of the present application;
Fig. 5 is a flowchart illustrating another implementation of step S4 in the method for entity encoding based on similarity values according to the embodiment of the present application;
Fig. 6 is a flowchart illustrating another implementation of step S4 in the method for entity encoding based on similarity values according to the embodiment of the present application;
Fig. 7 is a flowchart illustrating still another implementation of step S4 in the method for entity encoding based on similarity values according to the embodiment of the present application;
Fig. 8 is a schematic diagram of an apparatus for entity encoding based on similarity values according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a computer device provided in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
The present invention will be described in detail below with reference to the accompanying drawings and embodiments.
Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a web browser application, a search-type application, an instant messaging tool, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, an entity encoding method based on a similarity value provided in the embodiments of the present application is generally performed by a server, and accordingly, an entity encoding apparatus based on a similarity value is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 shows an embodiment of a similarity-based entity encoding method.
It should be noted that, if the result is substantially the same, the method of the present invention is not limited to the flow sequence shown in fig. 2, and the method includes the following steps:
S1: Obtain an entity directory, wherein the entity directory comprises a plurality of entities, and each entity comprises an entity name and an entity code corresponding to the entity name.
Specifically, an entity name together with its corresponding entity code is treated as one entity; the entity directory is composed of such entities, and each entity name and its corresponding entity code refer to the same entity.
An Entity (Entity) is an abstraction and generalization of a real object, and the Entity may be a person, a commodity, an enterprise, a project, and the like.
S2: Traverse the entity directory and detect the uniqueness of each entity in it; if an entity without uniqueness exists, perform deduplication to obtain a basic entity directory in which every entity is unique, wherein uniqueness is determined by the correspondence between the entity name and the entity code.
Specifically, since the entity directory is constructed from the data of different departments, systems or industries, the same entity may appear in more than one of these sources. As a result, the same entity may be recorded repeatedly while the entity directory is being constructed, so that at least two entries have the same entity name and the same entity code. To ensure that each entity is recorded only once in the entity directory, only one entry is retained for each group of repeated entities, which yields the basic entity directory.
The basic entity directory records distinct entities; each entity appears exactly once and consists of an entity name and its corresponding entity code.
The deduplication processing means that when two or more entities in the entity directory have the same entity name and the same entity code, the duplicate entities are deleted and only one is retained.
Uniqueness means that each entity appears only once in the entity directory; if two entries have the same entity name and the same entity code, they are duplicates and are not unique.
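The deduplication of step S2 can be sketched in Python as follows. This is a minimal sketch that assumes each entity is represented as a `(name, code)` tuple; the representation, the function name `deduplicate` and the sample data are illustrative assumptions, not part of the patent.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed representation: (entity name, entity code); None marks a missing field.
Entity = Tuple[Optional[str], Optional[str]]


def deduplicate(directory: Iterable[Entity]) -> List[Entity]:
    """Keep the first occurrence of each (name, code) pair, preserving order."""
    seen = set()
    base_directory = []
    for entity in directory:
        if entity not in seen:
            seen.add(entity)
            base_directory.append(entity)
    return base_directory


directory = [
    ("City Library", "E001"),
    ("City Library", "E001"),  # exact duplicate: same name and same code, dropped
    ("City Library", "E002"),  # same name, different code: kept for conflict detection
    ("Central Park", "E003"),
]
base_directory = deduplicate(directory)
```

Only entries whose name and code both match are treated as duplicates; an entry that shares just the name or just the code is kept, since it is a candidate for the conflict detection of step S3.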
S3: Perform conflict detection on the entities in the basic entity directory to obtain a conflict detection result; if the conflict detection result is that an entity conflict exists, determine the conflict type of the entity conflict to obtain an entity conflict type.
Specifically, since the entity directory is constructed from the data of different departments, systems or industries, and the volume of data involved is large, entities whose names and codes conflict may occur. An entity conflict means that an entity name and an entity code lack a unique correspondence. Conflict detection is therefore performed on the basic entity directory to detect whether such entities exist. The detection both locates the items that conflict and determines which kind of conflict has occurred, which facilitates the subsequent processing of those items.
The types of entity conflict include: an entity name has no corresponding entity code; one entity name corresponds to a plurality of entity codes; a plurality of entity names correspond to one entity code; and an entity code has no corresponding entity name. Once the type of conflict is known, a processing method suited to that type is used to resolve it.
Conflict detection means traversing the basic entity directory by means of a preset instruction, detecting whether each entity name and entity code in the basic entity directory have a unique correspondence, and thereby judging whether an entity conflict exists in the directory.
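One way to picture the conflict detection and type determination of step S3 is the Python sketch below. It assumes entities are `(name, code)` tuples with `None` for a missing field; the function name `detect_conflicts` and the four dictionary keys are hypothetical labels for the four conflict types listed above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Entity = Tuple[Optional[str], Optional[str]]  # (name, code); None marks a missing field


def detect_conflicts(base_directory: List[Entity]) -> Dict[str, List[Entity]]:
    """Group conflicting entities under the four conflict types named in the text."""
    codes_by_name: Dict[str, Set[str]] = defaultdict(set)
    names_by_code: Dict[str, Set[str]] = defaultdict(set)
    for name, code in base_directory:
        if name and code:
            codes_by_name[name].add(code)
            names_by_code[code].add(name)

    conflicts: Dict[str, List[Entity]] = {
        "name_without_code": [],
        "name_with_many_codes": [],
        "code_with_many_names": [],
        "code_without_name": [],
    }
    for entity in base_directory:
        name, code = entity
        if name and not code:
            conflicts["name_without_code"].append(entity)
        elif code and not name:
            conflicts["code_without_name"].append(entity)
        elif name and code:
            if len(codes_by_name[name]) > 1:
                conflicts["name_with_many_codes"].append(entity)
            if len(names_by_code[code]) > 1:
                conflicts["code_with_many_names"].append(entity)
    return conflicts


base = [("A", "1"), ("A", "2"), ("B", "3"), ("C", "3"),
        ("D", None), (None, "9"), ("E", "5")]
conflicts = detect_conflicts(base)
```

In this example only `("E", "5")` has a unique name-to-code correspondence, so it appears in none of the four lists.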
S4: Obtain a target entity according to the entity conflict type, and rename and encode the entity corresponding to the entity conflict according to the target entity.
The target entity is the entity judged to be the same entity as the entity involved in the conflict; it has both an entity name and an entity code.
According to the scheme, the uniqueness detection is carried out on the obtained entity directory, the duplication elimination processing is carried out to obtain a basic entity directory, then the conflict detection is carried out on the basic entity directory, the conflict type of the conflict detection is judged, the target entity is obtained according to different conflict types, and the entity corresponding to the entity conflict is renamed and coded according to the target entity, so that the conflict generated by the basic entity directory is solved, the efficiency of solving the entity conflict by the entity directory of mass data is improved, and the accuracy of entity coding is improved.
Referring to fig. 3, fig. 3 shows an embodiment before step S1, which includes:
S01: Acquire data to be coded, and preprocess the data to be coded to obtain result data.
Specifically, the data to be coded is obtained by integrating cross-department data, cross-system data, industry data and the like, and is stored in a database; if the data to be coded is massive big data, it is stored in an HDFS distributed file system. The data to be coded is then preprocessed to obtain the result data.
The preprocessing includes data cleaning and similar operations on the data to be coded, so that the data to be coded is kept consistent.
The data to be coded is the data that requires entity naming and entity coding and from which the entity directory is generated. It consists of item data from departments, systems, industries and so on, where each item corresponds to one entity: some items have both an item name and an item code, some have only an item name, and some have only an item code. The item name corresponds to the entity name of the present invention, and the item code corresponds to the entity code. The entity directory is generated after the data to be coded has been sorted out.
It is emphasized that, to further ensure the privacy and security of the result data, the result data may also be stored in a node of a blockchain.
S02: Perform regular matching on the result data, and generate the entity directory from the successfully matched data.
Specifically, the entity directory is constructed from the result data by regular matching, and the data is unified so that each entity has a unique entity name and entity code.
Regular matching puts the entity names and entity codes of the entities into one-to-one correspondence; for entities that lack an entity name or an entity code, the correspondence is established according to a preset naming rule or a preset coding rule. The entity directory is then generated from the one-to-one correspondence between entity names and entity codes.
For example, the unification may cover: data format processing, in which English-format brackets are uniformly changed into Chinese-format brackets; connector processing, for connectors such as "-" and "_"; special character processing, for characters such as "#", spaces, tabs and line feeds; and coding rules, where the same coding rule is used throughout so that each code is unique.
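The unification rules in the example above can be sketched with Python's `re` module, assuming that "regular matching" refers to regular-expression matching. The text lists connectors and special characters without saying how they are handled, so removing them is an assumption here, as are the function names and the code pattern `E` followed by three digits.

```python
import re

# Hypothetical coding rule: one uppercase "E" followed by exactly three digits.
CODE_PATTERN = re.compile(r"E\d{3}")


def normalize_name(raw: str) -> str:
    """Apply the bracket, connector and special-character rules to one name."""
    # Data format processing: ASCII brackets become full-width (Chinese-format) brackets.
    text = raw.replace("(", "（").replace(")", "）")
    # Connector processing: drop "-" and "_" (assumed handling).
    text = re.sub(r"[-_]", "", text)
    # Special characters: drop "#", spaces, tabs and line feeds (assumed handling).
    text = re.sub(r"[#\s]", "", text)
    return text


def is_valid_code(code: str) -> bool:
    """Check a code against the hypothetical coding rule."""
    return CODE_PATTERN.fullmatch(code) is not None


cleaned = normalize_name("Ping-An_ (Group)#\t\n")
```

Stripping all whitespace suits names written without word spacing; for names that rely on spaces, collapsing runs of whitespace to a single space may be the better choice.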
In this embodiment, the data to be coded is obtained and preprocessed to obtain result data, the result data is stored in the blockchain, regular matching is performed on the result data, and the entity directory is generated from the successfully matched data. Constructing the entity directory in this way makes it convenient to identify entity conflicts in the directory subsequently and improves the accuracy of entity coding.
Referring to fig. 4, fig. 4 shows a specific implementation manner of step S4, which includes:
S411: If the entity conflict type is that the entity name lacks a corresponding entity code, take the entity with the entity conflict as a first conflict entity, and take the entities without entity conflict as first basic entities.
Specifically, the entity conflict type is determined to be that the entity name has no corresponding entity code, that is, the entity code field for that entity name is empty in the basic entity directory. Such an entity is taken as a first conflict entity, and each entity without an entity conflict is taken as a first basic entity.
S412: Calculate the similarity value between the first conflict entity and each first basic entity to obtain a first similarity value set.
Specifically, the similarity value between the first conflict entity and each first basic entity is calculated, and the resulting values are collected into the first similarity value set.
The similarity value may be calculated using, without limitation, the Minkowski distance, the Manhattan distance, the Euclidean distance, cosine similarity, the Hamming distance, and the like.
It should be noted that these measures have different scales, so the specific result needs to be mapped to the [0,1] interval in such a way that the larger the similarity value is, the more likely it is that the two entities are the same entity.
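The text does not say how a distance is mapped to [0,1]. The Python sketch below shows two common choices purely as assumptions: `1 / (1 + d)` for an unbounded distance such as the Euclidean distance, and a linear rescaling of cosine similarity from [-1, 1] to [0, 1]. The function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_to_similarity(distance: float) -> float:
    """Map a distance in [0, inf) to (0, 1]; identical entities score 1."""
    return 1.0 / (1.0 + distance)


def cosine_similarity_01(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return 0.0  # treat a zero vector as sharing nothing with the other vector
    return (dot / norm + 1.0) / 2.0


similarity = distance_to_similarity(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

Both mappings are monotone, so a smaller distance (or a larger cosine) always yields a larger similarity value, which is the property the text requires.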
S413: Obtain the similarity value with the largest numerical value in the first similarity value set as a target similarity value, and judge whether the target similarity value is higher than a preset threshold; if so, judge that the two entities corresponding to the target similarity value are the same entity, and code the first conflict entity by means of the first basic entity corresponding to the target similarity value.
Specifically, the higher the similarity value between the first conflict entity and a first basic entity, the closer the two are and the more likely it is that they are the same entity. The first basic entity with the highest similarity value in the first similarity value set is therefore the candidate for comparison. If that highest similarity value is higher than the set threshold, the first conflict entity and the corresponding first basic entity are judged to be the same entity, and the first conflict entity is coded with the entity code of that first basic entity.
The preset threshold is set according to actual conditions, and the preferred value is 0.85: two entities whose similarity value is higher than 0.85 are already highly similar.
Further, if the highest similarity value in the first similarity value set is lower than or equal to the set threshold, the two corresponding entities are not the same entity. In that case the first conflict entity is given a new code according to the rule used for constructing the entity directory; the new code must not duplicate any existing entity code, and its format must be consistent with the existing code format.
In this embodiment, if the entity conflict type is that the entity name lacks a corresponding entity code, the entity with the entity conflict is used as a first conflict entity, the entity without the entity conflict is used as a first basic entity, the similarity value between the first conflict entity and each first basic entity is counted to obtain a first similarity value set, the similarity value with the largest value in the first similarity value set is obtained and used as a target similarity value, whether the target similarity value is higher than a preset threshold value is judged, if the target similarity value is higher than the preset threshold value, two entities corresponding to the target similarity value are determined to be the same entity, and the first conflict entity is coded through the first basic entity corresponding to the target similarity value, which is beneficial to identifying coding errors in an entity directory and improving the accuracy of entity coding.
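Steps S411 to S413 can be sketched as follows. This is only an illustration under stated assumptions: entities are represented as dictionaries with a `code` key, `similarity` is any caller-supplied function returning a value in [0,1], and `next_free_code` is a hypothetical recoding rule (next integer, zero-padded to the existing width) standing in for the directory's real construction rules.

```python
THRESHOLD = 0.85  # the preferred preset threshold named in the text


def next_free_code(existing_codes):
    """Hypothetical recoding rule: next integer, same width as existing codes."""
    width = max(len(code) for code in existing_codes)
    return str(max(int(code) for code in existing_codes) + 1).zfill(width)


def encode_first_conflict(conflict, base_entities, similarity, make_new_code,
                          threshold=THRESHOLD):
    """Return (code, matched) for an entity whose name lacks a code.

    The conflict entity is compared with every basic entity; only the
    largest similarity value (the target similarity value) is tested
    against the threshold.
    """
    scored = [(similarity(conflict, base), base) for base in base_entities]
    if scored:
        target_value, target_entity = max(scored, key=lambda pair: pair[0])
        if target_value > threshold:
            # Same entity: reuse the code of the matched basic entity.
            return target_entity["code"], True
    # No match above the threshold: fall back to a fresh, non-repeating code.
    return make_new_code(), False
```

A target similarity value exactly equal to the threshold is treated as "not the same entity", matching the "lower than or equal to" branch described above.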
Referring to fig. 5, fig. 5 shows another specific implementation manner of step S4, which includes:
S421: If the entity conflict type is that the same entity name corresponds to different entity codes, take the entities with the entity conflict as second conflict entities, and combine the second conflict entities in pairs to obtain second conflict entity combinations.
Specifically, when it is determined that the same entity name corresponds to different entity codes, it is necessary to judge whether the entities sharing that name are the same entity, because it is legitimate for the same entity name to have different entity codes; for example, citizens with the same name have different identity numbers. Whether two entities are the same entity is judged by calculating the similarity value between every two entities corresponding to the same entity name.
S422: and counting the similarity values of the two entities in each second conflict entity combination to obtain a second similarity set.
Specifically, whether two entities are the same entity is determined by determining the similarity value of the two entities in the second conflict entity combination.
S423: and judging whether a similarity value higher than a preset threshold value exists in the second similarity set, if so, judging that two entities in the second conflict combination corresponding to the similarity value higher than the preset threshold value are the same entity, and recoding the second conflict entity according to a preset mode.
The preset mode is to keep either the oldest or the newest code, applied uniformly: all entities judged to be the same entity keep the oldest code, or all keep the newest.
Further, if no similarity value higher than the preset threshold exists in the second similarity set, the entities corresponding to the same entity name are different entities, and re-encoding is not needed.
For example, suppose three entities share the entity name A and have the entity codes 001, 002 and 003. The similarity values of the entity pairs 001 and 002, 001 and 003, and 002 and 003 are calculated; if all are 0.95, which is greater than the preset threshold of 0.85, every two of the three entities are judged to be the same entity, that is, the three entities named A are one entity, and the oldest (or newest) code is uniformly retained.
In this embodiment, if the entity conflict types are that the same entity name corresponds to different entity codes, the entity with the entity conflict is used as a second conflict entity, the second conflict entities are combined pairwise to obtain a second conflict entity combination, the similarity values of the two entities in each second conflict entity combination are counted to obtain a second similarity set, whether the similarity value higher than a preset threshold exists in the second similarity set is determined, if yes, the two entities in the second conflict combination with the similarity value higher than the preset threshold are determined to be the same entity, and the second conflict entity is re-encoded according to a preset mode, which is beneficial to identifying coding errors in an entity directory and improving the accuracy of the entity codes.
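Steps S421 to S423 can be sketched as below. The source describes only the pairwise comparison and the keep-oldest or keep-newest rule; grouping the matching pairs with a small union-find structure, and the `created` field used to decide which code is oldest, are assumptions made for this illustration.

```python
from itertools import combinations


def recode_same_name(entities, similarity, keep="oldest", threshold=0.85):
    """Return {original code: code to keep} for entities sharing one name.

    entities: list of dicts with hypothetical keys "code" and "created".
    Pairs whose similarity exceeds the threshold are merged into one group;
    each group keeps the code of its oldest (or newest) member.
    """
    parent = list(range(len(entities)))

    def find(i):
        # Follow parent links to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise combination of the second conflict entities.
    for i, j in combinations(range(len(entities)), 2):
        if similarity(entities[i], entities[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, entity in enumerate(entities):
        groups.setdefault(find(i), []).append(entity)

    pick = min if keep == "oldest" else max
    mapping = {}
    for members in groups.values():
        kept = pick(members, key=lambda e: e["created"])["code"]
        for entity in members:
            mapping[entity["code"]] = kept
    return mapping
```

For the example in the text (codes 001, 002 and 003, all pairwise similarity values 0.95), every code maps to 001 when the oldest is kept, assuming 001 was created first. Entities that match nothing keep their own code.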
Referring to fig. 6, fig. 6 shows another specific implementation manner of step S4, including:
S431: If the entity conflict type is that different entity names correspond to the same entity code, take the entities with the entity conflict as third conflict entities, and combine the third conflict entities in pairs to obtain third conflict entity combinations.
Specifically, if the type of the entity conflict is determined to be that different entity names correspond to the same entity code, it is necessary to determine whether the entities corresponding to the different entity names are the same entity, because different entity names may legitimately have the same entity code. For example, a citizen has a unique identification number, but the names associated with that citizen may include a current name, a previous name, and so on, which results in different entity names corresponding to the same entity code. It is also possible that a coding error has occurred, resulting in different entity names corresponding to the same entity code. Therefore, when different entity names correspond to the same entity code, it is necessary to determine whether the corresponding entities are the same entity.
S432: and counting the similarity values of the two entities in each third conflict entity combination to obtain a third similarity set.
Specifically, whether two entities are the same entity is determined by determining the similarity values of the two entities in the third conflicting entity combination.
S433: and judging whether a similarity value lower than a preset threshold value exists in the third similarity set, if so, judging that two entities in a third conflict entity combination corresponding to the similarity value lower than the preset threshold value are different entities, and recoding the third conflict entity according to a preset mode.
The preset mode is to keep either the oldest or the newest code, applied uniformly.
Further, if no similarity value lower than the preset threshold exists in the third similarity set, the entities corresponding to the different entity names are the same entity, and re-encoding is not needed.
For example, suppose three entities share the entity code 001 and have the entity names A, B and C. The similarity values of the entity pairs A and B, A and C, and B and C are calculated; if all are 0.95, which is greater than the preset threshold of 0.85, every two of the three entities are judged to be the same entity, that is, the three entities coded 001 are one entity, and the oldest (or newest) code is uniformly retained.
In this embodiment, if the entity conflict type is that different entity names correspond to the same entity code, the entity with the entity conflict is used as a third conflict entity, the third conflict entities are combined pairwise to obtain a third conflict entity combination, the similarity values of two entities in each third conflict entity combination are counted to obtain a third similarity set, whether the similarity value lower than the set threshold exists in the third similarity set is determined, if yes, the two entities in the third conflict entity combination with the similarity value lower than the set threshold are determined to be different entities, and the third conflict entity is re-encoded according to a preset mode to facilitate identification of a coding error in an entity directory and improve accuracy of the entity code.
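The test in S433 runs in the opposite direction to S423: a pair is flagged when its similarity value falls below the threshold. A minimal sketch, assuming entities are dictionaries with a `name` key and that `similarity` is supplied by the caller:

```python
from itertools import combinations


def different_entity_pairs(entities, similarity, threshold=0.85):
    """Return the name pairs judged to be different entities.

    entities: entities that share one entity code but differ in name.
    An empty result means no pair fell below the threshold, so no
    re-encoding is needed.
    """
    return [
        (a["name"], b["name"])
        for a, b in combinations(entities, 2)
        if similarity(a, b) < threshold
    ]
```

For the example in the text (names A, B and C under code 001, all pairwise similarity values 0.95) the list is empty. Any pair that is returned identifies entities that share a code by mistake and therefore need to be recoded in the preset mode.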
Referring to fig. 7, fig. 7 shows another specific implementation manner of step S4, which includes:
S441: If the entity conflict type is that the entity code does not have a corresponding entity name, take the entity with the entity conflict as a fourth conflict entity, and take the entities without entity conflict as second basic entities.
Specifically, if the type of the entity conflict is that the entity code does not have a corresponding entity name, the basic entity directory necessarily contains an error and needs to be corrected. The entity directory is constructed so that each entity has both an entity name and an entity code, so an entry with only an entity code should not exist; if one does, the basic entity directory contains an error.
S442: and counting the similarity value of the fourth conflict entity and each second basic entity to obtain a fourth similarity value set.
Specifically, by obtaining the fourth similarity value set, the entity identical to the fourth conflicting entity is searched.
S443: and obtaining the similarity value with the maximum numerical value in the fourth similarity set as a comparison similarity value, judging whether the comparison similarity value is higher than a preset threshold value, if so, judging that two entities corresponding to the comparison similarity value are the same entity, and naming the fourth conflict entity by the entity name corresponding to the comparison similarity value.
Specifically, whether an entity identical to the fourth conflicting entity exists is determined by calculating the entity similarity value, and if the identical entity is found, the fourth conflicting entity is named according to the entity name of the identical entity.
Further, if the comparison similarity value is lower than the preset threshold, it is determined that the two entities corresponding to the comparison similarity value are different entities, and the server generates a new entity name and assigns it to the entity that has no entity name.
In this embodiment, if the entity conflict type is that the entity code does not have a corresponding entity name, the entity with the entity conflict is used as a fourth conflict entity, the entity without the entity conflict is used as a second basic entity, the similarity values between the fourth conflict entity and each second basic entity are counted to obtain a fourth similarity value set, the similarity value with the largest value in the fourth similarity value set is obtained and used as a comparison similarity value, whether the comparison similarity value is higher than a preset threshold value is judged, if the comparison similarity value is higher than the preset threshold value, two entities corresponding to the comparison similarity value are determined to be the same entity, and the fourth conflict entity is named by comparing the entity name corresponding to the similarity value, which is beneficial to identifying the conflict in the entity directory and improving the accuracy of the entity code.
Further, the entity coding method based on the similarity value further includes:
The similarity value between the first conflict entity and an entity without entity conflict is β, where

$$\beta = \sum_{k=1}^{K} a_k s_k$$

where $\beta \in [0,1]$; $K$ is the number of characteristic factors used to calculate the similarity value, $K \ge 1$; $a_k$ is the weight of the k-th characteristic factor, with $a_k > 0$ and

$$\sum_{k=1}^{K} a_k = 1$$

and $s_k$ is the similarity value of the k-th characteristic factor.
Further, the second similarity value set, the third similarity value set and the fourth similarity value set can be obtained by the similarity value calculation method.
In this embodiment, by the above method for calculating the similarity value, the same entity can be identified by the similarity values of the conflicting entity and other entities, so as to solve the entity conflict and improve the accuracy of entity encoding.
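Reading the definitions of β, the weights a_k and the per-factor similarity values s_k as a weighted sum, the calculation can be sketched as follows. The function name and the validation checks are illustrative assumptions; the source only states the constraints on the weights.

```python
def weighted_similarity(weights, factor_similarities, tol=1e-9):
    """Combine K per-factor similarity values into one value beta in [0, 1].

    weights: a_1..a_K, each > 0 and summing to 1.
    factor_similarities: s_1..s_K, each already mapped to [0, 1].
    """
    if not weights or len(weights) != len(factor_similarities):
        raise ValueError("need K >= 1 weights and K factor similarities")
    if any(a <= 0 for a in weights):
        raise ValueError("every weight must be positive")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= s <= 1.0 for s in factor_similarities):
        raise ValueError("factor similarities must lie in [0, 1]")
    return sum(a * s for a, s in zip(weights, factor_similarities))
```

Because the weights are positive and sum to 1, the result is a convex combination of the s_k and therefore stays in [0,1], so it can be compared directly with the preset threshold. For instance, weights 0.5, 0.3 and 0.2 with factor similarities 1.0, 0.9 and 0.5 give 0.5 + 0.27 + 0.1 = 0.87.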
Further, the entity coding method based on the similarity value further includes:
and constructing a new entity directory, and storing the data subjected to entity conflict processing and the data without entity conflict into the new entity directory.
Specifically, if the entity conflict occurs in the entity directory, the entity name and the entity code are corrected in the new entity directory. After the conflicting data is corrected in the new entity directory, the conflicting data is identified in the corresponding field, so that the source tracing of the historical data is facilitated.
Specific fields of the new entity directory include, but are not limited to, the following: original name, original code, new name, new code, whether the name changed, and whether the code changed. The results of each conflict type should be output to the new entity directory table. The new entity directory is shown in Table 1:

| Number | Original name | Original code | New name | New code | Name changed | Code changed |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | A | 1001 | A | 1001 | 0 | 0 |
| 2 | B | 1002 | B | 1002 | 0 | 0 |
| 3 | C | | A | 1001 | 1 | 1 |
| 4 | D | | D | 1003 | 0 | 1 |

Table 1
In this embodiment, by constructing a new entity directory, the data subjected to entity conflict processing and the data not subjected to entity conflict are stored in the new entity directory, which is convenient for tracing the source of the historical data and improves the accuracy of entity coding.
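One row of the new entity directory can be modeled as below. This is a sketch only: the class and attribute names are hypothetical, and deriving the two change flags from the stored values (rather than storing them) is a design assumption that keeps the flags from drifting out of sync with the names and codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DirectoryRow:
    """One record of the new entity directory (field names are illustrative)."""
    number: int
    original_name: Optional[str]  # None when the original entry had no name
    original_code: Optional[str]  # None when the original entry had no code
    new_name: str
    new_code: str

    @property
    def name_changed(self) -> int:
        # 1 if the corrected name differs from the original, else 0.
        return int(self.original_name != self.new_name)

    @property
    def code_changed(self) -> int:
        # 1 if the corrected code differs from the original, else 0.
        return int(self.original_code != self.new_code)
```

Keeping the original name and code beside the corrected ones is what makes the historical data traceable: a row with both flags at 0 passed through unchanged, while a row such as original name C with no original code, corrected to name A and code 1001, records exactly what was replaced.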
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
Referring to fig. 8, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an entity encoding apparatus based on similarity values, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 8, the entity encoding apparatus based on similarity value of the present embodiment includes: an entity target obtaining module 51, an entity directory detecting module 52, an entity conflict detecting module 53 and an entity conflict encoding module 54, wherein:
an entity target obtaining module 51, configured to obtain an entity directory, where the entity directory includes a plurality of entities, and each entity includes an entity name and an entity code corresponding to the entity name;
the entity directory detection module 52 is configured to traverse the entity directory, detect uniqueness of each entity in the entity directory, and if an entity without uniqueness exists, perform deduplication processing to obtain a basic entity directory in which each entity has uniqueness, where the uniqueness is determined by a correspondence between an entity name and an entity code;
an entity conflict detection module 53, configured to perform conflict detection on an entity in the basic entity directory to obtain a conflict detection result, and if the conflict detection result indicates that an entity conflict exists, determine a conflict type of the entity conflict to obtain an entity conflict type;
and the entity conflict encoding module 54 is configured to obtain a name and a code of the target entity according to the entity conflict type, and rename and encode the entity corresponding to the entity conflict according to the name and the code of the target entity.
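The conflict detection performed by module 53 can be illustrated with a short sketch. It assumes the basic entity directory is available as (name, code) pairs in which a missing value is `None`; the function name and the result keys are hypothetical labels for the four conflict types.

```python
from collections import defaultdict


def classify_conflicts(entities):
    """Group directory entries by the four entity conflict types.

    entities: iterable of (name, code) pairs; either element may be None.
    """
    codes_by_name = defaultdict(set)
    names_by_code = defaultdict(set)
    result = {"name_without_code": [], "code_without_name": []}

    for name, code in entities:
        if name and not code:
            result["name_without_code"].append(name)
        elif code and not name:
            result["code_without_name"].append(code)
        elif name and code:
            codes_by_name[name].add(code)
            names_by_code[code].add(name)

    # One name mapped to several codes, and one code mapped to several names.
    result["name_with_many_codes"] = sorted(
        n for n, codes in codes_by_name.items() if len(codes) > 1)
    result["code_with_many_names"] = sorted(
        c for c, names in names_by_code.items() if len(names) > 1)
    return result
```

Each non-empty list would then be handed to the matching branch of the entity conflict encoding module 54.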
Further, the entity target obtaining module 51 further includes:
the result data acquisition module is used for acquiring data to be encoded and preprocessing the data to be encoded to obtain result data;
and the entity directory generation module is used for performing regular matching on the result data in a regular matching mode and generating the entity directory according to the successfully matched data.
Further, the entity conflict encoding module 54 includes:
a first conflict entity determining unit, configured to, if the entity conflict type is that the entity name lacks a corresponding entity code, take an entity with an entity conflict as a first conflict entity, and take an entity without an entity conflict as a first basic entity;
the first similarity value set counting unit is used for counting the similarity values of the first conflict entities and each first basic entity to obtain a first similarity value set;
and the first conflict entity encoding unit is used for acquiring the similarity value with the largest numerical value in the first similarity set as a target similarity value, judging whether the target similarity value is higher than a preset threshold value, if so, judging that two entities corresponding to the target similarity value are the same entity, and encoding the first conflict entity through a first basic entity corresponding to the target similarity value.
Further, the entity conflict encoding module 54 further includes:
a second conflict entity combination unit, configured to, if the entity conflict types are that the same entity name corresponds to different entity codes, take the entity with the entity conflict as a second conflict entity, and combine the second conflict entities two by two to obtain a second conflict entity combination;
the second similarity set counting unit is used for counting the similarity values of the two entities in each second conflict entity combination to obtain a second similarity set;
and the second conflict entity encoding unit is used for judging whether a similarity value higher than a preset threshold exists in the second similarity set or not, if so, judging that two entities in a second conflict combination corresponding to the similarity value higher than the preset threshold are the same entity, and recoding the second conflict entity according to a preset mode.
Further, the entity conflict encoding module 54 further includes:
a third conflict entity combination unit, configured to, if the entity conflict type is that different entity names correspond to the same entity code, take the entity with the entity conflict as a third conflict entity, and combine the third conflict entities two by two to obtain a third conflict entity combination;
the third similarity set counting unit is used for counting the similarity values of the two entities in each third conflict entity combination to obtain a third similarity set;
and the third conflict entity encoding unit is used for judging whether a similarity value lower than a preset threshold exists in the third similarity set or not, if so, judging that two entities in a third conflict entity combination corresponding to the similarity value lower than the preset threshold are different entities, and recoding the third conflict entity according to a preset mode.
Further, the entity conflict encoding module 54 further includes:
a fourth conflict entity determining unit, configured to, if the entity conflict type is that the entity code does not have a corresponding entity name, use the entity with the entity conflict as a fourth conflict entity, and use the entity without the entity conflict as a second basic entity;
a fourth similarity value set statistic unit, configured to count similarity values between the fourth conflict entity and each second basic entity to obtain a fourth similarity value set;
and the fourth conflict entity naming unit is used for acquiring the similarity value with the maximum numerical value in the fourth similarity set as a comparison similarity value, judging whether the comparison similarity value is higher than a preset threshold value, if so, judging that two entities corresponding to the comparison similarity value are the same entity, and naming the fourth conflict entity by the entity name corresponding to the comparison similarity value.
Further, the similarity value-based entity encoding apparatus further includes:
a similarity value calculation module, configured to calculate the similarity value β between the first conflict entity and an entity without entity conflict, where

$$\beta = \sum_{k=1}^{K} a_k s_k$$

$K$ is the number of characteristic factors used to calculate the similarity value, $K \ge 1$; $a_k$ is the weight of the k-th characteristic factor, with $a_k > 0$ and

$$\sum_{k=1}^{K} a_k = 1$$

and $s_k$ is the similarity value of the k-th characteristic factor.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 includes a memory 61, a processor 62, and a network interface 63 communicatively connected to each other via a system bus. It is noted that only a computer device 6 having the three components memory 61, processor 62 and network interface 63 is shown, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), or the like, provided on the computer device 6. Of course, the memory 61 may also include both internal and external storage devices of the computer device 6. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as the program code of the entity encoding method based on the similarity value. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the program code stored in the memory 61 or process data, for example, execute a program code of an entity encoding method based on the similarity value.
Network interface 63 may include a wireless network interface or a wired network interface, with network interface 63 typically being used to establish communication connections between computer device 6 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing a server maintenance program, which is executable by at least one processor to cause the at least one processor to perform the steps of a similarity value-based entity encoding method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method of the embodiments of the present application.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application; the appended drawings illustrate preferred embodiments of the application and do not limit its scope. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (9)

1. An entity coding method based on similarity value, comprising:
acquiring an entity directory, wherein the entity directory comprises a plurality of entities, and the entities comprise entity names and entity codes corresponding to the entity names;
traversing the entity directory, detecting the uniqueness of each entity in the entity directory, if an entity without uniqueness exists, executing deduplication processing to obtain a basic entity directory with uniqueness of each entity, wherein the uniqueness is determined by the corresponding relation between the entity name and the entity code;
performing conflict detection on the entities in the basic entity directory to obtain a conflict detection result, and if the conflict detection result is that an entity conflict exists, judging a conflict type of the entity conflict to obtain an entity conflict type, wherein the entity conflict type comprises: the entity name does not correspond to the entity code, one entity name corresponds to a plurality of entity codes, a plurality of entity names corresponds to one entity code, and the entity code does not correspond to the entity name;
obtaining a target entity according to the entity conflict type, and renaming and coding the entity corresponding to the entity conflict according to the target entity;
wherein the obtaining a target entity according to the entity conflict type, and renaming and coding the entity corresponding to the entity conflict according to the target entity, comprises:
if the entity conflict type is that the entity name lacks a corresponding entity code, taking the entity with the entity conflict as a first conflict entity, and taking the entity without the entity conflict as a first basic entity;
counting the similarity value of the first conflict entity and each first basic entity to obtain a first similarity value set;
and obtaining a similarity value with the maximum numerical value in the first similarity set as a target similarity value, judging whether the target similarity value is higher than a preset threshold value, if so, judging that two entities corresponding to the target similarity value are the same entity, and coding the first conflict entity through a first basic entity corresponding to the target similarity value.
2. The similarity value-based entity encoding method according to claim 1, wherein before the obtaining the entity directory, the method further comprises:
acquiring data to be coded, and preprocessing the data to be coded to obtain result data;
and performing regular matching on the result data in a regular matching mode, and generating the entity directory according to the successfully matched data.
3. The method according to claim 1, wherein the entity conflict types are that the same entity name corresponds to different entity codes, and the method further comprises:
taking an entity with entity conflict as a second conflict entity, and combining the second conflict entities in pairs to obtain a second conflict entity combination;
counting the similarity values of the two entities in each second conflict entity combination to obtain a second similarity set;
and judging whether a similarity value higher than a preset threshold value exists in the second similarity set, if so, judging that two entities in a second conflict combination corresponding to the similarity value higher than the preset threshold value are the same entity, and recoding the second conflict entity according to a preset mode.
4. The method according to claim 1, wherein the entity conflict types are different entity names corresponding to the same entity code, and the method further comprises:
taking an entity with entity conflict as a third conflict entity, and combining the third conflict entities in pairs to obtain a third conflict entity combination;
counting the similarity values of the two entities in each third conflict entity combination to obtain a third similarity set;
and judging whether a similarity value lower than a preset threshold value exists in the third similarity set, if so, judging that two entities in a third conflict entity combination corresponding to the similarity value lower than the preset threshold value are different entities, and recoding the third conflict entity according to a preset mode.
5. The method according to claim 1, wherein the entity conflict type is that an entity code lacks a corresponding entity name, and the method further comprises:
taking an entity with an entity conflict as a fourth conflict entity, and taking each entity without an entity conflict as a second basic entity;
calculating the similarity value between the fourth conflict entity and each second basic entity to obtain a fourth similarity value set;
and obtaining the largest similarity value in the fourth similarity value set as a comparison similarity value, judging whether the comparison similarity value is higher than a preset threshold value, and if so, judging that the two entities corresponding to the comparison similarity value are the same entity, and naming the fourth conflict entity with the entity name of the second basic entity corresponding to the comparison similarity value.
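The best-match step of claim 5 (take the maximum similarity, then compare it with the threshold) can be sketched as follows. Because the conflict entity has no name, the sketch assumes each entity carries a set of descriptive tags and uses Jaccard similarity over those sets. The tags, the function names, and the threshold of 0.6 are illustrative assumptions only.

```python
def jaccard(a: set, b: set) -> float:
    # Jaccard similarity: size of the intersection over size of the union.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def name_from_best_match(conflict_tags, base_entities, threshold=0.6):
    """Pick a name for an entity that has a code but no name.

    base_entities maps the name of each conflict-free entity to its tag set.
    The name with the highest similarity is returned only if that similarity
    exceeds the threshold; otherwise None is returned and the entity stays
    unnamed.
    """
    best_name, best_score = None, 0.0
    for name, tags in base_entities.items():
        score = jaccard(conflict_tags, tags)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None


base = {
    "Central Library": {"library", "public", "district-3"},
    "North Gym": {"sports", "public", "district-1"},
}
unnamed = {"library", "public", "district-3", "reading-room"}
print(name_from_best_match(unnamed, base))
```

The same pattern applies to the first conflict type in claim 7 (a name without a code), with the matched basic entity supplying the code instead of the name.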
6. The similarity value-based entity encoding method according to claim 1, wherein
the similarity value between the first conflict entity and an entity without an entity conflict is β,
$$\beta = \sum_{k=1}^{K} a_k s_k$$
wherein β ∈ [0, 1]; K is the number of characteristic factors used to calculate the similarity value, K ≥ 1; $a_k$ is the weight of the k-th characteristic factor, $a_k > 0$ and
$$\sum_{k=1}^{K} a_k = 1;$$
and $s_k$ is the similarity value of the k-th characteristic factor.
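As a worked illustration of the weighted similarity in claim 6, the sketch below computes β as a weighted sum of per-factor similarity values. The function name and the validation rules (positive weights that sum to 1, per-factor values in [0, 1]) are assumptions consistent with the constraints stated in the claim, not code from the patent.

```python
import math


def weighted_similarity(weights, factor_similarities):
    """Compute beta as the sum over k of a_k * s_k.

    weights are the a_k values and factor_similarities are the s_k values.
    """
    if not weights or len(weights) != len(factor_similarities):
        raise ValueError("need K >= 1 weights and one similarity per weight")
    if any(w <= 0 for w in weights):
        raise ValueError("every weight a_k must be positive")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("the weights a_k must sum to 1")
    if any(not 0.0 <= s <= 1.0 for s in factor_similarities):
        raise ValueError("every s_k must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, factor_similarities))


# Three hypothetical factors, for example name, address and category.
beta = weighted_similarity([0.5, 0.3, 0.2], [1.0, 0.5, 0.0])
print(round(beta, 2))  # 0.5*1.0 + 0.3*0.5 + 0.2*0.0 = 0.65
```

Because the weights sum to 1 and each factor similarity lies in [0, 1], the result also lies in [0, 1], which matches the stated range of β.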
7. An apparatus for entity coding based on similarity values, comprising:
the entity target acquisition module is used for acquiring an entity directory, wherein the entity directory comprises a plurality of entities, and each entity comprises an entity name and an entity code corresponding to the entity name;
the entity directory detection module is used for traversing the entity directory, detecting the uniqueness of each entity in the entity directory, and if an entity without uniqueness exists, executing deduplication processing to obtain a basic entity directory with each entity having uniqueness, wherein the uniqueness is determined by the corresponding relation between the entity name and the entity code;
an entity conflict detection module, configured to perform conflict detection on the entities in the basic entity directory to obtain a conflict detection result, and if the conflict detection result indicates that an entity conflict exists, determine the conflict type of the entity conflict to obtain an entity conflict type, where the entity conflict type includes: an entity name lacking a corresponding entity code, one entity name corresponding to a plurality of entity codes, a plurality of entity names corresponding to one entity code, and an entity code lacking a corresponding entity name;
the entity conflict coding module is used for obtaining a target entity according to the entity conflict type and renaming and coding the entity corresponding to the entity conflict according to the target entity;
wherein: the entity conflict encoding module comprises:
a first conflict entity determining unit, configured to, if the entity conflict type is that an entity name lacks a corresponding entity code, take an entity with an entity conflict as a first conflict entity, and take an entity without an entity conflict as a first basic entity;
a first similarity value set calculating unit, configured to calculate the similarity value between the first conflict entity and each first basic entity to obtain a first similarity value set;
and a first conflict entity encoding unit, configured to obtain the largest similarity value in the first similarity value set as a target similarity value, judge whether the target similarity value is higher than a preset threshold value, and if so, judge that the two entities corresponding to the target similarity value are the same entity, and encode the first conflict entity with the entity code of the first basic entity corresponding to the target similarity value.
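The directory detection and conflict detection modules of claim 7 can be sketched together in Python. This is only an illustration of the four conflict types. The representation of an entity as a `(name, code)` tuple with `None` for a missing field, the function name, and the conflict-type labels are assumptions, not part of the patent.

```python
from collections import defaultdict


def detect_conflicts(entities):
    """Deduplicate (name, code) pairs, then group entries by conflict type."""
    # Deduplication: identical (name, code) pairs collapse to one entry,
    # keeping the first-seen order.
    base = list(dict.fromkeys(entities))

    codes_by_name = defaultdict(set)
    names_by_code = defaultdict(set)
    conflicts = defaultdict(list)

    for name, code in base:
        if name and not code:
            conflicts["name_without_code"].append((name, code))
        elif code and not name:
            conflicts["code_without_name"].append((name, code))
        elif name and code:
            codes_by_name[name].add(code)
            names_by_code[code].add(name)

    for name, code in base:
        if name and code:
            if len(codes_by_name[name]) > 1:
                conflicts["one_name_many_codes"].append((name, code))
            if len(names_by_code[code]) > 1:
                conflicts["many_names_one_code"].append((name, code))

    return base, dict(conflicts)


directory = [
    ("Park", "P1"), ("Park", "P1"), ("Park", "P2"),
    ("School", "S1"), ("Hospital", "S1"),
    ("Museum", None), (None, "M9"),
]
base, conflicts = detect_conflicts(directory)
print(len(base))
print(conflicts)
```

Each group returned by this sketch would then be handed to the matching resolution step: the best-match lookup for missing names or codes, and the pairwise comparison for the one-to-many cases.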
8. A computer device comprising a memory in which a computer program is stored and a processor that implements the similarity value-based entity encoding method according to any one of claims 1 to 6 when the processor executes the computer program.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the similarity value-based entity encoding method according to any one of claims 1 to 6.
CN202010526992.7A 2020-06-11 2020-06-11 Similarity value-based entity encoding method, device, equipment and storage medium Active CN111444307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010526992.7A CN111444307B (en) 2020-06-11 2020-06-11 Similarity value-based entity encoding method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010526992.7A CN111444307B (en) 2020-06-11 2020-06-11 Similarity value-based entity encoding method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111444307A (en) 2020-07-24
CN111444307B (en) 2020-09-22

Family

ID=71655351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010526992.7A Active CN111444307B (en) 2020-06-11 2020-06-11 Similarity value-based entity encoding method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111444307B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1307541C (en) * 2004-06-17 2007-03-28 威盛电子股份有限公司 Computer system and method for producing program code
CN105373481A (en) * 2015-12-18 2016-03-02 浪潮电子信息产业股份有限公司 Automated object library building and maintaining method
US11347691B2 (en) * 2016-08-10 2022-05-31 Netapp, Inc. Methods for managing storage in a distributed de-duplication system and devices thereof
CN107958142B (en) * 2016-10-17 2020-04-21 财付通支付科技有限公司 User account generation method and device
US10956388B2 (en) * 2018-07-10 2021-03-23 EMC IP Holding Company LLC Eventual consistency in a deduplicated cloud storage system
CN109614615B (en) * 2018-12-04 2022-04-22 联想(北京)有限公司 Entity matching method and device and electronic equipment

Also Published As

Publication number Publication date
CN111444307A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN108694657B (en) Client identification apparatus, method and computer-readable storage medium
CN114600420A (en) Pruning entries in a tamper-resistant data storage device
CN108009435B (en) Data desensitization method, device and storage medium
CN111949550B (en) Method, device, equipment and storage medium for automatically generating test data
Franke et al. Parallel Privacy-preserving Record Linkage using LSH-based Blocking.
CN111258799A (en) Error reporting information processing method, electronic device and computer readable storage medium
CN112559526A (en) Data table export method and device, computer equipment and storage medium
CN112507212A (en) Intelligent return visit method and device, electronic equipment and readable storage medium
CN111639077B (en) Data management method, device, electronic equipment and storage medium
CN111813845A (en) ETL task-based incremental data extraction method, device, equipment and medium
CN112417475A (en) Fingerprint image encryption method and device, electronic equipment and readable storage medium
CN111752944A (en) Data allocation method and device, computer equipment and storage medium
CN111835776A (en) Network traffic data privacy protection method and system
CN111666087A (en) Operation rule updating method and device, computer system and readable storage medium
CN114356898A (en) Data storage method and device, electronic equipment and storage medium
CN111444307B (en) Similarity value-based entity encoding method, device, equipment and storage medium
CN110019193A (en) Similar account number recognition methods, device, equipment, system and readable medium
CN117133006A (en) Document verification method and device, computer equipment and storage medium
Wurzenberger et al. Discovering insider threats from log data with high-performance bioinformatics tools
CN114614972A (en) Data alignment method, system, electronic device and storage medium
CN115562934A (en) Service flow switching method based on artificial intelligence and related equipment
CN115102770A (en) Resource access method, device and equipment based on user permission and storage medium
CN114840872A (en) Secret text desensitization method and device, computer equipment and readable storage medium
CN114265835A (en) Data analysis method and device based on graph mining and related equipment
CN113064984A (en) Intention recognition method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant