CN111158666B - Entity normalization processing method, device, equipment and storage medium - Google Patents

Publication number
CN111158666B
Authority
CN
China
Prior art keywords
comparison
entity
target attribute
rule
entity normalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911379440.1A
Other languages
Chinese (zh)
Other versions
CN111158666A (en)
Inventor
王冠朝
方舟
江涛
仲夏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911379440.1A
Publication of CN111158666A
Application granted
Publication of CN111158666B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code

Abstract

The application discloses an entity normalization processing method, device, equipment and storage medium, and relates to entity normalization processing technology. The specific implementation scheme is as follows: receiving rule parameters related to an entity normalization strategy input by a user; generating program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule; and running the program code corresponding to the entity normalization strategy to perform normalization judgment on the entities in a preset entity data set, so as to cluster identical entities. The user only needs to input the rule parameters related to the entity normalization strategy; the corresponding program code is then generated automatically according to the rule parameters and the preset code generation rule, without any programming by the user. This reduces labor development cost and learning cost, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing. The method can be applied to entity normalization processing of data in any field.

Description

Entity normalization processing method, device, equipment and storage medium
Technical Field
The application relates to the technical field of data processing, in particular to an entity normalization processing technology.
Background
In the construction of knowledge graph data, a plurality of different data sources are often used, so normalizing and fusing records of the same entity across those data sources is an important task. For example, data for the movie "Weathering with You" is obtained from three different websites, whose release dates are listed as 2019-11-01 (China), 2019-07-19 (Japan) and 2019-11-01 (China) respectively, while the director is Makoto Shinkai in each case; the three records therefore refer to the same entity, and entity disambiguation is required. The entity disambiguation process is divided into two steps, entity normalization and fusion: entity normalization groups records of the same entity into the same set; fusion then merges the entities in each set, using a strategy to select the preferred value of each attribute, to obtain the final fused entity.
Existing entity normalization methods usually take one of two forms. In the first, a research and development engineer writes a program based on a preliminary investigation of the data, and entity normalization is realized by running that program code. In the second, an entity normalization model is trained on training data, and entity normalization is realized by the model. The first approach consumes a large amount of labor, is difficult to learn, and lacks any guarantee of standardization. The second approach requires a large amount of labeled data for model training and professional algorithm engineers for iteration, so it is difficult to apply in commercial scenarios and has poor generality across industries and poor applicability.
Disclosure of Invention
The application provides an entity normalization processing method, device, equipment and storage medium, which are used for automatically generating corresponding program codes according to rule parameters related to an entity normalization strategy input by a user, so that the manpower development cost and the learning cost are reduced.
A first aspect of the present application provides an entity normalization processing method, comprising:
receiving a rule parameter related to an entity normalization strategy input by a user;
generating program codes corresponding to entity normalization strategies according to the rule parameters and preset code generation rules;
and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities.
According to the embodiment, the user only needs to input the rule parameters related to the entity normalization strategy, the program codes corresponding to the entity normalization strategy can be automatically generated, user programming is not needed, labor development cost and learning cost are reduced, the threshold of data production is reduced, the entity normalization strategy is convenient to modify, the efficiency of entity normalization processing is improved, and the method and the device can be applied to entity normalization processing of data in any field.
In one possible design, the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
In one possible design, the generating the program code corresponding to the entity normalization policy according to the rule parameter and the preset code generation rule includes:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function according to each comparison rule, and determining a logic operation type to obtain a program code of the comparison rule;
and obtaining the program codes corresponding to the entity normalization strategy according to the program codes of each comparison rule.
In one possible design, the comparison condition parameters corresponding to the target attribute include a type of the target attribute, a comparison condition corresponding to the target attribute, and a comparison process strictness.
In one possible design, the obtaining the comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute includes:
determining comparison method parameters in the comparison function according to the type of the target attribute;
determining a multi-value comparison condition parameter in the comparison function according to the comparison process strictness, wherein the multi-value comparison condition parameter comprises: all values identical, at least one value identical, or all values different;
determining supplementary parameters in the comparison function according to the comparison conditions and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-value comparison condition parameter and the supplementary parameter.
In one possible design, the obtaining the program code corresponding to the entity normalization policy according to the program code of each comparison rule includes:
and receiving the priority order of the comparison rules set by the user, and setting the priority of the program codes of each comparison rule according to the priority order of the comparison rules so as to operate the program codes of each comparison rule according to the priority when operating the program codes corresponding to the entity normalization strategy.
In one possible design, the running the program code corresponding to the entity normalization policy further includes:
receiving a start instruction from a user, and running the program code corresponding to the entity normalization strategy according to the start instruction; and/or
receiving a stop instruction from a user, and stopping running the program code corresponding to the entity normalization strategy according to the stop instruction;
after the same entities are clustered, the method further comprises:
receiving a view-result instruction from a user, and displaying a clustering result according to the view-result instruction.
A second aspect of the present application provides an entity normalization processing device, including:
the input module is used for receiving the entity normalization strategy related rule parameters input by the user;
the processing module is used for generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
and the operation module is used for operating the program codes corresponding to the entity normalization strategy, and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities.
In one possible design, the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
In one possible design, the processing module is to:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function according to each comparison rule, and determining a logic operation type to obtain a program code of the comparison rule;
And obtaining the program codes corresponding to the entity normalization strategy according to the program codes of each comparison rule.
In one possible design, the comparison condition parameters corresponding to the target attribute include a type of the target attribute, a comparison condition corresponding to the target attribute, and a comparison process strictness.
In one possible design, the processing module is to:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-value comparison condition parameter in the comparison function according to the comparison process strictness, wherein the multi-value comparison condition parameter comprises: all values identical, at least one value identical, or all values different;
determining supplementary parameters in the comparison function according to the comparison conditions and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-value comparison condition parameter and the supplementary parameter.
In one possible design, the processing module is to:
and receiving the priority order of the comparison rules set by the user, and setting the priority of the program codes of each comparison rule according to the priority order of the comparison rules so as to operate the program codes of each comparison rule according to the priority when operating the program codes corresponding to the entity normalization strategy.
In one possible design, the input module is further configured to receive a start instruction from a user;
the operation module is further configured to run the program code corresponding to the entity normalization strategy according to the start instruction; and/or
the input module is further configured to receive a stop instruction from a user;
the operation module is further configured to stop running the program code corresponding to the entity normalization strategy according to the stop instruction;
the input module is further configured to receive a view-result instruction from a user;
the operation module is further configured to display the clustering result according to the view-result instruction.
A third aspect of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
A fourth aspect of the present application provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the first aspect.
A fifth aspect of the present application provides a computer program comprising program code for performing the method of the first aspect when the computer program runs on a computer.
A sixth aspect of the present application provides an entity normalization processing method, including:
receiving a rule parameter related to an entity normalization strategy input by a user;
acquiring an entity normalization strategy according to the rule parameters;
and carrying out normalization judgment on the entities in the preset entity data set according to the entity normalization strategy, and outputting normalization judgment results.
One embodiment of the above application has the following advantages or benefits: the user only needs to input the rule parameters related to the entity normalization strategy, the program codes corresponding to the entity normalization strategy can be automatically generated according to the rule parameters and the preset code generation rules, the user programming is not needed, the manpower development cost and the learning cost are reduced, the threshold of data production is reduced, the entity normalization strategy is convenient to modify, the entity normalization processing efficiency is improved, and the method can be applied to entity normalization processing of data in any field. The visual operation is carried out through the user interaction interface, so that the data production cost and threshold are greatly reduced, the entity normalization strategy is convenient to formulate and modify, and convenience is provided for the user to flexibly process the entity data.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flowchart of an entity normalization processing method provided in an embodiment of the present application;
FIG. 2 is a flowchart of an entity normalization processing method according to another embodiment of the present application;
FIG. 3 is a block diagram of an entity normalization processing device according to another embodiment of the present application;
FIG. 4 is a block diagram of an electronic device used to implement the entity normalization processing method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the existing entity normalization method, a research and development engineer self-programming mode is needed, a large amount of labor cost is needed to be consumed, the learning difficulty is high, and standardized guarantee is absent; the model is adopted to carry out entity normalization, a large amount of labeling data is needed in the model training process, professional algorithm engineers are needed to carry out iteration, the model training process is difficult to be applied to commercial scenes, the industry universality is poor, and the applicability is poor. Aiming at the technical problems of the existing entity normalization method, the user only needs to input the rule parameters related to the entity normalization strategy, and the program codes corresponding to the entity normalization strategy can be automatically generated according to the rule parameters and the preset code generation rules, so that the user programming is not needed, the manpower development cost and the learning cost are reduced, the threshold of data production is reduced, the entity normalization strategy is convenient to modify, the entity normalization processing efficiency is improved, and the method can be applied to entity normalization processing of data in any field.
The entity normalization process is described in detail below in connection with specific embodiments.
An embodiment of the present application provides an entity normalization processing method, and FIG. 1 is a flowchart of the entity normalization processing method provided in this embodiment. The execution body may be an electronic device. As shown in FIG. 1, the entity normalization processing method specifically includes the following steps:
S101, receiving rule parameters related to an entity normalization strategy input by a user.
In this embodiment, rule parameters related to an entity normalization strategy may be input by a user, where the entity normalization strategy may include at least one comparison rule, and each comparison rule may include at least one rule parameter. The rule parameters may be semantic rule parameters that the user inputs in natural language and that are easy to understand. In this embodiment, a first user interaction interface for inputting the rule parameters may be provided. The rule parameters include at least one target attribute to be compared, comparison condition parameters corresponding to the target attribute, and a comparison rule that combines the comparison conditions corresponding to the target attributes; after the user inputs the rule parameters through the first user interaction interface, one comparison rule is obtained. For example, with the dubbing actor as the target attribute to be compared, the comparison condition parameters corresponding to the target attribute may include, but are not limited to, the type of the target attribute (text, number, time, etc.), the comparison condition corresponding to the target attribute (identical, inclusion relation, edit distance, semantic similarity, etc.), and the comparison process strictness (loose, strict, etc.). An example of a comparison rule that combines the comparison conditions corresponding to the respective target attributes is "when condition 1, condition 2, and condition 3 are satisfied simultaneously, the entities are regarded as the same entity", in which case the logical operation between the three comparison conditions is "and". In addition, the first user interaction interface also provides functions for adding and deleting rule parameters; after an instruction from the user to add or delete a rule parameter is received, the corresponding addition or deletion is performed.
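As a minimal sketch of how the rule parameters collected through the first user interaction interface might be held in memory, the following Python fragment models one comparison rule as a list of conditions plus a logical operation. The class names, field names and string values are hypothetical and are chosen only for illustration; the application does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    """One comparison condition on a target attribute (hypothetical fields)."""
    attribute: str    # target attribute to compare, e.g. "dubbing actor"
    attr_type: str    # type of the target attribute: "text", "number", "time"
    comparison: str   # comparison condition: "identical", "inclusion", ...
    strictness: str   # comparison process strictness: "loose" or "strict"


@dataclass
class ComparisonRule:
    """A comparison rule: conditions combined by one logical operation."""
    conditions: List[Condition] = field(default_factory=list)
    logic: str = "and"

    def add_condition(self, condition: Condition) -> None:
        # Mirrors the "add rule parameter" function of the interface.
        self.conditions.append(condition)

    def delete_condition(self, attribute: str) -> None:
        # Mirrors the "delete rule parameter" function of the interface.
        self.conditions = [c for c in self.conditions if c.attribute != attribute]


rule = ComparisonRule(logic="and")
rule.add_condition(Condition("duration", "number", "identical", "strict"))
rule.add_condition(Condition("dubbing actor", "text", "identical", "loose"))
rule.delete_condition("duration")
```

After these calls the rule holds a single condition on the dubbing actor attribute, combined by "and".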
In addition, the embodiment also provides a second user interaction interface including (but not limited to) functions of starting, closing, editing, deleting, prioritizing and the like for any comparison rule; wherein clicking the edit button in the second user interaction interface by the user jumps to the first user interaction interface. Visual and easy-to-understand entity normalization strategies can be obtained through the second user interaction interface.
S102, generating program codes corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule.
In this embodiment, after the rule parameters related to the entity normalization strategy are obtained, they may be automatically translated, according to the preset code generation rule, into the program code corresponding to the entity normalization strategy; the program code may then be stored or run directly. The preset code generation rule in this embodiment may specify how a comparison function is generated from the rule parameters: a unified template of the comparison function may first be defined, containing the necessary parameters of the comparison function (such as the comparison method, comparison conditions, supplementary parameters, etc.), and these parameters may be determined from the rule parameters input by the user.
In addition, this embodiment makes it convenient for the user to modify a comparison rule: the user only needs to modify the rule parameters, and the corresponding program code is automatically updated according to the modified rule parameters, which reduces labor cost and improves efficiency.
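The translation from rule parameters to program code can be illustrated with a small template-based generator. This is only a sketch under assumptions: the parameter dictionary layout, the generated function name is_same_entity, and the two supported attribute types are all hypothetical, and the actual code generation rule of the application is not disclosed at this level of detail.

```python
def generate_code(params):
    """Translate rule parameters into Python source text (illustrative only)."""
    checks = []
    for cond in params["conditions"]:
        attr = cond["attribute"]
        if cond["type"] == "number":
            # Numeric attributes: the difference must stay below a threshold.
            checks.append(
                f"abs(a[{attr!r}] - b[{attr!r}]) < {cond['threshold']!r}"
            )
        else:
            # Other attributes: exact equality.
            checks.append(f"a[{attr!r}] == b[{attr!r}]")
    joiner = " and " if params["logic"] == "and" else " or "
    body = joiner.join(checks) or "False"
    return "def is_same_entity(a, b):\n    return " + body + "\n"


# Hypothetical rule parameters, as they might arrive from the user interface.
params = {
    "logic": "and",
    "conditions": [
        {"attribute": "title", "type": "text"},
        {"attribute": "duration", "type": "number", "threshold": 0.25},
    ],
}

source = generate_code(params)   # generated program code as text
namespace = {}
exec(source, namespace)          # "run" the generated code
is_same_entity = namespace["is_same_entity"]
```

Changing a value in params and calling generate_code again yields updated source text, which corresponds to the automatic update of the program code after the user modifies the rule parameters.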
S103, running a program code corresponding to the entity normalization strategy, and performing normalization judgment on the entities in the preset entity data set so as to cluster the same entities.
In this embodiment, before the program code corresponding to the entity normalization strategy is run, the entity data set to be normalized may first be determined; the user may select a data source through a predetermined user interaction interface to determine this entity data set. When the code is run, any two entities in the entity data set may be compared according to the comparison rules of the normalization strategy to determine whether they are the same entity, and identical entities are then clustered to obtain the normalization result for the entity data set.
Further, a start instruction from a user may be received, and the program code corresponding to the entity normalization strategy is run according to the start instruction; and/or a stop instruction from a user may be received, and running of the program code corresponding to the entity normalization strategy is stopped according to the stop instruction. That is, in this embodiment, the user can start and stop the program code at any time as needed. In this embodiment, a third user interaction interface is provided with a task management function, through which the user can control the running and stopping of the program code.
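The pairwise comparison and clustering described above can be sketched as follows. The use of a union-find (disjoint-set) structure to merge matching pairs is an assumption made for this illustration; the application only states that any two entities are compared and identical entities are clustered. The is_same_entity argument stands for whatever judgment function the generated program code provides.

```python
from itertools import combinations


def cluster(entities, is_same_entity):
    """Group entities judged identical by pairwise comparison (sketch)."""
    parent = list(range(len(entities)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of entities in the data set.
    for i, j in combinations(range(len(entities)), 2):
        if is_same_entity(entities[i], entities[j]):
            parent[find(i)] = find(j)

    # Collect entities that share a representative into one cluster.
    groups = {}
    for i, entity in enumerate(entities):
        groups.setdefault(find(i), []).append(entity)
    return list(groups.values())


# Hypothetical records from three data sources.
records = [
    {"source": "site1", "title": "Movie A"},
    {"source": "site2", "title": "Movie A"},
    {"source": "site3", "title": "Movie B"},
]
clusters = cluster(records, lambda a, b: a["title"] == b["title"])
```

Comparing all pairs costs a number of comparisons quadratic in the size of the data set, so a production system would likely add some form of candidate filtering first; that is outside the scope of this sketch.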
Further, after the same entity is clustered, a checking result instruction of a user is received, and a clustering result is displayed according to the checking result instruction. For example, the user may click on the view results button, and a cluster result may be presented, where the cluster result includes relevant information of the same entity, such as a data source, etc.
According to the entity normalization processing method provided by the embodiment, the rule parameters related to the entity normalization strategy input by the user are received; generating program codes corresponding to entity normalization strategies according to the rule parameters and preset code generation rules; and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities. According to the embodiment, the user only needs to input the rule parameters related to the entity normalization strategy, the program codes corresponding to the entity normalization strategy can be automatically generated, user programming is not needed, labor development cost and learning cost are reduced, the threshold of data production is reduced, the entity normalization strategy is convenient to modify, the efficiency of entity normalization processing is improved, and the method and the device can be applied to entity normalization processing of data in any field.
On the basis of any embodiment, the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
Further, as shown in fig. 2, in the foregoing embodiment, the generating the program code corresponding to the entity normalization policy according to the rule parameter and the preset code generation rule in S102 may specifically include:
S201, for any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute.
In this embodiment, a unified template of the comparison function in the preset code generation rule may be predefined, where some necessary parameters of the comparison function (such as a comparison method, a comparison condition, a supplementary parameter, etc.) are included, and these parameters may be determined according to rule parameters input by the user. And obtaining a comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute.
In an alternative embodiment, the comparison condition parameters corresponding to the target attribute include the type of the target attribute (e.g., text, number, time, etc.), the comparison condition corresponding to the target attribute (e.g., identical, inclusion relation, edit distance, semantic similarity, etc.), and the comparison process strictness (loose, strict, etc.).
Further, the obtaining the comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute includes:
determining comparison method parameters in the comparison function according to the type of the target attribute;
determining a multi-value comparison condition parameter in the comparison function according to the comparison process strictness, wherein the multi-value comparison condition parameter comprises: all values identical, at least one value identical, or all values different;
determining supplementary parameters in the comparison function according to the comparison conditions and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-value comparison condition parameter and the supplementary parameter.
In this embodiment, the function name of the comparison function may be used as the key by which the comparison function is called, so each comparison function has a unique function name. The comparison function may have four parameters: cmpattr, multicmp, singlecmp and compconf.
The cmpattr parameter identifies the target attribute to be compared, for example specifying that the comparison function compares the duration attribute, the region attribute, the dubbing actor attribute, etc. of the entities.
The multicmp parameter is the multi-value comparison condition parameter. For example, suppose the two entities to be compared are both "Weathering with You", the value of the target attribute "dubbing actor" of entity 1 is ["show by new design", "Send seven dish"], and that of entity 2 is ["show by new design", "Send seven dish", "Xiaoban"]; that is, one target attribute holds multiple values, and the comparison must follow the multicmp parameter. The multicmp parameter may take values such as "identical", "at least one identical" and "completely different", and may be determined by the comparison process strictness: if the comparison process is strict, the multicmp parameter is set to "identical" or "completely different"; if the comparison process is loose, the multicmp parameter is set to "at least one identical".
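The three multicmp modes can be sketched as below. The function name, the mode strings and the optional single_match argument are hypothetical, and the reading of "identical" as "every value on each side has a match on the other side" is one plausible interpretation rather than a definition taken from the application. Placeholder actor names are used instead of real data.

```python
def multi_compare(values_a, values_b, multicmp, single_match=lambda x, y: x == y):
    """Compare two multi-valued attributes under a multicmp mode (sketch)."""

    def has_match(value, others):
        return any(single_match(value, other) for other in others)

    if multicmp == "identical":
        # Every value on each side must have a counterpart on the other side.
        ok = (all(has_match(v, values_b) for v in values_a)
              and all(has_match(v, values_a) for v in values_b))
    elif multicmp == "at_least_one":
        ok = any(has_match(v, values_b) for v in values_a)
    elif multicmp == "completely_different":
        ok = not any(has_match(v, values_b) for v in values_a)
    else:
        raise ValueError(f"unknown multicmp mode: {multicmp}")
    return 1.0 if ok else 0.0


# Placeholder values: entity 2 lists one more dubbing actor than entity 1.
actors_1 = ["actor A", "actor B"]
actors_2 = ["actor A", "actor B", "actor C"]

strict_result = multi_compare(actors_1, actors_2, "identical")
loose_result = multi_compare(actors_1, actors_2, "at_least_one")
```

With these placeholder lists, the strict mode rejects the pair because entity 2 has an extra value, while the loose mode accepts it because at least one value is shared.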
The singlecmp parameter specifies the comparison method used in a single-value comparison. For example, a singlecmp parameter of "Float" indicates a floating-point comparison, and the threshold for that comparison, "threshold:0.25", is defined in the supplementary parameters; that is, when the floating-point comparison is made, the difference between the two entities' floating-point values must be less than the threshold. In this embodiment, the comparison methods available for the singlecmp parameter include: exact comparison, edit distance comparison, string relation comparison, time comparison, floating-point comparison, telephone number comparison, semantic similarity comparison, and the like.
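A few of the single-value comparison methods can be sketched as a dispatch on the singlecmp value. Only the "Float" name and the "threshold" key appear in the text; the "Exact" and "EditDistance" names and the "max_distance" key are hypothetical, and the edit distance shown is the standard Levenshtein distance, chosen here as an assumption.

```python
def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]


def single_compare(a, b, singlecmp, compconf=None):
    """Compare two single values with the method named by singlecmp (sketch)."""
    compconf = compconf or {}
    if singlecmp == "Exact":
        return 1.0 if a == b else 0.0
    if singlecmp == "Float":
        # The difference must be less than the configured threshold.
        threshold = compconf.get("threshold", 0.0)
        return 1.0 if abs(float(a) - float(b)) < threshold else 0.0
    if singlecmp == "EditDistance":
        limit = compconf.get("max_distance", 0)
        return 1.0 if edit_distance(a, b) <= limit else 0.0
    raise ValueError(f"unknown singlecmp method: {singlecmp}")


close = single_compare(112.0, 112.2, "Float", {"threshold": 0.25})
far = single_compare(112.0, 112.5, "Float", {"threshold": 0.25})
```

Here close is 1.0 because the two durations differ by less than 0.25, and far is 0.0 because they differ by 0.5.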
The compconf parameter configures supplementary parameters of the comparison function, such as the threshold "threshold:0.25" mentioned above; as another example, "clean:true" indicates that the target attribute value is cleaned to remove redundant characters. Other supplementary parameters may also be included and are not described here.
The final return value of the comparison function is a number from 0 to 1, which is evaluated in the comparison rule.
S202, according to each comparison rule, calling a corresponding comparison function, determining a logic operation type and obtaining the program code of the comparison rule.
In this embodiment, the comparison RULE (prio_run) is specifically used to combine the comparison functions, and perform a logic operation according to the result of the comparison function, so as to finally determine whether the entity is the same entity or not.
Each comparison rule may include two elements. The first element is the calls to the comparison functions and the logical operation between their results. For example, the comparison function that takes duration as the target attribute and the comparison function that takes dubbing actor as the target attribute are called by function name, and the logical operation between their results is "and"; that is, the two entities are determined to be the same entity only when both the duration and the dubbing actor are the same. As another example, the comparison function that takes region as the target attribute is called by function name, and a result of 0 means the two entities are not the same entity; that is, two entities whose regions differ are determined not to be the same entity. The second element is an identifier of whether the entities are the same entity: 1 denotes the same entity, and 0 denotes not the same entity. The final output value of the comparison rule may be the Boolean value TRUE or FALSE.
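The two example rules can be sketched in Python as follows. The function names, the attribute keys, and the convention of returning None when a rule does not apply are assumptions for illustration; the text only states that a rule combines comparison-function results with a logic operation and carries a 1 or 0 identifier.

```python
from typing import Any, Dict, Optional

Entity = Dict[str, Any]


def cmp_duration(a: Entity, b: Entity) -> float:
    return 1.0 if a["duration"] == b["duration"] else 0.0


def cmp_dubbing_actor(a: Entity, b: Entity) -> float:
    # Relaxed multi-value mode: at least one shared dubbing actor.
    return 1.0 if set(a["dubbing_actor"]) & set(b["dubbing_actor"]) else 0.0


def cmp_region(a: Entity, b: Entity) -> float:
    return 1.0 if a["region"] == b["region"] else 0.0


def rule_same(a: Entity, b: Entity) -> Optional[int]:
    """Duration AND dubbing actor both match: identifier 1 (same entity)."""
    fired = cmp_duration(a, b) == 1.0 and cmp_dubbing_actor(a, b) == 1.0
    return 1 if fired else None


def rule_different(a: Entity, b: Entity) -> Optional[int]:
    """Region comparison returns 0: identifier 0 (not the same entity)."""
    fired = cmp_region(a, b) == 0.0
    return 0 if fired else None


x = {"duration": 112, "dubbing_actor": ["A", "B"], "region": "JP"}
y = {"duration": 112, "dubbing_actor": ["B"], "region": "JP"}
print(rule_same(x, y), rule_different(x, y))
```

Here the logic operation ("and") lives inside the rule, and the rule's identifier (1 or 0) is what it returns when its condition holds.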
S203, obtaining the program codes corresponding to the entity normalization strategy according to the program codes of the comparison rules.
In this embodiment, after the program codes of each comparison rule are obtained, the program codes corresponding to the entity normalization policy may be finally combined.
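As a rough illustration of generating program code from rule parameters, the sketch below renders Python source text from a dictionary and then loads it. The dictionary layout, the generated function shapes, and the use of exec are assumptions; the patent does not disclose the actual preset code generation rule or the target language.

```python
from typing import Any, Dict

# Hypothetical rule parameters, as they might arrive from the user interface.
RULE_PARAMS: Dict[str, Any] = {
    "functions": {
        "cmp_duration": {"attribute": "duration", "singlecmp": "Float",
                         "compconf": {"threshold": 0.25}},
        "cmp_region": {"attribute": "region", "singlecmp": "Exact",
                       "compconf": {}},
    },
    "rules": [
        {"calls": ["cmp_duration", "cmp_region"], "logic": "and", "verdict": 1},
    ],
}


def generate_code(params: Dict[str, Any]) -> str:
    """Render comparison functions and rules as Python source text."""
    lines = []
    for name, spec in params["functions"].items():
        if not name.isidentifier():
            raise ValueError(f"invalid function name: {name!r}")
        attr = spec["attribute"]
        if spec["singlecmp"] == "Float":
            threshold = float(spec["compconf"]["threshold"])
            expr = f"abs(float(a[{attr!r}]) - float(b[{attr!r}])) < {threshold!r}"
        else:
            expr = f"a[{attr!r}] == b[{attr!r}]"
        lines += [f"def {name}(a, b):",
                  f"    return 1.0 if {expr} else 0.0", ""]
    for index, rule in enumerate(params["rules"]):
        if rule["logic"] not in ("and", "or"):
            raise ValueError(f"unsupported logic operation: {rule['logic']!r}")
        for call in rule["calls"]:
            if call not in params["functions"]:
                raise ValueError(f"unknown comparison function: {call!r}")
        joiner = f" {rule['logic']} "
        condition = joiner.join(f"{c}(a, b) == 1.0" for c in rule["calls"])
        lines += [f"def rule_{index}(a, b):",
                  f"    return {int(rule['verdict'])} if ({condition}) else None",
                  ""]
    return "\n".join(lines)


source = generate_code(RULE_PARAMS)
namespace: Dict[str, Any] = {}
exec(source, namespace)  # load the generated functions into a namespace
print(source)
```

Names are validated before being spliced into source text, since generated code built from unchecked user input would be unsafe to execute.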
Further, a priority order of the comparison rules set by the user may be received, and a priority may be assigned to the program code of each comparison rule according to that order, so that the program code of each comparison rule is run according to its priority when the program code corresponding to the entity normalization strategy is run. In this embodiment, when the program code of the comparison rules is run according to priority, if a comparison rule with a higher priority can already determine that the two entities to be compared are the same entity or different entities, the program code of the comparison rules with lower priority is not run.
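The priority behaviour described above amounts to a short-circuit loop, sketched below. The tuple layout, the convention that a smaller number means a higher priority, and the use of None for "no decision" are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Entity = Dict[str, Any]
Rule = Callable[[Entity, Entity], Optional[int]]


def run_rules_by_priority(
    rules: List[Tuple[int, str, Rule]], a: Entity, b: Entity
) -> Tuple[Optional[int], List[str]]:
    """Run rules in priority order and stop at the first one that decides.

    Returns the verdict (1 same, 0 different, None undecided) together with
    the names of the rules that were actually run.
    """
    executed: List[str] = []
    for _, name, rule in sorted(rules, key=lambda item: item[0]):
        executed.append(name)
        verdict = rule(a, b)
        if verdict is not None:
            return verdict, executed  # lower-priority rules are skipped
    return None, executed


rules = [
    (2, "same_name", lambda a, b: 1 if a["name"] == b["name"] else None),
    (1, "different_region",
     lambda a, b: 0 if a["region"] != b["region"] else None),
]
a = {"name": "X", "region": "JP"}
b = {"name": "X", "region": "US"}
print(run_rules_by_priority(rules, a, b))
```

In this example the region rule has the higher priority and decides immediately, so the name rule is never run for this pair.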
On the basis of any embodiment, the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
According to the entity normalization processing method provided by the above embodiments, a user only needs to input the rule parameters related to the entity normalization strategy, and the program code corresponding to the entity normalization strategy is generated automatically without user programming. This reduces manual development cost and learning cost, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing; the method can be applied to entity normalization processing of data in any field. Because the operation is carried out visually through the user interaction interface, the cost and threshold of data production are greatly reduced, the entity normalization strategy is easy to formulate and modify, and the user can process entity data flexibly.
An embodiment of the present application provides an entity normalization processing device, and FIG. 3 is a structural diagram of the entity normalization processing device provided by the embodiment of the present application. As shown in FIG. 3, the entity normalization processing device 300 specifically includes: an input module 301, a processing module 302 and a running module 303.
the input module 301 is configured to receive rule parameters related to an entity normalization strategy input by a user;
the processing module 302 is configured to generate program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
and the running module 303 is configured to run the program code corresponding to the entity normalization strategy, and to perform normalization judgment on the entities in a preset entity data set, so as to cluster the same entities.
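The clustering step can be sketched as pairwise normalization judgment followed by merging with a union-find structure. The exhaustive pairwise loop and the union-find choice are assumptions made for illustration; the embodiment does not state how the same entities are grouped.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]


def cluster_entities(
    entities: List[Entity], is_same: Callable[[Entity, Entity], bool]
) -> List[List[Entity]]:
    """Group entities so that every pair judged 'same' shares a cluster."""
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entities)), 2):
        if is_same(entities[i], entities[j]):
            parent[find(j)] = find(i)  # merge the two clusters

    clusters: Dict[int, List[Entity]] = {}
    for index, entity in enumerate(entities):
        clusters.setdefault(find(index), []).append(entity)
    return list(clusters.values())


data = [
    {"name": "A", "region": "JP"},
    {"name": "A", "region": "JP"},
    {"name": "B", "region": "US"},
]
print(cluster_entities(data, lambda a, b: a["name"] == b["name"]))
```

Comparing every pair costs quadratic time in the number of entities, so a production system would likely restrict comparisons to candidate pairs first; that refinement is outside what the text describes.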
On the basis of the embodiment, the rule parameters include at least one target attribute to be compared, a comparison condition parameter corresponding to the target attribute, and a comparison rule combined between comparison conditions corresponding to the target attributes.
On the basis of the above embodiment, the processing module 302 is configured to:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function according to each comparison rule, and determining a logic operation type to obtain a program code of the comparison rule;
and obtaining the program codes corresponding to the entity normalization strategy according to the program codes of each comparison rule.
On the basis of the embodiment, the comparison condition parameters corresponding to the target attribute comprise the type of the target attribute, the comparison condition corresponding to the target attribute and the strictness of the comparison process.
On the basis of the above embodiment, the processing module 302 is configured to:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-value comparison condition parameter in the comparison function according to the comparison process strictness, wherein the multi-value comparison condition parameter comprises: all values identical, at least one value identical, or all values different;
determining supplementary parameters in the comparison function according to the comparison conditions and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-value comparison condition parameter and the supplementary parameters.
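The four determinations above can be read as a mapping from user-facing inputs to the parameters of a comparison function, sketched below. The type names, the strictness labels "strict" and "relaxed", and both lookup tables are hypothetical; only the parameter names multicmp and compconf and the values "Float", "identical" and "at least one identical" come from the description.

```python
from typing import Any, Dict, Optional

# Hypothetical lookup tables; the patent does not enumerate these mappings.
TYPE_TO_SINGLECMP = {"string": "Exact", "float": "Float", "time": "Time",
                     "phone": "Phone"}
STRICTNESS_TO_MULTICMP = {"strict": "identical",
                          "relaxed": "at least one identical"}


def build_compare_spec(
    attribute: str,
    attr_type: str,
    strictness: str,
    condition: Optional[Dict[str, Any]] = None,
    clean: bool = False,
) -> Dict[str, Any]:
    """Assemble the parameters of one comparison function."""
    compconf = dict(condition or {})       # e.g. {"threshold": 0.25}
    if clean:                              # preset data cleaning instruction
        compconf["clean"] = True
    return {
        "attribute": attribute,
        "singlecmp": TYPE_TO_SINGLECMP[attr_type],
        "multicmp": STRICTNESS_TO_MULTICMP[strictness],
        "compconf": compconf,
    }


print(build_compare_spec("duration", "float", "strict", {"threshold": 0.25}))
print(build_compare_spec("dubbing actor", "string", "relaxed", clean=True))
```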
On the basis of the above embodiment, the processing module 302 is configured to:
receiving the priority order of the comparison rules set by the user, and setting the priority of the program code of each comparison rule according to that priority order, so that the program code of each comparison rule is run according to its priority when the program code corresponding to the entity normalization strategy is run.
On the basis of the above embodiment, the input module 301 is further configured to receive a start instruction of a user;
the running module 303 is further configured to run the program code corresponding to the entity normalization strategy according to the start instruction; and/or
the input module 301 is further configured to receive a stop instruction of a user;
the running module 303 is further configured to stop running the program code corresponding to the entity normalization strategy according to the stop instruction;
the input module 301 is further configured to receive a view result instruction of a user;
the running module 303 is further configured to display a clustering result according to the view result instruction.
The entity normalization processing device provided in this embodiment may be specifically configured to execute the entity normalization processing method embodiments provided above; its specific functions are not repeated here.
The entity normalization processing device provided by the embodiment receives the rule parameters related to the entity normalization strategy input by the user; generating program codes corresponding to entity normalization strategies according to the rule parameters and preset code generation rules; and running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities. According to the embodiment, the user only needs to input the rule parameters related to the entity normalization strategy, the program codes corresponding to the entity normalization strategy can be automatically generated, user programming is not needed, labor development cost and learning cost are reduced, the threshold of data production is reduced, the entity normalization strategy is convenient to modify, the efficiency of entity normalization processing is improved, and the method and the device can be applied to entity normalization processing of data in any field.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
FIG. 4 is a block diagram of an electronic device for the entity normalization processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in FIG. 4, the electronic device includes: one or more processors 401, memory 402, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 401 is taken as an example in FIG. 4.
Memory 402 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the entity normalization processing method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the entity normalization processing method provided by the present application.
The memory 402, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the entity normalization processing method in the embodiments of the present application (e.g., the input module 301, the processing module 302, and the running module 303 shown in FIG. 3). The processor 401 executes the various functional applications and data processing of the server, i.e., implements the entity normalization processing method in the above-described method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 402.
Memory 402 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the electronic device for the entity normalization processing method, and the like. In addition, memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 402 may optionally include memory remotely located with respect to processor 401, and such remote memory may be connected via a network to the electronic device for the entity normalization processing method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the entity normalization processing method may further include: an input device 403 and an output device 404. The processor 401, memory 402, input device 403, and output device 404 may be connected by a bus or otherwise; connection by a bus is taken as an example in FIG. 4.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the entity normalization processing method; examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 404 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, rule parameters related to an entity normalization strategy input by a user are received; program code corresponding to the entity normalization strategy is generated according to the rule parameters and a preset code generation rule; and the program code corresponding to the entity normalization strategy is run to perform normalization judgment on the entities in a preset entity data set, so as to cluster the same entities. According to the embodiments, the user only needs to input the rule parameters related to the entity normalization strategy, and the program code corresponding to the entity normalization strategy is generated automatically without user programming. This reduces manual development cost and learning cost, lowers the threshold of data production, makes the entity normalization strategy easy to modify, and improves the efficiency of entity normalization processing; the method and the device can be applied to entity normalization processing of data in any field.
The present application also provides a computer program comprising program code which, when executed by a computer, performs the entity normalization processing method as described in the above embodiments.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, which is not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (13)

1. An entity normalization processing method, which is characterized by comprising the following steps:
receiving visual entity normalization strategy related rule parameters input by a user, wherein the rule parameters are input by the user through a first user interaction interface, and comprise at least one target attribute to be compared, comparison condition parameters corresponding to the target attribute and comparison rules combined among comparison conditions corresponding to the target attributes;
generating program codes corresponding to entity normalization strategies according to the rule parameters and preset code generation rules;
running a program code corresponding to the entity normalization strategy, and carrying out normalization judgment on the entities in a preset entity data set so as to cluster the same entities;
the generating the program code corresponding to the entity normalization strategy according to the rule parameters and the preset code generation rule comprises the following steps:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function according to each comparison rule, and determining a logic operation type to obtain a program code of the comparison rule;
and obtaining the program codes corresponding to the entity normalization strategy according to the program codes of each comparison rule.
2. The method of claim 1, wherein the comparison condition parameters corresponding to the target attribute include a type of the target attribute, a comparison condition corresponding to the target attribute, and a comparison process strictness.
3. The method according to claim 2, wherein the obtaining the comparison function of the target attribute according to the type of the target attribute and the comparison condition parameter corresponding to the target attribute includes:
determining comparison method parameters in the comparison function according to the type of the target attribute;
determining a multi-value comparison condition parameter in the comparison function according to the comparison process strictness, wherein the multi-value comparison condition parameter comprises: all values identical, at least one value identical, or all values different;
determining supplementary parameters in the comparison function according to the comparison conditions and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-value comparison condition parameter and the supplementary parameter.
4. The method according to claim 1, wherein the obtaining the program code corresponding to the entity normalization policy according to the program code of each comparison rule includes:
and receiving the priority order of the comparison rules set by the user, and setting the priority of the program codes of each comparison rule according to the priority order of the comparison rules so as to operate the program codes of each comparison rule according to the priority when operating the program codes corresponding to the entity normalization strategy.
5. The method of claim 1, wherein the running the program code corresponding to the entity normalization policy further comprises:
receiving a starting instruction of a user, and running a program code corresponding to the entity normalization strategy according to the starting instruction; and/or
receiving a stopping instruction of a user, and stopping running program codes corresponding to the entity normalization strategy according to the stopping instruction;
after the same entity is clustered, the method further comprises the following steps:
and receiving a checking result instruction of a user, and displaying a clustering result according to the checking result instruction.
6. An entity normalization processing device, comprising:
the input module is used for receiving visual entity normalization strategy related rule parameters input by a user, wherein the rule parameters comprise at least one target attribute to be compared, comparison condition parameters corresponding to the target attribute and comparison rules combined among comparison conditions corresponding to the target attributes;
the processing module is used for generating a program code corresponding to the entity normalization strategy according to the rule parameters and a preset code generation rule;
the operation module is used for operating the program codes corresponding to the entity normalization strategy, and carrying out normalization judgment on the entities in the preset entity data set so as to cluster the same entities;
the processing module is used for:
aiming at any target attribute to be compared, acquiring a comparison function of the target attribute according to the type of the target attribute and a comparison condition parameter corresponding to the target attribute;
calling a corresponding comparison function according to each comparison rule, and determining a logic operation type to obtain a program code of the comparison rule;
and obtaining the program codes corresponding to the entity normalization strategy according to the program codes of each comparison rule.
7. The apparatus of claim 6, wherein the comparison condition parameters corresponding to the target attribute include a type of the target attribute, a comparison condition corresponding to the target attribute, and a comparison process strictness.
8. The apparatus of claim 7, wherein the processing module is to:
determining a comparison method parameter in the comparison function according to the type of the target attribute;
determining a multi-value comparison condition parameter in the comparison function according to the comparison process strictness, wherein the multi-value comparison condition parameter comprises: all values identical, at least one value identical, or all values different;
determining supplementary parameters in the comparison function according to the comparison conditions and/or a preset data cleaning instruction;
and obtaining a comparison function of the target attribute according to the target attribute, the comparison method parameter, the multi-value comparison condition parameter and the supplementary parameter.
9. The apparatus of claim 6, wherein the processing module is to:
and receiving the priority order of the comparison rules set by the user, and setting the priority of the program codes of each comparison rule according to the priority order of the comparison rules so as to operate the program codes of each comparison rule according to the priority when operating the program codes corresponding to the entity normalization strategy.
10. The apparatus of claim 6, wherein:
the input module is also used for receiving a starting instruction of a user;
the operation module is also used for operating the program codes corresponding to the entity normalization strategy according to the starting instruction; and/or
the input module is also used for receiving a stopping instruction of a user;
the operation module is further used for stopping operating the program codes corresponding to the entity normalization strategy according to the stopping instruction;
the input module is also used for receiving a viewing result instruction of a user;
the operation module is also used for displaying the clustering result according to the viewing result instruction.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
13. An entity normalization processing method, which is characterized by comprising the following steps:
receiving visual entity normalization strategy related rule parameters input by a user, wherein the rule parameters are input by the user through a first user interaction interface, and comprise at least one target attribute to be compared, comparison condition parameters corresponding to the target attribute and comparison rules combined among comparison conditions corresponding to the target attributes;
acquiring an entity normalization strategy according to the rule parameters;
and carrying out normalization judgment on the entities in the preset entity data set according to the entity normalization strategy, and outputting normalization judgment results.
CN201911379440.1A 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium Active CN111158666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911379440.1A CN111158666B (en) 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111158666A CN111158666A (en) 2020-05-15
CN111158666B true CN111158666B (en) 2023-07-04

Family

ID=70558565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379440.1A Active CN111158666B (en) 2019-12-27 2019-12-27 Entity normalization processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111158666B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112910923A (en) * 2021-03-04 2021-06-04 麦荣章 Intelligent financial big data processing system
CN113295842A (en) * 2021-04-08 2021-08-24 湖南科技大学 Accurate evaluation system of mine side slope rock mass engineering stability
CN113190670A (en) * 2021-05-08 2021-07-30 重庆第二师范学院 Information display method and system based on big data platform
CN114167198B (en) * 2021-10-18 2024-03-01 国网山东省电力公司平原县供电公司 Method and platform for measuring synchronous line loss data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050162A (en) * 2013-03-11 2014-09-17 富士通株式会社 Data processing method and data processing device
CN107562859A (en) * 2017-08-29 2018-01-09 武汉斗鱼网络科技有限公司 A kind of disaggregated model training system and its implementation
CN107632842A (en) * 2017-09-26 2018-01-26 携程旅游信息技术(上海)有限公司 Rule configuration and dissemination method, system, equipment and storage medium
CN108469977A (en) * 2018-03-26 2018-08-31 张�林 A kind of interface data management method
CN108804093A (en) * 2018-06-15 2018-11-13 联想(北京)有限公司 A kind of code generating method and electronic equipment
CN109582837A (en) * 2018-11-30 2019-04-05 长城计算机软件与系统有限公司 A kind of visualized data processing method based on cloud and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100576207C (en) * 2007-05-29 2009-12-30 北大方正集团有限公司 Remove the method for repeating objects based on metadata
US9411864B2 (en) * 2008-08-26 2016-08-09 Zeewise, Inc. Systems and methods for collection and consolidation of heterogeneous remote business data using dynamic data handling
US9886084B2 (en) * 2014-11-11 2018-02-06 Intel Corporation User input via elastic deformation of a material

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"An Integrated Healthcare Information System for End-to-End Standardized Exchange and Homogeneous Management of Digital ECG Formats"; Jesús Daniel Trigo; IEEE Transactions on Information Technology in Biomedicine; Vol. 16 (No. 4); pp. 518-529 *
"Research on the Application of ECG Data Sequence Visualization Based on Variable Value Measurement"; 吉艳; China Master's Theses Full-text Database, Medicine and Health Sciences (2017, No. 02); p. E062-34 *

Also Published As

Publication number Publication date
CN111158666A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111158666B (en) Entity normalization processing method, device, equipment and storage medium
JP2021082308A (en) Multimodal content processing method, apparatus, device and storage medium
CN110806923B (en) Parallel processing method and device for block chain tasks, electronic equipment and medium
JP2022018095A (en) Multi-modal pre-training model acquisition method, apparatus, electronic device and storage medium
US20210287044A1 (en) Method for updating parameter of model, distributed training system and electric device
JP7095209B2 (en) Methods, Programs and Devices for Pre-Training Graph Neural Networks
JP7269913B2 (en) Knowledge graph construction method, device, electronic device, storage medium and computer program
JP2021111417A (en) Method, device, electronic apparatus, and storage medium for extracting spo
CN112270413B (en) Operator merging method, device, electronic equipment and storage medium
EP3822815A1 (en) Method and apparatus for mining entity relationship, electronic device, storage medium, and computer program product
CN112560499B (en) Pre-training method and device for semantic representation model, electronic equipment and storage medium
CN111061743B (en) Data processing method and device and electronic equipment
CN111666372B (en) Method, device, electronic equipment and readable storage medium for analyzing query word query
CN111126063B (en) Text quality assessment method and device
CN111340219A (en) Neural network model searching method and device, image processing method and processor
CN112016524B (en) Model training method, face recognition device, equipment and medium
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN112561332B (en) Model management method, device, electronic equipment, storage medium and program product
CN111125451B (en) Data production processing method and device, electronic equipment and storage medium
JP2022013658A (en) Optimizer learning method and apparatus, electronic device, readable storage medium, and computer program
CN111611364B (en) Intelligent response method, device, equipment and storage medium
JP2021128779A (en) Method, device, apparatus, and storage medium for expanding data
CN112817582A (en) Code processing method and device, computer equipment and storage medium
CN111783872B (en) Method, device, electronic equipment and computer readable storage medium for training model
CN112270412B (en) Network operator processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant