WO2021151303A1

WO2021151303A1 - Named entity alignment method and apparatus, and electronic device and readable storage medium

Info

Publication number: WO2021151303A1
Application number: PCT/CN2020/119085
Authority: WO
Inventors: 阮晓雯; 邓攀; 徐亮; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-06-19
Filing date: 2020-09-29
Publication date: 2021-08-05
Also published as: CN111738005A

Abstract

The invention relates to big data technology and discloses a named entity alignment method, comprising: performing standardization processing on named entities to be aligned to obtain standard named entities to be aligned (S1); performing sampling processing on a named entity test set to obtain named entity test subsets (S2); using each named entity test subset to train a preset neural network model to obtain a named entity alignment model set (S3); and, on the basis of the named entity alignment model set, performing model alignment on the named entities to be aligned to obtain an alignment result (S4). The present invention further relates to blockchain technology: the data used for model training can be stored in a blockchain. Further provided are a named entity alignment apparatus, an electronic device and a computer-readable storage medium. The present method is able to improve the accuracy of alignment of named entities.

Description

Named entity alignment method, device, electronic equipment and readable storage medium

This application claims priority to Chinese patent application No. 202010564906.1, filed with the Chinese Patent Office on June 19, 2020 and entitled "Named Entity Alignment Method, Apparatus, Electronic Equipment, and Readable Storage Medium", the entire contents of which are incorporated in this application by reference.

Technical field

This application relates to the field of big data, and in particular to a method, device, electronic device, and readable storage medium for aligning named entities.

Background technique

With the advent of the era of big data, how to efficiently acquire and process the knowledge in it is an important research topic. The research on named entity alignment in the field of natural language processing aims to unify and standardize different expressions of the same concept, which can greatly facilitate users' understanding and application of knowledge.

The inventor realizes that there are currently two main types of named entity alignment methods. One aligns entities based on their morphological features, but alignment by morphological features alone loses semantic features and has low accuracy; the other aligns entities semantically, which requires a large amount of training data, but such training data is not easy to obtain, which leads to low accuracy for this method as well.

Summary of the invention

A named entity alignment method provided by this application includes:

Acquiring a named entity to be aligned, and standardizing the named entity to be aligned to obtain a standard named entity to be aligned;

Acquire a test named entity set, perform sampling processing on the test named entity set, and obtain a test named entity subset;

Use each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

Perform model alignment on the named entities to be aligned according to the named entity alignment model set to obtain an alignment result.

The present application also provides a named entity alignment device, the device includes:

The standardization module is used to obtain a named entity to be aligned, and perform standardization processing on the named entity to be aligned to obtain a standard named entity to be aligned;

The model training module is used to obtain a test named entity set, sample the test named entity set to obtain test named entity subsets, and use each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

The model alignment module is configured to perform model alignment on the named entity to be aligned according to the named entity alignment model set to obtain an alignment result.

This application also provides an electronic device, which includes:

Memory, storing at least one instruction; and

The processor executes the instructions stored in the memory to implement the following steps:

This application also provides a computer-readable storage medium, including a storage data area and a storage program area. The storage data area stores data created according to the use of blockchain nodes, and the storage program area stores a computer program. At least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in an electronic device to implement the following steps:

Description of the drawings

FIG. 1 is a schematic flowchart of a named entity alignment method provided by an embodiment of this application;

FIG. 2 is a schematic diagram of modules of a named entity alignment device provided by an embodiment of this application;

FIG. 3 is a schematic diagram of the internal structure of an electronic device for implementing a named entity alignment method provided by an embodiment of this application;

The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit the present application.

This application provides a named entity alignment method. Referring to FIG. 1, it is a schematic flowchart of a named entity alignment method provided by an embodiment of this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.

In this embodiment, the named entity alignment method includes:

S1. Obtain a named entity to be aligned, and perform standardization processing on the named entity to be aligned to obtain a standard named entity to be aligned;

In the embodiment of the present application, the named entities are names of persons, organizations, places, and all other entities identified by names, and the named entities to be aligned are named entities that do not use a uniform identification. For example: "Ali Group" and "Alibaba" are different entity identification names, both of which represent the entity "Alibaba Network Technology Co., Ltd.", then "Ali Group" and "Alibaba" are named entities to be aligned. The named entity to be aligned can be obtained from the Internet.

Further, in the embodiment of the present application, standardization processing is performed on the named entity to be aligned to obtain the standard named entity to be aligned.

In detail, the standardization process includes: performing unified conversion between simplified and traditional Chinese, unified case conversion, and deletion of special characters on the named entity to be aligned to obtain the standard named entity to be aligned. Wherein, the special characters refer to meaningless symbols in the named entities to be aligned, such as spaces, brackets, etc.
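The standardization step can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `standardize`, the tiny `TRAD_TO_SIMP` table, the choice of lower case as the unified case, and the exact set of "special" characters are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny traditional-to-simplified table (assumption);
# a real system would rely on a complete conversion dictionary.
TRAD_TO_SIMP = str.maketrans({"團": "团", "網": "网", "絡": "络", "際": "际"})

# Characters treated as "special" in this sketch: whitespace plus
# ASCII and full-width brackets.
SPECIAL_CHARS = re.compile(r"[\s()\[\]{}（）【】]")


def standardize(name: str) -> str:
    """Unify the script, unify the case, and delete special characters."""
    name = name.translate(TRAD_TO_SIMP)  # traditional -> simplified
    name = name.lower()                  # unified case (lower case is an assumption)
    return SPECIAL_CHARS.sub("", name)   # delete spaces and brackets


print(standardize(" Ali (Group) "))
print(standardize("阿里集團"))
```

Running the three conversions in a fixed order keeps the result deterministic, so two spellings that differ only in script, case, or bracket characters map to the same string.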

S2. Obtain a test named entity set, and perform sampling processing on the test named entity set to obtain a test named entity subset;

In the embodiment of the present application, the test named entity set is a set of multiple named entities.

Preferably, in the embodiment of the present application, the data in the test named entity set is scarce and difficult to obtain, so the test named entity set is sampled to obtain test named entity subsets, which expands the data available for subsequent model training.
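A minimal sketch of the sampling step is given below. The text does not state which sampling strategy is used; random sampling with replacement (bootstrap-style) is assumed here, and the name `sample_subsets` and its parameters are hypothetical.

```python
import random


def sample_subsets(test_set, num_subsets=5, subset_size=None, seed=0):
    """Draw `num_subsets` random subsets from `test_set`, with replacement."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    items = list(test_set)
    size = subset_size or len(items)   # default: same size as the original set
    return [rng.choices(items, k=size) for _ in range(num_subsets)]


subsets = sample_subsets(["Ali Group", "Alibaba", "Ali Company"], num_subsets=5)
print(len(subsets))
```

Each subset can then be used to train its own model, which is how a small test named entity set yields several differently trained models.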

S3. Use each of the test named entity subsets to train a preset neural network model to obtain a named entity alignment model set;

In this embodiment of the application, each test named entity in the test named entity subset is converted into a test named entity vector to obtain a test named entity vector subset; the test named entity vector subset is determined as a training set; the test named entity vector subset is labeled to obtain a label set; and the neural network model is trained using the training set and the label set to obtain a named entity alignment model.

Specifically, marking the test named entity vector subset in the embodiment of the present application includes:

S11. Convert the standard named entity in the pre-built standard named entity library into a standard named entity vector;

In detail, the standard named entity library is a collection of standard named entities, and the standard named entities are named entities that are officially uniformly identified.

S12. Use the standard named entity vector to mark the test named entity vector corresponding to the test named entity vector subset.

For example: the test named entity "Alibaba" in the test named entity subset corresponds to the standard named entity "Alibaba Network Technology Co., Ltd." in the pre-built standard named entity library; the standard named entity vector converted from "Alibaba Network Technology Co., Ltd." is then used to mark the test named entity vector converted from the test named entity "Alibaba" in the test named entity vector subset.

In detail, in the embodiment of the present application, the above-mentioned sampling process can obtain multiple test named entity subsets, and each test named entity subset can train a neural network model to obtain the named entity alignment model, for example: The test named entity set is sampled to obtain 5 test named entity subsets, each test named entity subset is trained with a preset neural network model to obtain a named entity alignment model, and a total of 5 named entity alignment models are obtained.

Preferably, the test named entity subset described in the embodiment of this application contains little data, and an overly deep neural network would overfit and perform poorly. Therefore, the neural network model described in the embodiment of this application can be constructed as a shallow convolutional neural network.

In detail, using the training set and the label set to train the neural network model includes:

A: Perform a convolution pooling operation on the training set according to the preset number of convolution pooling to obtain a dimensionality reduction data set;

B: Perform a deconvolution operation on the dimensionality reduction data set according to the preset number of deconvolutions to obtain a dimension-raised data set;

C: Use a preset activation function to calculate on the dimension-raised data set to obtain a predicted value, and use the predicted value and the label value contained in the label set as the input parameters of the pre-built loss function to calculate the loss value;

D: Compare the loss value with the preset loss threshold; if the loss value is greater than or equal to the loss threshold, return to A; if the loss value is less than the loss threshold, obtain the named entity alignment model.
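The control flow of steps A to D (compute a prediction, compute the loss, stop once the loss falls below the threshold, otherwise repeat) can be sketched as below. This sketch does not implement the convolution, pooling, or deconvolution layers: a single scalar weight stands in for the network, mean squared error stands in for the pre-built loss function, and a gradient-descent update between rounds is assumed, since the text does not describe how parameters change before returning to A. All names are hypothetical.

```python
def train_until_threshold(inputs, labels, loss_threshold=1e-6,
                          learning_rate=0.05, max_rounds=10_000):
    """Repeat prediction and update until the loss drops below the threshold."""
    weight = 0.0  # stand-in for the network parameters (assumption)
    loss = float("inf")
    for _ in range(max_rounds):  # guard against a threshold that is never reached
        predictions = [weight * x for x in inputs]        # stand-in for steps A to C
        errors = [p - y for p, y in zip(predictions, labels)]
        loss = sum(e * e for e in errors) / len(errors)   # mean squared error
        if loss < loss_threshold:                         # step D: training is done
            break
        gradient = sum(2 * e * x for e, x in zip(errors, inputs)) / len(errors)
        weight -= learning_rate * gradient                # update, then "return to A"
    return weight, loss


weight, loss = train_until_threshold([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(weight, loss)
```

The `max_rounds` cap is an addition of this sketch; the text relies on the loss threshold alone as the stopping condition.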

Further, all the named entity alignment models are summarized to obtain the named entity alignment model set.

In another embodiment of the present application, the training data of each model in the named entity alignment model set can be stored in the blockchain.

S4. Perform model alignment on the named entity to be aligned according to the named entity alignment model set to obtain an alignment result.

In the embodiment of the present application, in order to improve the efficiency of alignment, before model alignment is performed on the named entity to be aligned according to the named entity alignment model set, the standard named entity to be aligned is used to perform morphological alignment in a pre-built standard named entity library, wherein the standard named entity library is a collection of standard named entities, and a standard named entity is a named entity with an official uniform identification.

In detail, the use of the standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library includes:

S21: Use the standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtain the alignment result;

S22: If the morphological alignment is unsuccessful, perform model alignment on the standard named entity to be aligned according to the named entity alignment model set.

Preferably, the embodiment of the present application uses the edit distance method to perform the morphological alignment.

In detail, the morphological alignment includes:

S211: Calculate the edit distance between the standard named entity to be aligned and each standard named entity in the standard named entity library;

S212: When one of the calculated edit distances is equal to a preset edit distance value, that edit distance is taken as the target edit distance, the alignment is determined to be successful, and the standard named entity corresponding to the target edit distance is selected as the alignment result.

Preferably, the preset edit distance value is 0.

In detail, the edit distance is the minimum number of editing operations required to change one character string into another. For example: the standard named entity to be aligned is "Alibaba Network Technology Co., Ltd.", and the standard named entity library contains "Alibaba Network Technology Co., Ltd."; then "Alibaba Network Technology Co., Ltd." needs 0 operations to be converted into "Alibaba Network Technology Co., Ltd.", so the edit distance between the two is 0.
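The morphological alignment described above can be sketched as follows. The text only says "edit distance"; the Levenshtein variant (insertions, deletions, and substitutions, each costing 1) is assumed here, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between `a` and `b`, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def morphological_align(name, standard_library, preset_distance=0):
    """Return the first library entry at the preset edit distance, else None."""
    for standard_name in standard_library:
        if edit_distance(name, standard_name) == preset_distance:
            return standard_name
    return None  # unsuccessful: the caller falls back to model alignment (S22)


library = ["Alibaba Network Technology Co., Ltd."]
print(morphological_align("Alibaba Network Technology Co., Ltd.", library))
print(morphological_align("Alibaba", library))
```

With the preferred preset value of 0 this reduces to an exact string match, which is why it serves as a cheap first pass before the models are consulted.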

Further, in this embodiment of the present application, performing model alignment on the named entity to be aligned according to the named entity alignment model set includes:

S31. Convert each word in the standard to-be-aligned named entity into a word vector of a predetermined dimension, and calculate the average value of the word vectors corresponding to all the words in the standard to-be-aligned named entity to obtain a standard to-be-aligned named entity vector;

Preferably, in the embodiment of the present application, each word in the standard named entity to be aligned is converted into a word vector of a predetermined dimension by using an embedding (word embedding) method.

S32. Use each named entity alignment model in the named entity alignment model set to perform alignment processing on the standard to-be-aligned named entity vector to obtain a predicted alignment entity vector;

S33. Convert each standard named entity in the standard named entity library into a standard named entity vector, and summarize all the standard named entity vectors to obtain a standard named entity vector library;

In detail, in the embodiment of the present application, converting each standard named entity in the standard named entity library into a standard named entity vector includes:

S331: Convert each word in the standard named entity into a word vector of a predetermined dimension;

S332: Calculate the average value of the word vectors corresponding to all characters in the standard named entity to obtain the standard named entity vector.

S34. Perform similarity calculation and analysis processing on the predicted alignment entity vector and each standard named entity vector in the standard named entity vector library to obtain the alignment result.
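The vector construction used in S31 and S331 to S332 (one vector per character, then an element-wise average) can be sketched as below. The `char_vector` function is a hypothetical stand-in for a trained embedding: it produces a fixed pseudo-random vector per character, purely so that the example runs without a model.

```python
import random


def char_vector(char: str, dim: int = 8):
    """Hypothetical embedding: a fixed pseudo-random vector per character."""
    rng = random.Random(ord(char))  # seeded so a character always maps to the same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def entity_vector(name: str, dim: int = 8):
    """Average the character vectors of `name` into a single entity vector."""
    vectors = [char_vector(c, dim) for c in name]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(entity_vector("alibaba"))
```

Averaging gives every entity a vector of the same dimension regardless of its length, at the cost of discarding character order.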

Further, in the embodiment of the present application, the similarity calculation analysis processing includes:

S41. Calculate the similarity value between the predicted aligned entity vector and each of the standard named entity vectors in the standard named entity vector library;

Preferably, the embodiment of the present application uses cosine similarity to calculate the similarity value.

In detail, the similarity value can be calculated by the following formula:

$$\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \, \sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

Where x represents the predicted aligned entity vector, y represents the standard named entity vector, x_i represents the i-th vector value of the predicted aligned entity vector, y_i represents the i-th vector value of the standard named entity vector, i is a positive integer, and n represents the vector dimension of the predicted aligned entity vector and the standard named entity vector.
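A minimal sketch of the cosine similarity calculation and of picking the most similar library vector is shown below. The function names, the dictionary layout of the vector library, and the convention of returning 0 for a zero vector are assumptions of the example.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_x * norm_y)


def most_similar(predicted, standard_vectors):
    """Return (name, similarity) of the library vector closest to `predicted`."""
    scored = ((name, cosine_similarity(predicted, vector))
              for name, vector in standard_vectors.items())
    return max(scored, key=lambda pair: pair[1])


vector_library = {"A": [1.0, 0.0], "B": [0.0, 1.0]}  # hypothetical two-entry library
print(most_similar([1.0, 0.1], vector_library))
```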

S42. Summarize all the similarity values to obtain a similarity set, and determine the largest similarity value in the similarity set;

S43. Select the standard named entity vector corresponding to the maximum similarity value in the standard named entity vector library as a target vector, and select the standard named entity corresponding to the target vector in the standard named entity library as the result to be aligned;

S44. Summarize all the results to be aligned to obtain the set of results to be aligned;

S45. Use a majority voting mechanism to screen the set of results to be aligned to obtain the alignment result.

Specifically, in this embodiment of the present application, using a majority voting mechanism to screen the entities to be aligned in the result set to be aligned includes:

S51. Record the number of occurrences of each result to be aligned in the set of results to be aligned; select the result to be aligned with the most occurrences as the candidate alignment result, and determine the number of the candidate alignment results; if the number is one, the candidate alignment result is determined as the alignment result.

In the embodiment of this application, for example, there are five results to be aligned in the set of results to be aligned, of which 3 are "Alibaba" and 2 are "Ali Company". "Alibaba" occurs the most times, so there is only one candidate alignment result, "Alibaba"; the number of candidate alignment results is therefore one, and the candidate alignment result "Alibaba" is the alignment result.

S52. If the number is greater than one, summarize the similarity values corresponding to each result to be aligned in the set of results to be aligned to obtain a similarity set of results to be aligned, and select the result to be aligned corresponding to the largest similarity value in that similarity set as the alignment result.

In the embodiment of this application, for example, there are five results to be aligned in the set of results to be aligned, including 2 "Alibaba", 2 "Ali Company", and 1 "Ali Group". "Alibaba" and "Ali Company" both occur the most times, so there are two candidate alignment results, "Alibaba" and "Ali Company". Since the number of candidate alignment results is two, which is greater than one, the result to be aligned that corresponds to the largest similarity value among these five results to be aligned is selected as the alignment result.
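The voting procedure of S51 and S52 can be sketched as follows. The function name `vote` and the representation of each model's output as an `(entity_name, similarity)` pair are assumptions. The tie-break follows the text literally and takes the largest similarity over all results in the set, not only over the tied candidates; the similarity values in the usage example are invented for illustration.

```python
from collections import Counter


def vote(results):
    """`results` is a list of (entity_name, similarity) pairs, one per model."""
    counts = Counter(name for name, _ in results)
    top_count = max(counts.values())
    candidates = [name for name, count in counts.items() if count == top_count]
    if len(candidates) == 1:  # S51: a unique most frequent result
        return candidates[0]
    # S52: tie, so take the result with the largest similarity value
    return max(results, key=lambda pair: pair[1])[0]


tied = [("Alibaba", 0.90), ("Alibaba", 0.80),
        ("Ali Company", 0.95), ("Ali Company", 0.60),
        ("Ali Group", 0.50)]
print(vote(tied))
```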

In the embodiment of the present application, standardization processing is performed on the named entity to be aligned to obtain the standard named entity to be aligned, which eliminates the influence of the format of the named entity and of irrelevant characters; the test named entity set is sampled to obtain multiple test named entity subsets, and each test named entity subset is used to train a preset neural network model to obtain a named entity alignment model set; model alignment is then performed on the named entity to be aligned according to the named entity alignment model set. Because multiple models each perform model alignment, the accuracy of named entity alignment is improved.

As shown in Fig. 2, it is a functional block diagram of the named entity alignment device of the present application.

The named entity alignment apparatus 100 described in this application can be installed in an electronic device. According to the implemented functions, the named entity alignment device may include a standardization module 101, a model training module 102, and a model alignment module 103. The module described in the present invention can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can complete fixed functions, and are stored in the memory of the electronic device.

In this embodiment, the functions of each module/unit are as follows:

The standardization module 101 is configured to obtain a named entity to be aligned, and perform standardization processing on the named entity to be aligned to obtain a standard named entity to be aligned.

In the embodiment of the present application, the named entity is a person's name, an organization name, a place name, and all other entities identified by a name, and the named entity to be aligned is a named entity that does not use a uniform identification. For example: "Ali Group" and "Alibaba" are different entity identification names, both of which refer to the entity "Alibaba Network Technology Co., Ltd.", then "Ali Group" and "Alibaba" are named entities to be aligned. The named entity to be aligned can be obtained from the Internet.

In detail, the standardization process performed by the standardization module 101 includes: performing unified conversion between simplified and traditional characters, unified case conversion, and deletion of special characters on the named entity to be aligned to obtain the standard named entity to be aligned. Wherein, the special characters refer to meaningless symbols in the named entities to be aligned, such as spaces, brackets, etc.

The model training module 102 is used to obtain a test named entity set, perform sampling processing on the test named entity set to obtain test named entity subsets, and use each of the test named entity subsets to train a preset neural network model to obtain a named entity alignment model set.

Preferably, in the embodiment of the present application, the data in the test named entity set is scarce and difficult to obtain. The model training module 102 performs sampling processing on the test named entity set to obtain the test named entity subsets, which expands the data available for subsequent model training.

The model training module 102 described in this embodiment of the application converts each test named entity in the test named entity subset into a test named entity vector to obtain a test named entity vector subset; the model training module 102 determines the test named entity vector subset as the training set; the model training module 102 marks the test named entity vector subset to obtain a label set; and the model training module 102 uses the training set and the label set to train the neural network model to obtain a named entity alignment model.

Specifically, the model training module 102 in the embodiment of the present application uses the following means to mark the subset of test named entity vectors:

Convert the standard named entity in the pre-built standard named entity library into a standard named entity vector;

The standard named entity vector is used to mark the corresponding test named entity vector in the test named entity vector subset.

In detail, the model training module 102 uses the following means to train the neural network model:

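A: Perform a convolution pooling operation on the training set according to the preset number of convolution pooling to obtain a dimensionality reduction data set;

B: Perform a deconvolution operation on the dimensionality reduction data set according to the preset number of deconvolutions to obtain a dimension-raised data set;

C: Use a preset activation function to calculate on the dimension-raised data set to obtain a predicted value, and use the predicted value and the label value contained in the label set as input parameters of the pre-built loss function to calculate the loss value;

D: Compare the loss value with the preset loss threshold; if the loss value is greater than or equal to the loss threshold, return to A; if the loss value is less than the loss threshold, obtain the named entity alignment model.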

Further, the model training module 102 summarizes all the named entity alignment models to obtain the named entity alignment model set.

In another embodiment of the present application, the data used for training of each model in the named entity alignment model set can be stored in a blockchain.

The model alignment module 103 is configured to perform model alignment on the named entity to be aligned according to the named entity alignment model set to obtain an alignment result.

In the embodiment of the present application, in order to improve the efficiency of alignment, before performing model alignment on the named entity to be aligned according to the named entity alignment model set, the model alignment module 103 uses the standard named entity to be aligned to perform morphological alignment in a pre-built standard named entity library, wherein the standard named entity library is a collection of standard named entities, and a standard named entity is a named entity with an official uniform identification.

In detail, the model alignment module 103 uses the following methods to perform morphological alignment in a pre-built standard named entity library:

Use the standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtain the alignment result;

If the morphological alignment is unsuccessful, perform model alignment on the standard named entity to be aligned according to the named entity alignment model set.

In detail, the model alignment module 103 uses the following methods to perform the morphological alignment:

Calculating the edit distance between the standard named entity to be aligned and each standard named entity in the standard named entity library;

When one of the calculated edit distances is equal to a preset edit distance value, that edit distance is taken as the target edit distance, the alignment is determined to be successful, and the standard named entity corresponding to the target edit distance is selected as the alignment result.

Preferably, the preset edit distance value is 0.

Further, in the embodiment of the present application, the model alignment module 103 uses the following methods to perform model alignment on the named entities to be aligned:

Convert each word in the standard named entity to be aligned into a word vector of a predetermined dimension, and calculate an average value of the word vectors corresponding to all words in the standard named entity to be aligned to obtain a standard named entity vector to be aligned;

Preferably, in the embodiment of the present application, an embedding (word embedding) method is used to convert each character in the standard named entity to be aligned into a character vector of a predetermined dimension.

Using each named entity alignment model in the named entity alignment model set to perform alignment processing on the standard to-be-aligned named entity vector to obtain a predicted alignment entity vector;

Converting each standard named entity in the standard named entity library into a standard named entity vector, and summarizing all the standard named entity vectors to obtain a standard named entity vector library;

In detail, in the embodiment of the present application, the model alignment module 103 uses the following means to convert each standard named entity in the standard named entity library into a standard named entity vector:

Converting each word in the standard named entity into a word vector of a predetermined dimension;

Calculating the average value of the word vectors corresponding to all characters in the standard named entity to obtain the standard named entity vector.

Perform similarity calculation and analysis processing on the predicted aligned entity vector and each standard named entity vector in the standard named entity vector library to obtain the alignment result.

Further, in the embodiment of the present application, the model alignment module 103 uses the following means to perform similarity calculation analysis processing:

Calculating a similarity value between the predicted aligned entity vector and each of the standard named entity vectors in the standard named entity vector library;

Preferably, the model alignment module 103 according to the embodiment of the present application uses cosine similarity to calculate the similarity value.

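In detail, the model alignment module 103 uses the following formula to calculate the similarity value:

$$\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \, \sqrt{\sum_{i=1}^{n} y_i^{2}}}$$

Where x represents the predicted aligned entity vector, y represents the standard named entity vector, x_i and y_i represent their i-th vector values, and n represents the vector dimension of both vectors.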

Summarize all the similarity values to obtain a similarity set, and determine the largest similarity value in the similarity set;

The standard named entity vector corresponding to the maximum similarity value in the standard named entity vector library is selected as a target vector, and the standard named entity corresponding to the target vector in the standard named entity library is selected as the result to be aligned;

Summarize all the results to be aligned to obtain the set of results to be aligned;

The majority voting mechanism is used to screen the set of results to be aligned to obtain the alignment result.

Specifically, in the embodiment of the present application, the model alignment module 103 uses the following methods to screen the result entities to be aligned in the result set to be aligned:

Record the number of occurrences of each result to be aligned in the set of results to be aligned; select the result to be aligned with the most occurrences as the candidate alignment result, and determine the number of the candidate alignment results; if the number is one, determine the candidate alignment result as the alignment result.

If the number is greater than one, summarize the similarity value corresponding to each result to be aligned in the result set to be aligned to obtain the similarity set of the result to be aligned, and select the one corresponding to the largest similarity value in the similarity set of the result to be aligned The result to be aligned is used as the alignment result.

In the embodiment of this application, for example, there are five results to be aligned in the set of results to be aligned, including 2 "Alibaba", 2 "Ali Company", and 1 "Ali Group". "Alibaba" and "Ali Company" both occur the most times, so there are two candidate alignment results, "Alibaba" and "Ali Company". Since the number of candidate alignment results is two, which is greater than one, the result to be aligned that corresponds to the largest similarity value among the five results to be aligned is selected as the alignment result.

As shown in FIG. 3, it is a schematic diagram of the structure of an electronic device implementing the named entity alignment method of the present application.

The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and running on the processor 10, such as a named entity alignment program.

Wherein, the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (such as SD or DX memory, etc.), magnetic memory, magnetic disk, CD, etc. The memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, for example, a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can be used not only to store application software and various types of data installed in the electronic device 1, such as the code of a named entity alignment program, but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple packaged integrated circuits with the same function or different functions, including combinations of one or more central processing units (CPU), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device. It uses various interfaces and lines to connect the components of the entire electronic device, runs or executes programs or modules (for example, the named entity alignment program), and calls the data stored in the memory 11 to execute the various functions of the electronic device 1 and to process data.

The bus may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into address bus, data bus, control bus and so on. The bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.

FIG. 3 only shows an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown in the figure, combinations of certain components, or different component arrangements.

For example, although not shown, the electronic device 1 may also include a power source (such as a battery) for supplying power to the various components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power source may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may also include a user interface. The user interface may be a display and an input unit (such as a keyboard). Optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, etc. The display may also be called a display screen or a display unit, and is used to display the information processed in the electronic device 1 and to display a visualized user interface.

It should be understood that the embodiments are only for illustrative purposes, and that the scope of the patent application is not limited by this structure.

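The named entity alignment program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions. When run by the processor 10, it can implement:

Acquiring a named entity to be aligned, and standardizing the named entity to be aligned to obtain a standard named entity to be aligned;

Acquiring a test named entity set, and performing sampling processing on the test named entity set to obtain test named entity subsets;

Using each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

Performing model alignment on the named entity to be aligned according to the named entity alignment model set to obtain an alignment result.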

Specifically, for the specific implementation method of the above-mentioned instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which will not be repeated here.

Further, if the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a U disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function, etc.; the storage data area may store data created according to the use of a blockchain node, etc.

In the several embodiments provided in this application, it should be understood that the disclosed equipment, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

For those skilled in the art, it is obvious that the present application is not limited to the details of the foregoing exemplary embodiments, and the present application can be implemented in other specific forms without departing from the spirit or basic characteristics of the present application.

Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
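As an illustration of data blocks associated by cryptographic methods, the following minimal sketch links each block to the hash of its predecessor. It is not the blockchain platform of this application: the use of SHA-256, JSON serialization, and the helper names `block_hash`, `append_block`, and `is_valid` are all assumptions made for the example.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents (SHA-256 over sorted-key JSON is an assumption)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    """Create the next block, storing the hash of the previous block in it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    chain.append(block)
    return block

def is_valid(chain):
    """Check that every block still matches the hash recorded by its successor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

# Hypothetical usage: store two batches of training data records.
chain = []
append_block(chain, ["training batch 1"])
append_block(chain, ["training batch 2"])
print(is_valid(chain))
```

Altering the contents of an earlier block changes its hash, so the check in `is_valid` fails for the following block; this is the anti-counterfeiting property described above, shown here in its simplest form.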

In addition, it is obvious that the word "including" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims can also be implemented by one unit or device through software or hardware. Words such as "second" are used to indicate names, and do not indicate any specific order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the application and not to limit them. Although the application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the application without departing from the spirit and scope of the technical solutions of the present application.

Claims

A named entity alignment method, wherein the method includes:

Acquiring a named entity to be aligned, and standardizing the named entity to be aligned to obtain a standard named entity to be aligned;

Acquire a test named entity set, perform sampling processing on the test named entity set, and obtain a test named entity subset;

Use each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

Perform model alignment on the named entities to be aligned according to the named entity alignment model set to obtain an alignment result.
The named entity alignment method according to claim 1, wherein said training a preset neural network model using each of said test named entity subsets to obtain a named entity alignment model set comprises:

Converting each test named entity in the test named entity subset into a test named entity vector to obtain a test named entity vector subset;

Determining the subset of test named entity vectors as a training set;

Marking the subset of test named entity vectors to obtain a label set;

Training the neural network model by using the training set and the label set to obtain a named entity alignment model;

Summarize all the named entity alignment models to obtain the named entity alignment model set.
3. The named entity alignment method according to claim 1, wherein before said performing model alignment on said standard named entity to be aligned according to the named entity alignment model set, the method further comprises:

Use the standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtain the alignment result;

If the morphological alignment is unsuccessful, perform model alignment on the standard named entity to be aligned according to the named entity alignment model set.
4. The named entity alignment method according to claim 3, wherein said using the standard named entity to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtaining the alignment result comprises:

Calculating the edit distance between the standard named entity to be aligned and each standard named entity in the standard named entity library;

When a target edit distance equal to the preset edit distance value exists among the edit distances, it is determined that the alignment is successful, and the standard named entity corresponding to the target edit distance is selected as the alignment result.
5. The named entity alignment method according to claim 3, wherein said performing model alignment on said named entity to be aligned according to a named entity alignment model set to obtain an alignment result comprises:

Convert each word in the standard named entity to be aligned into a word vector of a predetermined dimension, and calculate an average value of the word vectors corresponding to all words in the standard named entity to be aligned to obtain a standard named entity vector to be aligned;

Using each named entity alignment model in the named entity alignment model set to perform alignment processing on the standard to-be-aligned named entity vector to obtain a predicted alignment entity vector;

Converting each standard named entity in the standard named entity library into a standard named entity vector, and summarizing all the standard named entity vectors to obtain a standard named entity vector library;

Perform similarity calculation and analysis processing on the predicted aligned entity vector and each standard named entity vector in the standard named entity vector library to obtain the alignment result.
The named entity alignment method according to claim 5, wherein said performing similarity calculation and analysis processing on the predicted aligned entity vector and each of the standard named entity vectors in the standard named entity vector library to obtain an alignment result comprises:

Calculating a similarity value between the predicted aligned entity vector and each of the standard named entity vectors in the standard named entity vector library;

Summarize all the similarity values to obtain a similarity set;

Determine the maximum similarity value in the similarity set;

Selecting the standard named entity vector corresponding to the maximum similarity value in the standard named entity vector library as the target vector;

Selecting the standard named entity corresponding to the target vector in the standard named entity library as the result to be aligned;

Summarize all the results to be aligned to obtain a set of results to be aligned;

The majority voting mechanism is used to screen the set of results to be aligned to obtain the alignment result.
7. The named entity alignment method according to claim 6, wherein said using a majority voting mechanism to filter said set of results to be aligned to obtain an alignment result comprises:

Record the number of occurrences of each result to be aligned in the result set to be aligned;

Select the result to be aligned with the most occurrences as the candidate alignment result;

Determining the number of candidate alignment results;

If the number is one, determine the candidate alignment result as the alignment result;

If the number is greater than one, summarize the similarity value corresponding to each result to be aligned in the result set to be aligned to obtain the similarity set of the results to be aligned, and select the result to be aligned corresponding to the largest similarity value in that similarity set as the alignment result.
A named entity alignment device, wherein the device includes:

The standardization module is used to obtain a named entity to be aligned, and perform standardization processing on the named entity to be aligned to obtain a standard named entity to be aligned;

The model training module is used to obtain a test named entity set, sample the test named entity set to obtain test named entity subsets, and use each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

The model alignment module is configured to perform model alignment on the named entity to be aligned according to the named entity alignment model set to obtain an alignment result.
An electronic device, wherein the electronic device includes:

At least one processor; and,

A memory communicatively connected with the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the following steps:

Acquiring a named entity to be aligned, and standardizing the named entity to be aligned to obtain a standard named entity to be aligned;

Acquire a test named entity set, perform sampling processing on the test named entity set, and obtain a test named entity subset;

Use each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

Perform model alignment on the named entities to be aligned according to the named entity alignment model set to obtain an alignment result.
The electronic device according to claim 9, wherein said training a preset neural network model using each subset of said test named entities to obtain a named entity alignment model set comprises:

Converting each test named entity in the test named entity subset into a test named entity vector to obtain a test named entity vector subset;

Determining the subset of test named entity vectors as a training set;

Marking the subset of test named entity vectors to obtain a label set;

Training the neural network model by using the training set and the label set to obtain a named entity alignment model;

Summarize all the named entity alignment models to obtain the named entity alignment model set.
11. The electronic device according to claim 9, wherein before the model alignment of the standard named entities to be aligned according to the named entity alignment model set, the method further comprises:

Use the standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtain the alignment result;

If the morphological alignment is unsuccessful, perform model alignment on the standard named entity to be aligned according to the named entity alignment model set.
12. The electronic device according to claim 11, wherein said using said standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library, and if said morphological alignment is successful, obtaining said alignment result comprises:

Calculating the edit distance between the standard named entity to be aligned and each standard named entity in the standard named entity library;

When a target edit distance equal to the preset edit distance value exists among the edit distances, it is determined that the alignment is successful, and the standard named entity corresponding to the target edit distance is selected as the alignment result.
13. The electronic device according to claim 11, wherein said performing model alignment on said to-be-aligned named entity according to a named entity alignment model set to obtain an alignment result comprises:

Convert each word in the standard named entity to be aligned into a word vector of a predetermined dimension, and calculate an average value of the word vectors corresponding to all words in the standard named entity to be aligned to obtain a standard named entity vector to be aligned;

Using each named entity alignment model in the named entity alignment model set to perform alignment processing on the standard to-be-aligned named entity vector to obtain a predicted alignment entity vector;

Converting each standard named entity in the standard named entity library into a standard named entity vector, and summarizing all the standard named entity vectors to obtain a standard named entity vector library;

Perform similarity calculation and analysis processing on the predicted aligned entity vector and each standard named entity vector in the standard named entity vector library to obtain the alignment result.
The electronic device according to claim 13, wherein said performing similarity calculation and analysis processing on said predicted aligned entity vector and each of said standard named entity vectors in said standard named entity vector library to obtain an alignment result comprises:

Calculating a similarity value between the predicted aligned entity vector and each of the standard named entity vectors in the standard named entity vector library;

Summarize all the similarity values to obtain a similarity set;

Determine the maximum similarity value in the similarity set;

Selecting the standard named entity vector corresponding to the maximum similarity value in the standard named entity vector library as the target vector;

Selecting the standard named entity corresponding to the target vector in the standard named entity library as the result to be aligned;

Summarize all the results to be aligned to obtain a set of results to be aligned;

The majority voting mechanism is used to screen the set of results to be aligned to obtain the alignment result.
The electronic device according to claim 14, wherein said using a majority voting mechanism to filter said set of results to be aligned to obtain an alignment result comprises:

Record the number of occurrences of each result to be aligned in the result set to be aligned;

Select the result to be aligned with the most occurrences as the candidate alignment result;

Determining the number of candidate alignment results;

If the number is one, determine the candidate alignment result as the alignment result;

If the number is greater than one, summarize the similarity value corresponding to each result to be aligned in the result set to be aligned to obtain the similarity set of the results to be aligned, and select the result to be aligned corresponding to the largest similarity value in that similarity set as the alignment result.
A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the following steps:

Acquiring a named entity to be aligned, and standardizing the named entity to be aligned to obtain a standard named entity to be aligned;

Acquire a test named entity set, perform sampling processing on the test named entity set, and obtain a test named entity subset;

Use each test named entity subset to train a preset neural network model to obtain a named entity alignment model set;

Perform model alignment on the named entities to be aligned according to the named entity alignment model set to obtain an alignment result.
17. The computer-readable storage medium according to claim 16, wherein said training a preset neural network model using each subset of said test named entities to obtain a named entity alignment model set comprises:

Converting each test named entity in the test named entity subset into a test named entity vector to obtain a test named entity vector subset;

Determining the subset of test named entity vectors as a training set;

Marking the subset of test named entity vectors to obtain a label set;

Training the neural network model by using the training set and the label set to obtain a named entity alignment model;

Summarize all the named entity alignment models to obtain the named entity alignment model set.
18. The computer-readable storage medium according to claim 16, wherein before the model alignment of the standard named entities to be aligned according to the named entity alignment model set, the method further comprises:

Use the standard named entities to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtain the alignment result;

If the morphological alignment is unsuccessful, perform model alignment on the standard named entity to be aligned according to the named entity alignment model set.
The computer-readable storage medium according to claim 18, wherein said using the standard named entity to be aligned to perform morphological alignment in a pre-built standard named entity library, and if the morphological alignment is successful, obtaining the alignment result comprises:

Calculating the edit distance between the standard named entity to be aligned and each standard named entity in the standard named entity library;

When a target edit distance equal to the preset edit distance value exists among the edit distances, it is determined that the alignment is successful, and the standard named entity corresponding to the target edit distance is selected as the alignment result.
20. The computer-readable storage medium according to claim 18, wherein said performing model alignment on said named entity to be aligned according to a named entity alignment model set to obtain an alignment result comprises:

Convert each word in the standard named entity to be aligned into a word vector of a predetermined dimension, and calculate an average value of the word vectors corresponding to all words in the standard named entity to be aligned to obtain a standard named entity vector to be aligned;

Using each named entity alignment model in the named entity alignment model set to perform alignment processing on the standard to-be-aligned named entity vector to obtain a predicted alignment entity vector;

Converting each standard named entity in the standard named entity library into a standard named entity vector, and summarizing all the standard named entity vectors to obtain a standard named entity vector library;

Perform similarity calculation and analysis processing on the predicted aligned entity vector and each standard named entity vector in the standard named entity vector library to obtain the alignment result.