CN113901821A - Entity naming identification method, device, equipment and storage medium - Google Patents

Entity naming identification method, device, equipment and storage medium

Info

Publication number
CN113901821A
CN113901821A
Authority
CN
China
Prior art keywords
text
processed
vector
parallel
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111217330.2A
Other languages
Chinese (zh)
Inventor
谷坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111217330.2A priority Critical patent/CN113901821A/en
Publication of CN113901821A publication Critical patent/CN113901821A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Biophysics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Optimization (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the fields of artificial intelligence and digital medicine, and discloses an entity naming identification method, device, equipment and storage medium. The method comprises the following steps: acquiring a text to be processed; randomly selecting a preset parallel corpus from a preset corpus, and replacing the text to be processed according to a preset proportion to obtain a parallel text; inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text, and calculating the similarity between the word vectors corresponding to the text to be processed and the parallel text; when the similarity is larger than a preset value, inputting the word vector corresponding to the text to be processed into an LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word in the text to be processed; and inputting the type distribution probability into a conditional random field layer in the recognition model to obtain an entity name. The application also relates to blockchain technology: the text data to be processed may be stored in a blockchain. The method and the device can improve the accuracy of entity naming recognition.

Description

Entity naming identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for entity name identification.
Background
With the continuous development of artificial intelligence technology and the emergence of a large number of demand scenarios in daily work and life, named entity recognition (NER) has been widely applied, for example to recognize product names and medicine names in text entered on an e-commerce website, or to recognize person names and place names in a descriptive sentence. In the prior art, the main technical approaches to entity recognition are statistics-based methods, mainly including Hidden Markov Models (HMM), Maximum Entropy (ME) and Support Vector Machines (SVM). However, these prior-art schemes suffer from low recognition accuracy when an entity word also carries an ordinary word meaning, so the problem of inaccurate recognition of such polysemous entity words urgently needs to be solved.
Disclosure of Invention
The application provides an entity naming identification method, device, equipment and storage medium, and aims to solve the problem that in the prior art, identification of entity naming is inaccurate due to the fact that certain words are ambiguous.
In order to solve the above problem, the present application provides an entity naming identification method, including:
acquiring a text to be processed;
randomly selecting a preset parallel corpus from a preset corpus, and replacing the text to be processed according to a preset proportion to obtain a parallel text;
inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text, and calculating the similarity between the text to be processed and the word vectors corresponding to the parallel text;
when the similarity is larger than a preset numerical value, inputting the word vector corresponding to the text to be processed into an LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word in the text to be processed;
and inputting the type distribution probability to a conditional random field layer in the recognition model to obtain the entity name in the text to be processed.
Further, the randomly selecting a preset parallel corpus from a preset corpus, and replacing the text to be processed according to a preset proportion includes:
determining keywords of the text to be processed based on a preset keyword set and the text to be processed;
and replacing the content in the text to be processed in a preset proportion by using a preset parallel corpus according to the text to be processed and the keywords to obtain a parallel text.
Further, before the vector conversion is performed on the text to be processed and the parallel text input to the vector conversion layer in the recognition model, the method further includes:
the text to be processed and the parallel text are processed by an embedding layer in the recognition model to obtain a corresponding first vector and a corresponding second vector;
the step of inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text comprises:
an attention layer in the vector conversion layer respectively calculates attention vectors of words in the text to be processed and the parallel text for the keywords according to the first vector and the second vector;
and processing the attention vector by two full-connection layers in the vector conversion layer to obtain word vectors corresponding to words in the parallel text and the text to be processed.
Further, the attention layer in the vector conversion layer calculates, according to the first vector, an attention vector of each word in the text to be processed for the keyword, including:
multiplying the first vector by a parameter matrix obtained after pre-training respectively to obtain a corresponding Q matrix, a K matrix and a V matrix;
performing point multiplication on the Q matrix and the K matrix, dividing a first result obtained by the point multiplication by the square root of the corresponding dimension of the Q matrix to obtain a second result, and performing Softmax calculation on the second result to obtain a weight matrix;
multiplying the weight matrix by the V matrix to obtain a first matrix;
and processing the first matrix through a full connection layer in the vector conversion layer to obtain the attention vector corresponding to the text to be processed.
Further, the calculating the similarity between the text to be processed and the word vector corresponding to the parallel text includes:
acquiring a first word vector corresponding to a replaced text in the parallel text and a second word vector corresponding to a replaced text in the text to be processed;
and calculating the cosine similarity of the first word vector and the second word vector to obtain the similarity between the word vectors corresponding to the text to be processed and the parallel text.
Further, before the inputting the word vector corresponding to the text to be processed into the LSTM layer in the recognition model, the method further includes:
and processing the word vector corresponding to the text to be processed through a Dropout layer in the recognition model so as to inhibit overfitting of the recognition model.
Further, before the inputting the type distribution probability to the conditional random field layer in the recognition model, the method further includes:
the type distribution probabilities are also processed through a Dropout layer and a linear transformation layer in the recognition model.
In order to solve the above problem, the present application further provides an entity naming identification apparatus, including:
the acquisition module is used for acquiring a text to be processed;
the replacing module is used for randomly selecting a preset parallel corpus from a preset corpus and replacing the text to be processed according to a preset proportion to obtain a parallel text;
the similarity calculation module is used for inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text, and calculating the similarity between the text to be processed and the word vectors corresponding to the parallel text;
the probability calculation module is used for inputting the word vector corresponding to the text to be processed into an LSTM layer in the recognition model when the similarity is larger than a preset numerical value, so as to obtain the type distribution probability corresponding to each word in the text to be processed;
and the entity extraction module is used for inputting the type distribution probability to a conditional random field layer in the recognition model to obtain an entity name in the text to be processed.
In order to solve the above problem, the present application also provides a computer device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the entity naming identification method as described above.
In order to solve the above problem, the present application also provides a non-volatile computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor implement the entity name recognition method as described above.
Compared with the prior art, the entity naming identification method, the entity naming identification device, the entity naming identification equipment and the entity naming identification storage medium provided by the embodiment of the application have the following beneficial effects:
A text to be processed is obtained; a preset parallel corpus is then randomly selected from a preset corpus and substituted into the text to be processed according to a certain proportion to obtain a parallel text, which makes it convenient to compare the parallel text with the text to be processed. The text to be processed and the parallel text are then input into a vector conversion layer of a recognition model for vector conversion to obtain corresponding word vectors, and the similarity between the text to be processed and the parallel text is calculated from the word vectors, so as to pre-judge whether the entity name is polysemous, i.e. whether its meaning is the same as in the parallel text. When the similarity is greater than a preset value, which indicates that an entity probably exists in the text to be processed, the word vector corresponding to the text to be processed is input into an LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word in the text to be processed, and the entity in the text to be processed is then obtained through conditional random field layer processing. In this way the entity name can be judged accurately even when it also carries an ordinary word meaning, and the accuracy of entity naming identification is improved.
Drawings
In order to illustrate the solution of the present application more clearly, the drawings required for describing the embodiments of the present application are briefly introduced below. Obviously, the drawings in the following description illustrate only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an entity naming identification method according to an embodiment of the present application;
fig. 2 is a schematic block diagram of an entity naming identification apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. One skilled in the art will explicitly or implicitly appreciate that the embodiments described herein can be combined with other embodiments.
The application provides an entity naming identification method. Referring to fig. 1, a schematic flow chart of an entity naming identification method according to an embodiment of the present application is shown.
In this embodiment, the entity naming identification method includes:
s1, acquiring a text to be processed;
in the application, the text to be processed, namely the text to be recognized, is obtained from the database, or the text to be processed is directly input by the user.
Further, the acquiring the text to be processed includes:
sending a calling request to a preset knowledge base, wherein the calling request carries a signature checking token;
and receiving a signature verification result returned by the knowledge base, and calling the text to be processed in the knowledge base when the signature verification result is passed, wherein the signature verification result is obtained by the knowledge base performing verification in an RSA (Rivest-Shamir-Adleman) asymmetric encryption mode according to the signature verification token.
Because the text to be processed may involve the user's private data, it is stored in the preset database; thus, when the text to be processed is obtained, the database can perform a signature verification step, which guarantees the safety of the data and avoids problems such as data leakage.
The whole process is as follows: the client calculates a first message digest of the message m and encrypts the first message digest using RSA asymmetric encryption (with the client's private key) to obtain a signature s; the client then encrypts the message m and the signature s with the public key of the knowledge base to obtain a ciphertext c and sends the ciphertext c to the knowledge base; the knowledge base decrypts the ciphertext c with its own private key to obtain the message m and the signature s, and decrypts the signature s with the client's public key to obtain the first message digest. Meanwhile, the knowledge base digests the message m by the same method to obtain a second message digest and judges whether the first message digest and the second message digest are the same; if they are the same, the verification succeeds, otherwise the verification fails.
By requiring signature verification whenever the data is called, the safety of the data stored in the database is guaranteed and data leakage is avoided.
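The sign-and-verify flow described above can be sketched with textbook RSA. The tiny key pair below (p=61, q=53) and all function names are illustrative assumptions only; a real deployment would use 2048-bit keys generated by a cryptography library.

```python
import hashlib

# Toy textbook-RSA key pair (illustration only; real systems use
# 2048-bit keys generated by a cryptography library).
n = 61 * 53          # public modulus (3233)
e = 17               # public exponent
d = 2753             # private exponent: (17 * 2753) % 3120 == 1

def digest_int(message: bytes) -> int:
    # First message digest of m, reduced mod n so it fits the toy modulus
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    # Client side: encrypt the digest with the client's private key -> signature s
    return pow(digest_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    # Knowledge-base side: decrypt s with the client's public key and
    # compare it with a freshly computed second digest of m
    return pow(signature, e, n) == digest_int(message)
```

In the full flow of the patent, the pair (m, s) would additionally be encrypted with the knowledge base's public key before transmission; that outer encryption layer is omitted here.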
S2, randomly selecting a preset parallel corpus from a preset corpus, and replacing the text to be processed according to a preset proportion to obtain a parallel text;
Specifically, a preset parallel corpus is randomly selected from a preset corpus, and the parallel corpus is substituted into the text to be processed according to a preset proportion, so that a parallel text comprising part of the text to be processed and the parallel corpus is obtained. A large number of parallel corpora are stored in the preset corpus, each parallel corpus being a sentence containing an entity name.
Further, the randomly selecting a preset parallel corpus from a preset corpus, and replacing the text to be processed according to a preset proportion includes:
determining keywords of the text to be processed based on a preset keyword set and the text to be processed;
and replacing the content in the text to be processed in a preset proportion by using a preset parallel corpus according to the text to be processed and the keywords to obtain a parallel text.
Specifically, since the method mainly identifies company entities or trademarks, a keyword library is preset, in which company names or their abbreviations, trademark names and the like are stored. The text to be processed is matched against the data in the keyword library to obtain the keywords in the text to be processed, and when the text to be processed is replaced with parallel corpora, a preset proportion of it is replaced, where the replaced part does not include the keywords. The replacement operation takes sentences as its unit: for example, 'the red Fuji apples are really delicious, and the Beijing Changping peaches are good' may be replaced to give 'the red Fuji apples are really delicious, and the Honor mobile phone is really practical'. That is, a sentence is the minimum replacement unit, and single words within a sentence are not replaced directly. When the text to be processed has only one sentence, a preset parallel corpus is randomly selected and directly appended to the text to be processed to obtain the parallel text.
The parallel text is obtained by replacing or adding part of the text to be processed, so that the text to be processed is conveniently compared with the parallel text in the follow-up process to pre-judge the keywords in the text to be processed.
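The replacement step above can be sketched as follows. All names (`make_parallel_text`, the ratio default, the seeded fallback RNG) are assumptions for illustration, not from the patent: sentences containing a keyword are kept, a preset proportion of the remaining sentences is swapped for random parallel-corpus sentences, and a single-sentence text gets a corpus sentence appended instead.

```python
import random

def make_parallel_text(sentences, keywords, corpus, ratio=0.5, rng=None):
    """Replace a preset proportion of the sentences that do NOT contain a
    keyword with randomly chosen parallel-corpus sentences; if the text
    has only one sentence, append a corpus sentence instead."""
    rng = rng or random.Random(0)
    if len(sentences) == 1:
        return sentences + [rng.choice(corpus)]
    # Sentences holding a keyword are protected from replacement
    replaceable = [i for i, s in enumerate(sentences)
                   if not any(k in s for k in keywords)]
    n = max(1, int(len(sentences) * ratio))
    out = list(sentences)
    for i in rng.sample(replaceable, min(n, len(replaceable))):
        out[i] = rng.choice(corpus)
    return out
```

With a seeded RNG the sketch is deterministic, which makes the sentence-level (never word-level) replacement easy to inspect.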
S3, inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text, and calculating the similarity between the text to be processed and the word vectors corresponding to the parallel text;
specifically, a parallel text and a text to be processed are input into a vector conversion layer in a recognition model for processing, so as to obtain word vectors corresponding to the text to be processed and the parallel text, wherein each word vector contains word vectors of other words in the sentence, and the vector conversion layer is obtained by training based on a bert (language Representation model); and pre-judging whether the keywords in the text to be processed belong to a company entity or a trademark entity by calculating the similarity between the word vectors corresponding to the parallel text and the text to be processed.
Further, before the vector conversion is performed on the text to be processed and the parallel text input to the vector conversion layer in the recognition model, the method further includes:
the text to be processed and the parallel text are processed by an embedding layer in the recognition model to obtain a corresponding first vector and a corresponding second vector;
the step of inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text comprises:
an attention layer in the vector conversion layer respectively calculates attention vectors of words in the text to be processed and the parallel text for the keywords according to the first vector and the second vector;
and processing the attention vector by two fully-connected layers in the vector conversion layer to obtain the word vector corresponding to each word in the text to be processed and the parallel text.
Specifically, the first vector and the second vector corresponding to the text to be processed and the parallel text are first obtained through the embedding layer in the recognition model; the first vector and the second vector are ordinary word vectors. After the ordinary vectors are obtained, the attention vectors of the words in the text to be processed and the parallel text with respect to the keywords are calculated through the attention layer in the vector conversion layer, and the attention vectors are then processed by the two fully-connected layers to obtain the word vectors corresponding to the words in the text to be processed and the parallel text. The word vector corresponding to each word contains the information of all the word vectors in the current sentence.
For the attention layer and the two fully-connected layers in the recognition model, multiple groups can be stacked repeatedly to obtain better word vectors for the words in the parallel text and the text to be processed. In this application, the attention layer and the two fully-connected layers are repeated in 12 groups.
The text to be processed and the parallel text are subjected to common embedding layer processing to obtain a first vector and a second vector, and the first vector and the second vector are subjected to attention layer processing and full-connection layer processing to obtain word vectors which can represent the text to be processed and the parallel text better, so that the accuracy of subsequent similarity judgment and final result output is improved.
Still further, the calculating, by the attention layer in the vector conversion layer, the attention vector of each word in the text to be processed for the keyword according to the first vector includes:
multiplying the first vector by a parameter matrix obtained after pre-training respectively to obtain a corresponding Q matrix, a K matrix and a V matrix;
performing point multiplication on the Q matrix and the K matrix, dividing a first result obtained by the point multiplication by the square root of the corresponding dimension of the Q matrix to obtain a second result, and performing Softmax calculation on the second result to obtain a weight matrix;
multiplying the weight matrix by the V matrix to obtain a first matrix;
and processing the first matrix through a full connection layer in the vector conversion layer to obtain the attention vector corresponding to the text to be processed.
Specifically, before the first training of the recognition model, the parameter matrices are randomly generated, for example according to a normal or uniform distribution; during the continuous training of the recognition model the parameter matrices keep changing and converging, until training is complete and the parameter matrices become stable. When the trained recognition model is subsequently used, the stable parameter matrices are used directly.
The pre-trained parameter matrices are multiplied by the first vector to obtain the three matrices Q, K and V, i.e. Q = x·WQ, K = x·WK, V = x·WV, where WQ, WK and WV are the parameter matrices. The three matrices Q, K and V are then used to calculate the weight of each word in the input text (a word that only receives information has weight 0). The specific calculation formula is

A = Softmax(Q·K^T / sqrt(dk))

where A denotes the weight matrix and dk denotes the dimensionality of the Q, K or V matrix. The weight matrix is multiplied by the V matrix of the corresponding batch to obtain a first matrix, and the first matrix is processed by a fully-connected layer in the vector conversion layer to obtain the attention vector corresponding to the text to be processed. The fully-connected layer comprises two basic fully-connected networks. Correspondingly, the attention vector corresponding to the parallel text is obtained by processing the second vector corresponding to the parallel text in the same way.
By introducing an attention mechanism, word vectors which can represent the text to be processed and the parallel text more effectively are obtained, and accuracy of subsequent similarity judgment and final result output is improved.
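The scaled dot-product attention described above can be sketched in a few lines of NumPy. Shapes and names are illustrative; in the patented method this computation would run inside the trained vector conversion layer with its converged parameter matrices.

```python
import numpy as np

def scaled_dot_product_attention(x, W_q, W_k, W_v):
    """Q = x @ W_q, K = x @ W_k, V = x @ W_v;
    A = softmax(Q K^T / sqrt(d_k)); returns (A @ V, A) so the weight
    matrix A can be inspected."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax, stabilised by subtracting each row's maximum
    scores -= scores.max(axis=-1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V, A
```

Each row of A sums to 1, so every output row is a convex combination of the value vectors, which is how a word vector comes to contain information from every other word in the sentence.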
Further, the calculating the similarity between the text to be processed and the word vector corresponding to the parallel text includes:
acquiring a first word vector corresponding to a replaced text in the parallel text and a second word vector corresponding to a replaced text in the text to be processed;
and calculating the cosine similarity of the first word vector and the second word vector to obtain the similarity between the word vectors corresponding to the text to be processed and the parallel text.
Specifically, after the word vectors of the parallel text and the text to be processed are obtained, a first word vector of the replacing text in the parallel text and a second word vector of the replaced text in the text to be processed are obtained, i.e. the word vectors corresponding to the parts where the parallel text and the text to be processed differ are extracted, and the cosine similarity between the first word vector and the second word vector is calculated. For example, for the text to be processed 'the red Fuji apples are really delicious, and the Beijing Changping peaches are good' and the parallel text 'the red Fuji apples are really delicious, and the Honor mobile phone is really practical', the word vectors of 'the Beijing Changping peaches are good' and 'the Honor mobile phone is really practical' are extracted to calculate the similarity.
Whether the keywords in the text to be processed are named entities or not is judged in advance by calculating the similarity, so that the judgment speed is increased, and the entity naming identification is carried out in a mode of combining the pre-judgment and the follow-up judgment, so that the identification accuracy is increased.
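The cosine-similarity comparison of the replaced spans can be sketched as follows, in pure Python. Averaging each span's token vectors into a single vector is one plausible reading of the comparison, not something the patent specifies, and both function names are assumptions.

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = u.v / (|u| |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def span_similarity(first_word_vectors, second_word_vectors):
    """Average each span's word vectors into one vector, then compare
    the two averages with cosine similarity."""
    mean = lambda vs: [sum(col) / len(vs) for col in zip(*vs)]
    return cosine_similarity(mean(first_word_vectors), mean(second_word_vectors))
```

The result lies in [-1, 1]; the method then checks it against the preset threshold to pre-judge whether the keyword is a named entity.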
S4, when the similarity is larger than a preset value, inputting the word vector corresponding to the text to be processed into an LSTM (Long Short-Term Memory, a recurrent neural network) layer in the recognition model to obtain the type distribution probability corresponding to each word in the text to be processed;
specifically, when the similarity is smaller than a preset value, that is, the keyword is proved not to be a company entity or a trademark entity, a result can be directly output, that is, the to-be-processed text does not contain the company entity or the trademark entity; and when the similarity is larger than a preset value, inputting the vector corresponding to the text to be processed into an LSTM layer in the recognition model, wherein the LSTM layer is a bidirectional LSTM neural network, the LSTM layer outputs probability distribution of the text to be processed, the probability distribution is marked into each type, type distribution probability corresponding to each word in the text to be processed is obtained, and the LSTM layer can learn the semantic relation of the text.
Further, before the inputting the word vector corresponding to the text to be processed into the LSTM layer in the recognition model, the method further includes:
and processing the word vector corresponding to the text to be processed through a Dropout layer in the recognition model so as to inhibit overfitting of the recognition model.
In the present application, overfitting of the recognition model is suppressed by placing a Dropout layer between the vector conversion layer and the LSTM layer in the recognition model.
Model overfitting was suppressed by setting the Dropout layer.
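Inverted dropout, as commonly implemented, can be sketched as follows; this is a generic illustration, since the patent does not specify the drop probability:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: at training time, randomly zero a fraction p of
    activations and scale the survivors by 1/(1-p) so the expected value
    is unchanged; at inference time, it is the identity."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```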
And S5, inputting the type distribution probability to a conditional random field layer in the recognition model to obtain the entity name in the text to be processed.
Specifically, the conditional random field layer, i.e., the CRF (Conditional Random Field) layer, can add constraints to the resulting type distribution probabilities to ensure that the output label sequence is valid. These constraints are learned automatically by the conditional random field layer during the training of the recognition model.
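The decoding role of the CRF layer can be illustrated with a small Viterbi sketch, where the learned constraints live in the transition matrix; the tag inventory and scores below are illustrative:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Decode the best tag path. emissions (T, K): per-word type scores
    (e.g. log probabilities from the LSTM layer); transitions (K, K):
    tag-to-tag scores, where the CRF's learned constraints live (a large
    negative score effectively forbids an invalid transition)."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in tag j at t, coming from tag i
        cand = score[:, None] + transitions + emissions[t]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(back[t, best[-1]]))
    return best[::-1]
```

For instance, giving the transition "O -> I" a score of -1e9 guarantees the decoder never emits an entity continuation without an entity beginning, even when the per-word probabilities would prefer it.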
Further, before the inputting the type distribution probability to the conditional random field layer in the recognition model, the method further includes:
the type distribution probabilities are also processed through a Dropout layer and a linear transformation layer in the recognition model.
Specifically, a Layer Norm Layer, a Dropout Layer and a linear transformation Layer are arranged between the LSTM Layer and the conditional random field Layer to perform normalization and suppress model overfitting.
Normalization is performed by providing a Layer Norm Layer, a Dropout Layer, and a linear transformation Layer, while suppressing model overfitting.
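A layer normalization step of the kind named here can be sketched as follows (generic illustration, not the patent's implementation):

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Layer normalization over the last (feature) dimension: subtract
    the mean and divide by the standard deviation per position, then
    apply an affine rescale (gamma, beta)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```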
It is emphasized that, in order to further ensure the privacy and security of the data, the text data to be processed may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
A text to be processed is obtained; a preset parallel corpus is then randomly selected from a preset corpus, and the text to be processed is replaced according to a preset proportion to obtain a parallel text, which facilitates comparison between the parallel text and the text to be processed. The text to be processed and the parallel text are then input into the vector conversion layer of the recognition model for vector conversion to obtain the corresponding word vectors, and the similarity between the text to be processed and the parallel text is calculated from the word vectors to judge whether the entity name is polysemous, i.e., whether its meaning is the same as in the parallel text. When the similarity is greater than the preset value, it is proved that an entity may exist in the text to be processed; the word vector corresponding to the text to be processed is then input into the LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word, and the entities in the text to be processed are then obtained through conditional random field layer processing. In this way the entity name can be accurately judged even when it is polysemous, improving the accuracy of entity name recognition.
The embodiment also provides an entity name recognition device, as shown in fig. 2, which is a functional block diagram of the entity name recognition device according to the present application.
The entity naming identification apparatus 100 can be installed in an electronic device. According to the implemented functions, the entity naming recognition apparatus 100 may include an obtaining module 101, a replacing module 102, a similarity calculation module 103, a probability calculation module 104, and an entity extraction module 105. A module, which may also be referred to as a unit in this application, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
an obtaining module 101, configured to obtain a text to be processed;
specifically, the obtaining module 101 obtains the text to be processed, i.e., the text to be recognized, from a database, or receives the text to be processed directly input by a user.
Further, the obtaining module 101 includes a request sending sub-module and a data calling sub-module;
the request sending submodule is used for sending a calling request to a preset knowledge base, and the calling request carries a signature checking token;
and the data calling submodule is used for receiving the signature checking result returned by the knowledge base and, when the signature check passes, calling the text to be processed from the knowledge base, wherein the signature checking result is obtained by the knowledge base through verification in an RSA (Rivest-Shamir-Adleman) asymmetric encryption mode according to the signature checking token.
Because the text to be processed may involve private user data, the text data to be processed is stored in the preset database; therefore, when the text to be processed is obtained, the database performs a signature verification step, which ensures the safety of the data and avoids problems such as data leakage.
When the data is called, the request sending submodule and the data calling submodule cooperate, and the signature check is required, which ensures the safety of the data stored in the database and avoids data leakage.
The replacing module 102 is configured to randomly select a preset parallel corpus from a preset corpus, and replace the text to be processed according to a preset proportion to obtain a parallel text;
specifically, the replacing module 102 randomly selects a preset parallel corpus from the preset corpus and replaces the text to be processed with the parallel corpus according to a preset proportion to obtain a parallel text, which contains both part of the text to be processed and the parallel corpus.
Further, the replacement module 102 includes a keyword determination sub-module and a proportional replacement sub-module;
the keyword determining submodule is used for determining keywords of the text to be processed based on a preset keyword set and the text to be processed;
and the proportion replacement submodule is used for replacing the content in the text to be processed with a preset proportion by using a preset parallel corpus according to the text to be processed and the keywords to obtain a parallel text.
Specifically, before the text to be processed is replaced, the keywords in it are extracted. Since the application mainly identifies company entities or trademarks, a keyword library is preset, storing the name of each company or its abbreviation, trademark names, and the like. The keyword determining submodule obtains the keywords in the text to be processed by matching the text to be processed against the data in the keyword library; when the text to be processed is replaced with the parallel corpus, the proportion replacement submodule replaces a preset proportion of the text, where the replaced part does not include the keywords. The minimum unit of replacement is the sentence.
Through the cooperation of the keyword determination submodule and the proportion replacement submodule, part of the text to be processed is replaced or extended to obtain a parallel text, so that the text to be processed can subsequently be compared with the parallel text to pre-judge the keywords in the text to be processed.
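One possible reading of this replacement step can be sketched in Python. The sentence-splitting rule, the corpus contents, and the helper name are assumptions; the sketch follows this paragraph in keeping keyword sentences unreplaced and using the sentence as the minimum replacement unit:

```python
import random
import re

def build_parallel_text(text, keywords, corpus_sentences, ratio=0.5, seed=0):
    """Replace a preset proportion of the sentences that do NOT contain a
    keyword with randomly chosen parallel-corpus sentences. Sentences are
    the minimum unit of replacement; keyword sentences are preserved."""
    rng = random.Random(seed)
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    replaceable = [i for i, s in enumerate(sents)
                   if not any(k in s for k in keywords)]
    n = max(1, int(len(replaceable) * ratio)) if replaceable else 0
    for i in rng.sample(replaceable, n):
        sents[i] = rng.choice(corpus_sentences)
    return " ".join(sents)
```

The resulting parallel text shares its keyword sentences with the text to be processed while differing elsewhere, which is what the later similarity comparison relies on.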
The similarity calculation module 103 is configured to input the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion, obtain word vectors corresponding to the text to be processed and the parallel text, and calculate a similarity between the text to be processed and the word vectors corresponding to the parallel text;
specifically, the similarity calculation module 103 inputs the parallel text and the text to be processed into the vector conversion layer in the recognition model for processing to obtain the word vectors corresponding to the text to be processed and the parallel text, where each word vector incorporates information from the word vectors of the other words in the sentence, and the vector conversion layer is obtained by training based on a BERT model. Whether the keywords in the text to be processed belong to a company entity or a trademark entity is pre-judged by calculating the similarity between the word vectors corresponding to the parallel text and the text to be processed.
Further, the entity name recognition device 100 includes a vectorization module; the similarity calculation module 103 comprises an attention vector submodule and a connection submodule;
the vectorization module is used for processing the text to be processed and the parallel text through an embedding layer in the recognition model to obtain a corresponding first vector and a corresponding second vector;
the attention vector submodule is used for calculating, through the attention layer in the vector conversion layer, the attention vectors of the words in the text to be processed and the parallel text with respect to the keywords, according to the first vector and the second vector;
and the connection sub-module is used for processing the attention vector by two full connection layers in the vector conversion layer to obtain the text to be processed and the word vector corresponding to each word in the parallel text.
Specifically, the vectorization module obtains the first vector and the second vector corresponding to the text to be processed and the parallel text through the embedding layer in the recognition model; the first vector and the second vector are ordinary word vectors. After these ordinary vectors are obtained, the attention vector submodule calculates, through the attention layer in the vector conversion layer, the attention vectors of the words in the text to be processed and the parallel text with respect to the keywords, and the connection submodule obtains the word vectors corresponding to the words in the parallel text and the text to be processed through processing in the two fully connected layers.
Through the cooperation of the vectorization module, the attention vector submodule and the connection submodule, the text to be processed and the parallel text are processed by a common embedding layer to obtain the first vector and the second vector, which are then processed by the attention layer and the fully connected layers to obtain word vectors that represent the text to be processed and the parallel text more effectively, improving the accuracy of the subsequent similarity judgment and of the final output.
Still further, the attention vector submodule comprises a matrix multiplication unit, a first calculation unit, a second calculation unit and a full connection unit;
the matrix multiplication unit is used for multiplying the first vector by a parameter matrix obtained after pre-training respectively to obtain a corresponding Q matrix, a K matrix and a V matrix;
the first calculating unit is configured to perform dot multiplication on the Q matrix and the K matrix, divide the first result obtained by the dot multiplication by the square root of the corresponding dimension of the Q matrix to obtain a second result, and perform a Softmax calculation on the second result to obtain a weight matrix;
the second calculating unit is configured to multiply the weight matrix by the V matrix to obtain a first matrix;
and the full-connection unit is used for processing the first matrix through a full-connection layer in the vector conversion layer to obtain the attention vector.
Specifically, the matrix multiplication unit multiplies the first vector by the pre-trained parameter matrices to obtain the three matrices Q, K and V, where the formulas are Q = x·WQ, K = x·WK, V = x·WV, and WQ, WK, WV are the parameter matrices. Using the three matrices Q, K and V, the first calculating unit calculates the proportion (weight) of each word in the input text, where a word that only receives the above information has a weight of 0; the specific calculation formula is

A = Softmax(Q·K^T / sqrt(dk))

wherein A represents the weight matrix and dk represents the dimension of the Q, K or V matrix. The second calculating unit multiplies the weight matrix by the V matrix of the corresponding batch to obtain the first matrix, and the full connection unit processes the first matrix through the fully connected layer in the vector conversion layer to obtain the attention vector. The fully connected layer includes two underlying fully connected networks.
Through the cooperation of the matrix multiplication unit, the first calculation unit, the second calculation unit and the full connection unit, an attention mechanism is introduced, word vectors which can represent texts to be processed and parallel texts are obtained, and the accuracy of subsequent similarity judgment and final result output is improved.
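The Q/K/V computation performed by these units corresponds to standard scaled dot-product attention, which can be sketched as follows (the dimensions and parameter values are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_attention(x, Wq, Wk, Wv):
    """Q = x.Wq, K = x.Wk, V = x.Wv, then A = Softmax(Q.K^T / sqrt(dk))
    and output = A.V, matching the formula in the text. Each row of A is
    the attention weight distribution of one word over all words."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    dk = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(dk))    # the weight matrix
    return A @ V, A                        # first matrix, weight matrix
```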
Further, the similarity calculation module 103 includes a vector acquisition submodule and a cosine similarity calculation submodule;
the vector obtaining submodule is used for obtaining a first word vector corresponding to a replaced text in the parallel text and a second word vector corresponding to a replaced text in the text to be processed;
and the cosine similarity calculation submodule is used for calculating the cosine similarity of the first word vector and the second word vector to obtain the similarity between the word vectors corresponding to the text to be processed and the parallel text.
Specifically, after the word vectors of the parallel text and the text to be processed are obtained, the vector acquisition submodule obtains the first word vector of the replacement text in the parallel text and the second word vector of the replaced text in the text to be processed, that is, it extracts the word vectors corresponding to the parts in which the two texts differ, and the cosine similarity calculation submodule calculates the cosine similarity between the first word vector and the second word vector.
Through the cooperation of the vector acquisition submodule and the cosine similarity calculation submodule, whether the keywords in the text to be processed are named entities is pre-judged by calculating the similarity, which speeds up the judgment; performing entity name recognition by combining the pre-judgment with the subsequent judgment improves recognition accuracy.
A probability calculation module 104, configured to, when the similarity is greater than a preset value, input a word vector corresponding to the text to be processed into an LSTM layer in the recognition model, to obtain a type distribution probability corresponding to each word in the text to be processed;
specifically, when the similarity is smaller than the preset value, the keyword is proved not to be a company entity or a trademark entity, and the probability calculation module 104 can directly output a result: the text to be processed does not contain a company entity or a trademark entity. When the similarity is greater than the preset value, the word vector corresponding to the text to be processed is input into the LSTM layer in the recognition model. The LSTM layer is a bidirectional LSTM neural network that can learn the semantic relations of the text; it outputs, for each word in the text to be processed, a probability distribution over the label types, i.e., the type distribution probability corresponding to each word.
Further, the entity name identifying apparatus 100 includes an over-fitting prevention module;
and the over-fitting prevention module is used for processing the word vector corresponding to the text to be processed through a Dropout layer in the recognition model so as to inhibit the recognition model from over-fitting.
Model overfitting was suppressed by setting the Dropout layer.
And the entity extraction module 105 is configured to input the type distribution probability to a conditional random field layer in the recognition model, so as to obtain an entity name in the text to be processed.
Further, the entity naming recognition apparatus 100 includes a linear transformation module;
and the linear transformation module is used for processing the type distribution probability by a Dropout layer and a linear transformation layer in the identification model.
Normalization is performed by providing a Layer Norm Layer, a Dropout Layer, and a linear transformation Layer, while suppressing model overfitting.
With this apparatus, through the cooperation of the obtaining module 101, the replacing module 102, the similarity calculation module 103, the probability calculation module 104 and the entity extraction module 105, the entity name recognition apparatus 100 obtains the text to be processed, randomly selects a preset parallel corpus from the preset corpus, and replaces the text to be processed according to the preset proportion to obtain a parallel text, which facilitates comparison between the parallel text and the text to be processed. The text to be processed and the parallel text are then input into the vector conversion layer of the recognition model for vector conversion to obtain the corresponding word vectors, and the similarity between the text to be processed and the parallel text is calculated from the word vectors to judge whether the entity name is polysemous, i.e., whether its meaning is the same as in the parallel text. When the similarity is greater than the preset value, it is proved that an entity may exist in the text to be processed; the word vector corresponding to the text to be processed is then input into the LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word, and the entities in the text to be processed are then obtained through conditional random field layer processing. In this way the entity name can be accurately judged even when it is polysemous, improving the accuracy of entity name recognition.
The embodiment of the application also provides computer equipment. Referring to fig. 3, fig. 3 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 with components 41-43 is shown, but it should be understood that not all of the shown components are required to be implemented; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of an entity naming recognition method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as executing computer readable instructions of the entity name recognition method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this embodiment, when the processor executes the computer readable instructions stored in the memory, the steps of the entity name recognition method in the above embodiments are implemented: a text to be processed is obtained; a preset parallel corpus is randomly selected from the preset corpus, and the text to be processed is replaced according to a preset proportion to obtain a parallel text, which facilitates comparison between the parallel text and the text to be processed; the text to be processed and the parallel text are input into the vector conversion layer of the recognition model for vector conversion to obtain the corresponding word vectors, and the similarity between the text to be processed and the parallel text is calculated from the word vectors to judge whether the entity name is polysemous, i.e., whether its meaning is the same as in the parallel text; when the similarity is greater than the preset value, it is proved that an entity may exist in the text to be processed, and the word vector corresponding to the text to be processed is input into the LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word; the entities in the text to be processed are then obtained through conditional random field layer processing. In this way the entity name can be accurately judged even when it is polysemous, improving the accuracy of entity name recognition.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the entity name recognition method described above: a text to be processed is obtained; a preset parallel corpus is randomly selected from a preset corpus, and the text to be processed is replaced according to a preset proportion to obtain a parallel text, which facilitates comparison between the parallel text and the text to be processed; the text to be processed and the parallel text are input into the vector conversion layer of the recognition model for vector conversion to obtain the corresponding word vectors, and the similarity between the text to be processed and the parallel text is calculated from the word vectors to judge whether the entity name is polysemous, i.e., whether its meaning is the same as in the parallel text; when the similarity is greater than the preset value, it is proved that an entity may exist in the text to be processed, and the word vector corresponding to the text to be processed is input into the LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word; the entities in the text to be processed are then obtained through conditional random field layer processing. In this way the entity name can be accurately judged even when it is polysemous, improving the accuracy of entity name recognition.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The entity name recognition device, the computer device, and the computer-readable storage medium according to the embodiments of the present application have the same technical effects as the entity name recognition method according to the embodiments, and are not expanded herein.
It should be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the invention, and that the appended drawings illustrate preferred embodiments without limiting the scope of the invention. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the present application may be practiced with modifications to, or equivalents of, some of the features described in the foregoing embodiments. All equivalent structures made using the contents of the specification and the drawings of the present application, applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. An entity naming and identifying method is characterized by comprising the following steps:
acquiring a text to be processed;
randomly selecting a preset parallel corpus from a preset corpus, and replacing the text to be processed according to a preset proportion to obtain a parallel text;
inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text, and calculating the similarity between the text to be processed and the word vectors corresponding to the parallel text;
when the similarity is larger than a preset numerical value, inputting the word vector corresponding to the text to be processed into an LSTM layer in the recognition model to obtain the type distribution probability corresponding to each word in the text to be processed;
and inputting the type distribution probability to a conditional random field layer in the recognition model to obtain the entity name in the text to be processed.
2. The entity naming identification method according to claim 1, wherein the randomly selecting preset parallel corpora from a preset corpus and replacing the text to be processed according to a preset proportion comprises:
determining keywords of the text to be processed based on a preset keyword set and the text to be processed;
and replacing the content in the text to be processed in a preset proportion by using a preset parallel corpus according to the text to be processed and the keywords to obtain a parallel text.
3. The entity naming and recognition method according to claim 2, wherein before the vector conversion of the text to be processed and the parallel text input into the vector conversion layer in the recognition model, the method further comprises:
the text to be processed and the parallel text are processed by an embedding layer in the recognition model to obtain a corresponding first vector and a corresponding second vector;
the step of inputting the text to be processed and the parallel text into a vector conversion layer in a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text comprises:
an attention layer in the vector conversion layer respectively calculates attention vectors of words in the text to be processed and the parallel text for the keywords according to the first vector and the second vector;
and processing the attention vector by two full-connection layers in the vector conversion layer to obtain word vectors corresponding to words in the parallel text and the text to be processed.
4. The entity name recognition method according to claim 3, wherein calculating, by the attention layer in the vector conversion layer and according to the first vector, the attention vector of each word in the text to be processed with respect to the keywords comprises:
multiplying the first vector by pre-trained parameter matrices to obtain a corresponding Q matrix, K matrix and V matrix;
performing a dot product of the Q matrix and the K matrix, dividing the first result of the dot product by the square root of the corresponding dimension of the Q matrix to obtain a second result, and applying Softmax to the second result to obtain a weight matrix;
multiplying the weight matrix by the V matrix to obtain a first matrix;
and processing the first matrix through a fully connected layer in the vector conversion layer to obtain the attention vector corresponding to the text to be processed.
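The steps of claim 4 correspond to standard scaled dot-product attention. A minimal numpy sketch, with the pre-trained parameter matrices shown simply as function arguments (the final fully connected layer of the claim is omitted):

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Project the input vectors into Q, K, V with parameter matrices,
    scale the Q·Kᵀ scores by the square root of the Q dimension, apply
    Softmax to get the weight matrix, and multiply it with V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # "second result"
    # numerically stable softmax over each row -> weight matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # "first matrix"
```

Dividing by the square root of the Q dimension keeps the dot products from saturating the Softmax as the embedding dimension grows, which is why the claim specifies that scaling step.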
5. The entity name recognition method according to claim 1, wherein calculating the similarity between the word vectors corresponding to the text to be processed and the parallel text comprises:
acquiring a first word vector corresponding to the replaced content in the parallel text and a second word vector corresponding to the content that was replaced in the text to be processed;
and calculating the cosine similarity between the first word vector and the second word vector to obtain the similarity between the word vectors corresponding to the text to be processed and the parallel text.
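The cosine similarity of claim 5 in a few lines of numpy (the helper name is ours, not the patent's):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between the word vector of the replaced span
    and that of the original span: 1.0 for parallel vectors, 0.0 for
    orthogonal ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Because the result depends only on direction, not magnitude, two spans with semantically close embeddings score near 1.0, which is what the preset threshold in claim 1 tests against.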
6. The entity name recognition method according to claim 1, further comprising, before inputting the word vector corresponding to the text to be processed into the LSTM layer of the recognition model:
processing the word vector corresponding to the text to be processed through a Dropout layer in the recognition model to suppress overfitting of the recognition model.
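The claim names a Dropout layer but not its variant; the common "inverted" formulation, sketched below as an assumption, zeroes activations during training and rescales the survivors so that inference needs no change:

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and scale the rest by 1/(1-rate); at inference,
    pass the input through unchanged."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate             # keep-mask
    return x * mask / (1.0 - rate)
```

The rescaling keeps the expected activation magnitude the same in training and inference, which is why frameworks make the layer a no-op in evaluation mode.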
7. The entity name recognition method according to claim 6, further comprising, before inputting the type distribution probability into the conditional random field layer of the recognition model:
processing the type distribution probabilities through a Dropout layer and a linear transformation layer in the recognition model.
8. An entity name recognition apparatus, comprising:
an acquisition module, configured to acquire a text to be processed;
a replacement module, configured to randomly select a preset parallel corpus from a preset corpus and replace the text to be processed at a preset proportion to obtain a parallel text;
a similarity calculation module, configured to input the text to be processed and the parallel text into a vector conversion layer of a recognition model for vector conversion to obtain word vectors corresponding to the text to be processed and the parallel text, and to calculate the similarity between the word vectors corresponding to the text to be processed and the parallel text;
a probability calculation module, configured to input, when the similarity is greater than a preset threshold, the word vector corresponding to the text to be processed into an LSTM layer of the recognition model to obtain a type distribution probability for each word in the text to be processed;
and an entity extraction module, configured to input the type distribution probabilities into a conditional random field layer of the recognition model to obtain the entity names in the text to be processed.
9. A computer device, characterized in that the computer device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer-readable instructions which, when executed by the processor, implement the entity name recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the entity name recognition method of any one of claims 1 to 7.
CN202111217330.2A 2021-10-19 2021-10-19 Entity naming identification method, device, equipment and storage medium Pending CN113901821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111217330.2A CN113901821A (en) 2021-10-19 2021-10-19 Entity naming identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111217330.2A CN113901821A (en) 2021-10-19 2021-10-19 Entity naming identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113901821A true CN113901821A (en) 2022-01-07

Family

ID=79193039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111217330.2A Pending CN113901821A (en) 2021-10-19 2021-10-19 Entity naming identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113901821A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048515A (en) * 2022-06-09 2022-09-13 广西力意智能科技有限公司 Document classification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111814466A (en) Information extraction method based on machine reading understanding and related equipment thereof
CN111859960A (en) Semantic matching method and device based on knowledge distillation, computer equipment and medium
CN109766418B (en) Method and apparatus for outputting information
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN111767375A (en) Semantic recall method and device, computer equipment and storage medium
CN113761577B (en) Big data desensitization method, device, computer equipment and storage medium
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN113821622B (en) Answer retrieval method and device based on artificial intelligence, electronic equipment and medium
CN113627797A (en) Image generation method and device for employee enrollment, computer equipment and storage medium
CN112686053A (en) Data enhancement method and device, computer equipment and storage medium
CN113886550A (en) Question-answer matching method, device, equipment and storage medium based on attention mechanism
CN112699213A (en) Speech intention recognition method and device, computer equipment and storage medium
CN112468658A (en) Voice quality detection method and device, computer equipment and storage medium
CN112395391A (en) Concept graph construction method and device, computer equipment and storage medium
CN112949320A (en) Sequence labeling method, device, equipment and medium based on conditional random field
CN113505595A (en) Text phrase extraction method and device, computer equipment and storage medium
CN114357195A (en) Knowledge graph-based question-answer pair generation method, device, equipment and medium
CN114780701A (en) Automatic question-answer matching method, device, computer equipment and storage medium
CN113886577A (en) Text classification method, device, equipment and storage medium
CN113901821A (en) Entity naming identification method, device, equipment and storage medium
CN116777646A (en) Artificial intelligence-based risk identification method, apparatus, device and storage medium
CN111639164A (en) Question-answer matching method and device of question-answer system, computer equipment and storage medium
CN116701593A (en) Chinese question-answering model training method based on GraphQL and related equipment thereof
CN114742058B (en) Named entity extraction method, named entity extraction device, computer equipment and storage medium
CN116341646A (en) Pretraining method and device of Bert model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40062787

Country of ref document: HK