CN113962224A - Named entity recognition method and device, equipment, medium and product thereof

Info

Publication number: CN113962224A
Application number: CN202111177567.2A
Authority: CN (China)
Prior art keywords: vector, pointer, word, character, text
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 吴智东
Current Assignee: Guangzhou Huaduo Network Technology Co Ltd
Original Assignee: Guangzhou Huaduo Network Technology Co Ltd
Application filed by Guangzhou Huaduo Network Technology Co Ltd
Priority to CN202111177567.2A
Publication of CN113962224A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The application discloses a named entity recognition method and a device, equipment, medium and product thereof, wherein the method comprises the following steps: acquiring text information of a named entity to be identified, wherein the text information comprises a plurality of single characters; extracting deep semantic information of the text information to obtain a text feature vector of the text information; generating a first character pointer vector and a tail character pointer vector according to the text feature vector, wherein each pointer vector comprises pointer elements pointing in sequence to each single character in the text information, and the pointer elements in the two pointer vectors corresponding to the first character and the tail character of a named entity in the text information store the index tag of the entity type to which the named entity belongs; and extracting the character string between the single characters in the text information pointed to by pointer elements carrying the same index tag in the two pointer vectors as a named entity. The application improves the efficiency of named entity recognition with high recall and accuracy, and is particularly suitable for extracting the corresponding named entities from commodity information to serve as commodity attribute data.

Description

Named entity recognition method and device, equipment, medium and product thereof
Technical Field
The present application relates to natural language processing technology, and in particular, to a named entity recognition method and corresponding apparatus, computer device, computer readable storage medium, and computer program product.
Background
With the rapid development of the e-commerce industry, the users and commodity categories of e-commerce platforms are growing rapidly. Facing commodity data numbering in the tens of millions, standardizing and unifying the commodity information is very important. On one hand, the platform can use the attribute information in commodity data to classify commodities, or support further scenarios such as recommendation based on those attributes; on the other hand, users can filter searches by commodity attributes and find the commodities they want directly, improving the user experience. None of these scenarios can do without processing the commodity attribute information in the commodity data. How to extract commodity attributes from cluttered data is a major problem in current information extraction scenarios.
In the prior art, keyword matching is commonly used against commodity information: when a word in an attribute keyword library is hit, the commodity is considered to have the attribute corresponding to that keyword. Other techniques mine specific text patterns in the commodity information, compile specific attribute rules, and extract commodity attributes with those rules.
The keyword matching method ignores the semantic information of the text, recalls much noise data as attribute information, and cannot effectively guarantee accuracy. Compared with the keyword method, the attribute-rule method is more accurate but covers only a single pattern; rules must be added and deleted by manual mining, new attribute information cannot be recalled, and the method is therefore limited.
Extracting commodity attribute information is essentially a named entity extraction problem in the field of natural language processing; it can thus be understood that the named entity extraction problem exists not only in the e-commerce field, but also in any other field that needs to extract named entities.
Disclosure of Invention
A primary object of the present application is to solve at least one of the above problems and provide a named entity recognition method and a corresponding apparatus, computer device, computer readable storage medium and computer program product.
In order to meet various purposes of the application, the following technical scheme is adopted in the application:
a named entity recognition method, adapted to one of the objects of the present application, comprises the following steps:
acquiring text information of a named entity to be identified, wherein the text information comprises a plurality of single characters;
extracting deep semantic information of the text information to obtain a text feature vector of the text information;
generating a first character pointer vector and a tail character pointer vector according to the text feature vector, wherein each pointer vector comprises pointer elements pointing in sequence to each single character in the text information, and the pointer elements in the two pointer vectors corresponding to the first character and the tail character of a named entity in the text information store the index tag of the entity type to which the named entity belongs;
and extracting the character string between the single characters in the text information pointed to by pointer elements carrying the same index tag in the two pointer vectors as a named entity.
In a further embodiment, acquiring the text information of the named entity to be identified comprises the following steps:
acquiring a commodity object of a named entity to be identified;
extracting the commodity title text and/or the commodity detail information of the commodity object as commodity information;
and carrying out data cleaning on the commodity information of the commodity object, and constructing the text information after the data cleaning as the text information of the named entity to be identified.
In a further embodiment, the generation of the first character pointer vector and the tail character pointer vector according to the text feature vector is implemented by calling a pre-trained pointer generation network model, wherein the pointer generation network model executes the following steps:
inputting the text feature vector as the vector to be processed, traversing the text information word by word, and performing feature operations on the vector to be processed with an encoder to generate the first character pointer vector corresponding to the current single character;
splicing the text feature vector with the first character pointer vector corresponding to the current character to obtain a fusion feature vector corresponding to the current character;
and inputting the fusion feature vector as the vector to be processed, traversing the text information word by word, and performing feature operations on the vector to be processed with an encoder to generate the tail character pointer vector corresponding to the current character.
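For illustration, the three steps above can be sketched in Python/PyTorch. This is a minimal sketch, not the applicant's implementation: nn.GRUCell stands in for the patent's own encoder (which is sketched under the next embodiment), and all names and dimensions are assumptions.

import torch
import torch.nn as nn

class TwoStagePointerGenerator(nn.Module):
    # nn.GRUCell is a stand-in for the patent's own encoder (sketched under
    # the next embodiment); feat_dim, hid and num_tags are assumed values.
    def __init__(self, feat_dim: int = 768, num_tags: int = 7, hid: int = 256):
        super().__init__()
        self.head_cell = nn.GRUCell(feat_dim, hid)
        self.head_out = nn.Linear(hid, num_tags)
        self.tail_cell = nn.GRUCell(feat_dim + num_tags, hid)
        self.tail_out = nn.Linear(hid, num_tags)

    def forward(self, text_feats: torch.Tensor):
        # text_feats: (seq_len, feat_dim), one deep semantic vector per character
        h1 = text_feats.new_zeros(1, self.head_cell.hidden_size)
        h2 = text_feats.new_zeros(1, self.tail_cell.hidden_size)
        heads, tails = [], []
        for x in text_feats:                        # word by word
            x = x.unsqueeze(0)
            h1 = self.head_cell(x, h1)
            head = self.head_out(h1)                # head pointer scores, (1, num_tags)
            fused = torch.cat([x, head], dim=-1)    # fusion feature vector
            h2 = self.tail_cell(fused, h2)
            tail = self.tail_out(h2)                # tail pointer scores
            heads.append(head.squeeze(0))
            tails.append(tail.squeeze(0))
        # per character, the argmax over the tag scores is the stored index tag
        return torch.stack(heads), torch.stack(tails)

The design point the sketch preserves is that the tail encoder consumes the concatenation of the text feature vector and the first character pointer output, so the tail predictions are conditioned on the head predictions.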
In an embodiment, the encoder performs the following steps for each single character in the text information to obtain the corresponding first character pointer vector or tail character pointer vector:
obtaining the hidden layer memory vector corresponding to the previous single character, wherein the hidden layer memory vector is randomly initialized and thereafter updated correspondingly word by word as it is referenced;
performing a multi-head attention mechanism operation on the vector to be processed, computing the normalized dot product of the vector to be processed and the hidden layer memory vector corresponding to the previous single character as the addressing memory vector corresponding to the current single character;
linearly transforming the addressing memory vector corresponding to the current single character and adding it to the vector to be processed to obtain an intermediate vector;
linearly transforming, superposing and regularizing the intermediate vector to obtain the first character pointer vector or the tail character pointer vector corresponding to the current character;
and updating the hidden layer memory vector corresponding to the current single character, for reference by the encoding process of the next single character, according to the first character pointer vector and the tail character pointer vector corresponding to the current character and the hidden layer memory vector corresponding to the previous single character.
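A minimal sketch of one such encoding step follows, under the assumption that the normalized dot product is taken per attention head between the vector to be processed and the previous hidden layer memory vector; the patent names the operations but does not fix their dimensions or parameters, so those are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerEncoderStep(nn.Module):
    # Dimensions, head count and the attention parameters are assumptions.
    def __init__(self, dim: int = 768, num_tags: int = 7, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.dim, self.heads, self.d = dim, heads, dim // heads
        self.q = nn.Linear(dim, dim)          # queries from the vector to be processed
        self.kv = nn.Linear(dim, dim)         # keys/values from the previous hidden memory
        self.addr_proj = nn.Linear(dim, dim)  # linear transform of the addressing vector
        self.ffn = nn.Linear(dim, dim)        # linear conversion of the intermediate vector
        self.norm = nn.LayerNorm(dim)         # regularization
        self.to_tags = nn.Linear(dim, num_tags)  # per-index-tag scores for this character

    def init_hidden(self) -> torch.Tensor:
        return torch.randn(self.dim)   # randomly initialized, then updated word by word

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # normalized dot product, per head, between the vector to be processed
        # and the previous character's hidden layer memory vector
        q = self.q(x).view(self.heads, self.d)
        kv = self.kv(h_prev).view(self.heads, self.d)
        w = F.softmax((q * kv).sum(-1, keepdim=True) / self.d ** 0.5, dim=0)
        addressing = (w * kv).reshape(-1)         # addressing memory vector

        inter = x + self.addr_proj(addressing)    # add to the vector to be processed
        out = self.norm(inter + self.ffn(inter))  # linear transform, superpose, regularize
        # the hidden memory itself is refreshed separately (see the next embodiment)
        return self.to_tags(out)                  # this character's pointer element scores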
In a further embodiment, updating the hidden layer memory vector corresponding to the current character for reference by the encoding process of the next character, according to the first character pointer vector, the tail character pointer vector and the hidden layer memory vector corresponding to the previous character, comprises the following steps:
adding the first character pointer vector and the tail character pointer vector corresponding to the current character to obtain a sum vector;
applying a classification mapping to the sum vector to obtain a classification probability as the corresponding smoothing weight;
taking the smoothing weight as a weight parameter, smoothly synthesizing the sum vector and the hidden layer memory vector corresponding to the previous single character to obtain the hidden layer memory vector corresponding to the current character;
and passing the hidden layer memory vector corresponding to the current character to the encoding process of the next single character for reference.
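A minimal sketch of this gated update follows; the projection from tag space to memory space is an added assumption, since the patent does not state how the two dimensions are reconciled.

import torch
import torch.nn as nn

class HiddenMemoryUpdate(nn.Module):
    def __init__(self, num_tags: int = 7, dim: int = 768):
        super().__init__()
        self.gate = nn.Linear(num_tags, 1)       # classification mapping -> probability
        self.mem_proj = nn.Linear(num_tags, dim) # tag space -> memory space (assumption)

    def forward(self, head_ptr, tail_ptr, h_prev):
        s = head_ptr + tail_ptr                  # sum vector
        g = torch.sigmoid(self.gate(s))          # smoothing weight in (0, 1)
        # smooth synthesis: interpolate the (projected) sum vector with the
        # previous character's hidden layer memory vector
        return g * self.mem_proj(s) + (1.0 - g) * h_prev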
In a further embodiment, extracting the character string between the single characters in the text information pointed to by pointer elements carrying the same index tag in the two pointer vectors as a named entity comprises the following steps:
acquiring the index tags corresponding to the first character pointer elements and the tail character pointer elements of the different named entities from the first character pointer vector and the tail character pointer vector;
querying a preset mapping vocabulary to restore the entity type corresponding to each index tag;
and extracting the character string corresponding to each named entity according to the boundaries defined by the single characters in the text information pointed to by its first character pointer element and tail character pointer element, and constructing a correspondence list of entity types and named entities.
In an extended embodiment, the pointer generation network model is trained in advance, and the training process includes the following steps:
acquiring a sample data set, wherein the sample data set comprises a plurality of groups of sample data, and each group of sample data comprises a title text of a commodity object and a first character pointer vector and a tail character pointer vector corresponding to a named entity in the title text;
calling each group of sample data to train the pointer generation network model, wherein the pointer generation network model predicts a corresponding first character pointer vector and a corresponding tail character pointer vector according to text feature vectors in the group of sample data;
and supervising the first character pointer vector and the tail character pointer vector predicted by the pointer generation network model against the first character pointer vector and the tail character pointer vector in the current group of sample data, calculating a loss value, and, when the loss value is greater than a preset threshold value, calling the next group of sample data to continue the iterative training of the pointer generation network model.
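A training-loop sketch consistent with this embodiment follows; the choice of cross-entropy as the loss and all names are illustrative assumptions, not taken from the patent.

import torch
import torch.nn.functional as F

def train_pointer_model(model, sample_groups, optimizer, loss_threshold=0.1):
    # sample_groups yields (text_feats, head_gold, tail_gold) per group, where
    # the gold vectors hold one index tag (a long) per single character.
    for text_feats, head_gold, tail_gold in sample_groups:
        head_logits, tail_logits = model(text_feats)   # (seq_len, num_tags) each
        # supervise head and tail predictions against the labeled pointer vectors
        loss = (F.cross_entropy(head_logits, head_gold)
                + F.cross_entropy(tail_logits, tail_gold))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() <= loss_threshold:   # stop once below the preset threshold
            break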
A named entity recognition apparatus, adapted to one of the objects of the present application, includes: a text acquisition module, a feature extraction module, a pointer generation module and an entity extraction module, wherein the text acquisition module is used for acquiring text information of a named entity to be identified, the text information comprising a plurality of single characters; the feature extraction module is used for extracting deep semantic information of the text information to obtain its text feature vector; the pointer generation module is used for generating a first character pointer vector and a tail character pointer vector according to the text feature vector, each pointer vector comprising pointer elements pointing in sequence to each single character in the text information, the pointer elements in the two pointer vectors corresponding to the first character and the tail character of a named entity storing the index tag of the entity type to which the named entity belongs; and the entity extraction module is used for extracting the character string between the single characters pointed to by pointer elements carrying the same index tag in the two pointer vectors as a named entity.
In a further embodiment, the text acquisition module includes: an object acquisition submodule, used for acquiring the commodity object of the named entity to be identified; a text extraction submodule, used for extracting the commodity title text and/or the commodity detail information of the commodity object as commodity information; and a text optimization submodule, used for performing data cleaning on the commodity information of the commodity object and constructing the cleaned commodity information as the text information of the named entity to be identified.
In a further embodiment, the pointer generation module invokes a pre-trained pointer generation network model, where the pointer generation network model includes: a first character pointer network, used for inputting the text feature vector as the vector to be processed, traversing the text information word by word, and performing feature operations on the vector to be processed with an encoder to generate the first character pointer vector corresponding to the current single character; a vector splicing network, used for splicing the text feature vector with the first character pointer vector corresponding to the current character to obtain the fusion feature vector corresponding to the current character; and a tail character pointer network, used for inputting the fusion feature vector as the vector to be processed, traversing the text information word by word, and performing feature operations on the vector to be processed with an encoder to generate the tail character pointer vector corresponding to the current character.
In an embodiment, the encoder invoked in the first character pointer network and the tail character pointer network comprises: a hidden layer acquisition unit, used for acquiring the hidden layer memory vector corresponding to the previous single character, the hidden layer memory vector being randomly initialized and thereafter updated correspondingly word by word as it is referenced; an addressing operation unit, used for performing a multi-head attention mechanism operation on the vector to be processed, computing the normalized dot product of the vector to be processed and the hidden layer memory vector corresponding to the previous single character as the addressing memory vector corresponding to the current single character; an intermediate processing unit, used for linearly transforming the addressing memory vector corresponding to the current single character and adding it to the vector to be processed to obtain an intermediate vector; a pointer generating unit, used for linearly transforming, superposing and regularizing the intermediate vector to obtain the first character pointer vector or the tail character pointer vector corresponding to the current character; and a hidden layer updating unit, used for updating the hidden layer memory vector corresponding to the current single character, for reference by the encoding process of the next single character, according to the first character pointer vector and the tail character pointer vector corresponding to the current character and the hidden layer memory vector corresponding to the previous single character.
In a further embodiment, the hidden layer updating unit includes: an addition subunit, used for adding the first character pointer vector and the tail character pointer vector corresponding to the current character to obtain a sum vector; a weight mapping subunit, used for applying a classification mapping to the sum vector to obtain a classification probability as the corresponding smoothing weight; a smooth synthesis subunit, used for smoothly synthesizing, with the smoothing weight as a weight parameter, the sum vector and the hidden layer memory vector corresponding to the previous single character to obtain the hidden layer memory vector corresponding to the current character; and a hidden layer transfer subunit, used for passing the hidden layer memory vector corresponding to the current character to the encoding process of the next single character for reference.
In a further embodiment, the entity extraction module comprises: a character extraction submodule, used for acquiring the index tags corresponding to the first character pointer elements and the tail character pointer elements of the different named entities from the first character pointer vector and the tail character pointer vector; a mapping restoration submodule, used for querying a preset mapping vocabulary and restoring the entity type corresponding to each index tag; and a list construction submodule, used for extracting the character string corresponding to each named entity according to the boundaries defined by the single characters in the text information pointed to by its first character pointer element and tail character pointer element, and constructing a correspondence list of entity types and named entities.
In an extended embodiment, the pointer generation network model is pre-trained by a structure comprising: a sample acquisition submodule, used for acquiring a sample data set, the sample data set comprising a plurality of groups of sample data, each group comprising the title text of a commodity object and the first character pointer vector and tail character pointer vector corresponding to the named entities in the title text; a sample training submodule, used for calling each group of sample data to train the pointer generation network model, wherein the pointer generation network model predicts the corresponding first character pointer vector and tail character pointer vector from the text feature vectors of the group of sample data; and a gradient update submodule, used for supervising the first character pointer vector and the tail character pointer vector predicted by the pointer generation network model against the first character pointer vector and the tail character pointer vector in the current group of sample data, calculating a loss value, and, when the loss value is greater than a preset threshold value, calling the next group of sample data to continue the iterative training of the pointer generation network model.
A computer device adapted to one of the purposes of the present application comprises a central processing unit and a memory, the central processing unit being adapted to invoke and execute a computer program stored in the memory so as to perform the steps of the named entity recognition method described herein.
A computer-readable storage medium stores, in the form of computer-readable instructions, a computer program implementing the named entity recognition method; when invoked by a computer, the program performs the steps comprised by the method.
A computer program product, provided to meet another object of the present application, comprises computer programs/instructions which, when executed by a processor, implement the steps of the method described in any embodiment of the present application.
Compared with the prior art, the application has the following advantages:
Firstly, the prior art identifies named entities by exhaustion in a keyword-based manner; it cannot cope with fields where data grows rapidly, such as e-commerce commodity data, and the accuracy of its extraction results is low. The rule-based named entity method can improve accuracy, but can only extract results that match its rule templates, so its recall rate is too low. Unlike these prior-art approaches, the present application generates head and tail double pointers based on the deep semantic information of the text information and extracts the named entities in the text information using the two pointer vectors corresponding to head and tail. This raises the confidence of named entity extraction while improving both accuracy and recall, and is particularly suitable for application scenarios in which the text information to be processed grows rapidly, such as e-commerce platforms and the processing of book title information in library management.
Secondly, the structure of the first character pointer vector and the tail character pointer vector generated by the present application differs from pointer structures in the prior art. Each pointer vector comprises as many pointer elements as the text information has single characters, so that every single character of the text information has one first character pointer element and one tail character pointer element corresponding to it. When a named entity is identified in the text information, the first character pointer element corresponding to its first character stores the index tag of the entity type to which the named entity belongs, and likewise the tail character pointer element corresponding to its tail character stores the same index tag. A single first character pointer vector and a single tail character pointer vector can therefore represent all the named entities in the text information in a unified manner: the character string of a named entity begins at the single character pointed to by the first character pointer element storing the index tag of its entity type and ends at the single character pointed to by the tail character pointer element storing the same index tag, and the character string from the first character to the tail character is the named entity, whose mapping to its entity type is determined at the same time.
In addition, the present application has advantages such as high operational efficiency and accurate recognition, so it is particularly suitable for e-commerce platforms, extracting various commodity attribute information from the commodity information of the massive numbers of commodity objects on such platforms. Its advantages at large scale are thereby further realized, such as reduced cost and the rapid digestion of massive data processing tasks.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of an exemplary embodiment of a named entity identification method of the present application;
FIG. 2 is a diagram illustrating how the first character pointer vector and the tail character pointer vector correspond to the text information via index tags;
FIG. 3 is a schematic flow chart of extracting the text information of a commodity object in an embodiment of the present application;
FIG. 4 is a schematic flow chart of generating the first character pointer vector and the tail character pointer vector in an embodiment of the present application;
FIG. 5 is a schematic block diagram of the pointer generation network model in an embodiment of the present application, with the network structure corresponding to each single character unrolled for illustration;
FIG. 6 is a schematic network structure diagram of the encoder that generates the first character pointer vector in the pointer generation network model in an embodiment of the present application;
FIG. 7 is a schematic network structure diagram of the encoder that generates the tail character pointer vector in the pointer generation network model in an embodiment of the present application;
FIG. 8 is a schematic flow chart of the common encoding business logic of the encoders in the pointer generation network model in an embodiment of the present application;
FIG. 9 is a schematic flow chart of the update process of the hidden layer memory vector in an embodiment of the present application;
FIG. 10 is a schematic flow chart of obtaining the correspondence list of entity types and named entities from the text information according to the first character pointer vector and the tail character pointer vector in an embodiment of the present application;
FIG. 11 is a schematic flow chart of the training task for training the pointer generation network model proposed in the present application;
FIG. 12 is a functional block diagram of the named entity recognition apparatus of the present application;
FIG. 13 is a schematic structural diagram of a computer device used in the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As will be appreciated by those skilled in the art, "client," "terminal," and "terminal device" as used herein include both devices that are wireless signal receivers, which are devices having only wireless signal receivers without transmit capability, and devices that are receive and transmit hardware, which have receive and transmit hardware capable of two-way communication over a two-way communication link. Such a device may include: cellular or other communication devices such as personal computers, tablets, etc. having single or multi-line displays or cellular or other communication devices without multi-line displays; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "client," "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client", "terminal Device" used herein may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, and the like.
The hardware referred to by names such as "server", "client" and "service node" is essentially an electronic device with the capability of a personal computer: a hardware device having the components required by the von Neumann principle, such as a central processing unit (including an arithmetic unit and a controller), memory, input devices and output devices. A computer program is stored in the memory; the central processing unit loads a program from external memory into internal memory, runs it, executes its instructions and interacts with the input and output devices to complete specific functions.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.
Unless expressly specified otherwise, one or more technical features of the present application may be deployed on a server and accessed by a client that remotely invokes an online service interface provided by the server, or may be deployed and run directly on the client for access.
Unless expressly stated otherwise, the neural network models referred to in this application may be deployed on a remote server and invoked remotely from a client, or may be deployed on a client with sufficient device capability and invoked directly.
Unless expressly stated otherwise, the various data referred to in the present application may be stored remotely on a server or on a local terminal device, as long as the data is suitable for being called by the technical solution of the present application.
Those skilled in the art will appreciate that, although the various methods of the present application are described based on the same concept so as to be common to each other, they may be performed independently unless otherwise specified. Likewise, each embodiment disclosed in the present application is proposed based on the same inventive concept; therefore, concepts expressed identically, and concepts whose expressions differ but have been changed only for convenience, should be understood equally.
The embodiments disclosed herein may be flexibly constructed by cross-linking the related technical features of the various embodiments, unless a mutual exclusion relationship between those features is expressly stated, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of, or remedy deficiencies in, the prior art. Those skilled in the art will appreciate such variations.
The named entity recognition method of the present application can be programmed into a computer program product and deployed to run on a client or a server; in the e-commerce platform application scenario of the present application it is generally deployed on a server, so that once the computer program product is running, the method can be executed through human-machine interaction with its process via an open interface and a graphical user interface.
Referring to fig. 1, the named entity recognition method of the present application, in an exemplary embodiment thereof, includes the following steps:
step S1100, acquiring text information of the named entity to be identified, wherein the text information comprises a plurality of single characters:
the text information is composed of computer-displayable characters and generally comprises a plurality of single characters. The single characters include half-corner characters or full-corner characters, for example, the single characters may include arabic numerals or english letters, words, etc., such as "2021", "XL", or may include full chinese characters, such as "red money of winter short sleeves jacket of a certain brand", or may be a random combination of these different single characters. In the present application, all characters, whether full-angle characters or half-angle characters, are treated as a single character.
The content expressed by the text information differs according to the application scene, for example, in the e-commerce platform scene applied in the present application, the content may be content for describing commodity information, including but not limited to a commodity title text of a commodity object in the e-commerce platform, commodity detail information, and the like. Still further, in another exemplary scenario for reviewing non-canonical words, there may be instant chat content of the user.
The text information usually includes named entities that can be recognized manually, so that the corresponding named entities can be automatically recognized by implementing the technical scheme of the application. The named entity is also a keyword in nature. For example, in the e-commerce application scenario, attribute information of a commodity, such as description words for describing various entity types, such as brand, style, function, applicable season, etc., may constitute the named entity.
The source of the text information does not affect the implementation of the present application, and for example, the text information may be provided individually by the user or extracted in batch from a preset database. Generally, the text information is the content corresponding to a specific field, and depends on the content of the training data set adopted in the training stage of the pointer generation network model of the application. That is, the pointer-generating network model of the present application is trained by using the content corresponding to a specific domain as a training data set, so that the pointer-generating network model can learn the capability of identifying the named entity of the domain, thereby serving the named entity requirement of the text information corresponding to the domain. In this regard, one skilled in the art will appreciate.
Step S1200, extracting deep semantic information of the text information to obtain a text feature vector:
To realize named entity recognition based on the deep semantics of the text information, a neural network model capable of processing sequence information is preferably selected to extract the deep semantic information of the text information and obtain the corresponding text feature vector. The neural network model may be a recurrent neural network connected in a chain or a recursive neural network, both of which belong to the RNN family. Of course, other neural network models may equally be used, as long as they perform deep semantic representation learning on the text information with reference to its context and extract its deep semantic information. The neural network models contemplated by the present application include, but are not limited to, LSTM, BiLSTM, Transformer, Bert, AlBert, Ernie and Electra, among which Bert performs best in our measurements and is preferred.
These neural network models for extracting text feature information may collectively be called text feature extraction models. The text feature extraction model adopted by the application is a pre-trained model, i.e., one trained to convergence in advance, so that it can represent the deep semantic features of text information and is suitable for extracting the corresponding text feature vector from it. The training principle of the text feature extraction model is known to those skilled in the art and is omitted here.
Of course, in the present application the text feature extraction model may also participate in the joint training of the pointer generation network model, so that it is fine-tuned while the pointer generation network model is trained, as will be understood by those skilled in the art.
In the exemplary embodiment, the Bert model is taken as the text feature extraction model for extracting the deep semantic information of the text information. After the model has been trained to convergence and is used in the technical solution of the present application, the text information is taken as input and, to suit the input format of the Bert model, encoded into its token vectors (Token Embedding), position vectors (Position Embedding) and segment vectors (Segment Embedding); these are then fed into the Bert model, which extracts the deep semantic information from these vectors with reference to the context information and finally outputs the text feature vector corresponding to the text information.
As can easily be understood, the text feature vector realizes a feature representation of the deep semantic information of the text information and integrates the context information within the text information.
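For concreteness, a minimal sketch of this feature extraction using the open-source Hugging Face transformers Bert implementation, which constructs the token, position and segment embeddings internally; the checkpoint name and the example title are illustrative assumptions.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "某品牌夏季防紫外线冰丝风衣"   # illustrative commodity title
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = bert(**inputs)

# One deep semantic vector per token; for pure Chinese text this tokenizer
# emits one token per character, so stripping [CLS]/[SEP] leaves rows that
# align one-to-one with the single characters.
text_feats = out.last_hidden_state[0, 1:-1]    # (seq_len, 768)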
Step S1300, generating a first character pointer vector and a tail character pointer vector according to the text feature vector, where each pointer vector includes pointer elements pointing in sequence to each single character in the text information, and the pointer elements in the two pointer vectors corresponding to the first character and the tail character of a named entity store the index tag of the entity type to which the named entity belongs:
After the text feature vector is obtained, the two corresponding pointer vectors, namely the first character pointer vector and the tail character pointer vector, can be generated from it by a preset pointer generation network model.
The pointer generation network model is preferably the one detailed in the embodiments presented below, but may also be implemented by those skilled in the art based on the functionality and/or principles disclosed herein. It is suitably adapted from a model based on the multi-head attention mechanism, for example from the encoding structures of Transformer and Bert; such models can process sequence information based on the multi-head attention mechanism and refine semantics according to the context information in the sequence. On this basis, the model is adapted to generate the two independent pointer vectors required by the application, namely the first character pointer vector and the tail character pointer vector. After the pointer generation network model has been trained to convergence, it can generate the corresponding first character pointer vector and tail character pointer vector from the text feature information.
The first character pointer vector comprises as many pointer elements as the text information has single characters, and the pointer elements point to the single characters one-to-one in the order in which the single characters appear in the text information, so that each pointer element points to exactly one single character and the order of the pointer elements in the first character pointer vector matches the order of the pointed-to single characters in the text information. A pointer element in the first character pointer vector stores, in numerical form, the index tag of the entity type of the named entity associated with the single character it points to; specifically, when the single character pointed to is the first character of a named entity, the pointer element stores the index tag of the entity type to which that named entity belongs. A pointer element may instead store a predetermined identification, such as a null or "0" value, to indicate that the single character it points to is not the first character of any named entity.
Similarly, the tail character pointer vector comprises as many pointer elements as the text information has single characters, pointing to the single characters one-to-one in order, so that each pointer element points to exactly one single character and the order of the pointer elements matches the order of the pointed-to single characters in the text information. A pointer element in the tail character pointer vector stores, in numerical form, the index tag of the entity type of the named entity associated with the single character it points to; specifically, when the single character pointed to is the last character of a named entity, the pointer element stores the index tag of the entity type to which that named entity belongs. A pointer element may instead store a predetermined identification, such as a null or "0" value, to indicate that the single character it points to is not the last character of any named entity.
It can be seen that the first character pointer vector and the tail character pointer vector have the same structure but indicate different single characters of the same named entity: a pointer element in the first character pointer vector indicates the position of the first character of a named entity in the text information and stores the index tag of the entity type to which that named entity belongs, while a pointer element in the tail character pointer vector indicates the position of the tail character of the named entity and stores the same index tag. The first character pointer vector and the tail character pointer vector are unified pointer vectors, providing a centralized indication of the multiple named entities contained in the text information.
The index tags are numerical tags, compiled in advance, to which each entity type that a named entity may belong is mapped. Taking the various attribute information of commodity information in an e-commerce platform as an example, the mapping vocabulary may be as follows:

Entity type       Mapping value
Non-entity        0
Brand             1
Cut               2
Style             3
Fabric            4
Season of use     5
Function          6
This table is for illustration only. As can be seen from the table, when a single character in the text information does not belong to any named entity, it is a non-entity, represented by the value "0"; when a single character belongs to a named entity of an entity type such as "brand", "cut", "style", "fabric", "season of use" or "function", it is represented by the index tag "1", "2", "3", "4", "5" or "6" respectively.
Accordingly, it can be understood that, by storing in their pointer elements the index tag of the entity type to which a named entity belongs, i.e. the mapping value in the example table above, the first character pointer vector and the tail character pointer vector already determine the entity type corresponding to that named entity.
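Held as a plain lookup table, the mapping vocabulary above might look like the following sketch (entity type names follow the example table; "cut" renders the garment-cut type):

TAG2TYPE = {
    0: "non-entity",
    1: "brand",
    2: "cut",
    3: "style",
    4: "fabric",
    5: "season of use",
    6: "function",
}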
To fit the above example and further illustrate the construction of the first character pointer vector and the tail character pointer vector, suppose, as shown in fig. 2, a text message corresponding to a commodity title that reads, in translation, "A-brand 2021 summer UV-proof ice-silk loose comfortable windbreaker" (the vector positions below index the single characters of the original Chinese title). From this text information, the following two pointer vectors are obtained:
first character pointer vector: [1,0,0,0,0,0,5,0,6,0,0,0,4,0,2,0,0,0,3,0]
Tail word pointer vector: [0,1,0,0,0,0,0,5,0,0,0,6,0,4,0,2,0,0,0,3]
Here the position of each nonzero value in the first character pointer vector corresponds to the position in the text information of a single character that is the first character of a named entity, and the value itself is the index tag mapped to an entity type in the example table above. Similarly, the position of each nonzero value in the tail character pointer vector corresponds to the position of a single character that is the tail character, i.e. the last character, of a named entity, and the value itself is likewise the index tag mapped to an entity type in the example table above. Per the example table, the value "0" marks single characters of the non-entity type.
Therefore, as long as the pointer generation network model is trained to generate the first character pointer vector and the tail character pointer vector according to the above rules, named entity extraction can subsequently be performed from the two vectors.
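Conversely, the gold pointer vectors used to supervise the model during training can be constructed from annotated entity spans; a small illustrative helper, not from the patent:

def build_pointer_labels(length, entity_spans):
    # entity_spans: (start, end, tag) with 0-based, inclusive indices
    head = [0] * length
    tail = [0] * length
    for start, end, tag in entity_spans:
        head[start] = tag   # first character carries the index tag
        tail[end] = tag     # tail character carries the same index tag
    return head, tail

# build_pointer_labels(20, [(0, 1, 1), (6, 7, 5), (8, 11, 6),
#                           (12, 13, 4), (14, 15, 2), (18, 19, 3)])
# reproduces the two example vectors shown above.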
Step S1400, extracting the character string between the single characters in the text information pointed to by pointer elements carrying the same index tag in the two pointer vectors as a named entity:
Continuing with the example of fig. 2, it is readily appreciated that for the entity type "brand", mapped to the tag "1", the first and last characters of the corresponding named entity appear at positions 1 and 2 of the text information, i.e. the two characters of the brand name; similarly, for the entity type "function", mapped to the tag "6", the first and last characters of the corresponding named entity appear at positions 9 and 12, i.e. "UV-proof". Therefore, by taking the two single characters pointed to by a given index tag in the first character pointer vector and in the tail character pointer vector as head and tail respectively, the whole character string spanning them can be extracted, and that character string is the named entity corresponding to the index tag. In this way, multiple named entities in the text information can be extracted at once.
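A decoding sketch of this step; the nearest-following-match pairing rule is an assumption here, since the patent only requires the head and tail pointer elements to share the index tag:

def extract_entities(text, head_tags, tail_tags, tag2type):
    entities = []
    for i, tag in enumerate(head_tags):
        if tag == 0:                          # non-entity position
            continue
        for j in range(i, len(tail_tags)):
            if tail_tags[j] == tag:           # same index tag -> same entity
                entities.append((tag2type[tag], text[i:j + 1]))
                break
    return entities

# With the example vectors above (0-based indices) this yields, among others,
# ("brand", text[0:2]) and ("function", text[8:12]).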
Through the principles disclosed in the present embodiment, it can be understood that the technical solution of the present application has wide advantages over the prior art, including but not limited to:
Firstly, the prior art identifies named entities by exhaustion in a keyword-based manner; it cannot cope with fields where data grows rapidly, such as e-commerce commodity data, and the accuracy of its extraction results is low. The rule-based named entity method can improve accuracy, but can only extract results that match its rule templates, so its recall rate is too low. Unlike these prior-art approaches, the present application generates head and tail double pointers based on the deep semantic information of the text information and extracts the named entities in the text information using the two pointer vectors corresponding to head and tail. This raises the confidence of named entity extraction while improving both accuracy and recall, and is particularly suitable for application scenarios in which the text information to be processed grows rapidly, such as e-commerce platforms and the processing of book title information in library management.
Secondly, the structure of the first character pointer vector and the tail character pointer vector generated by the present application differs from pointer structures in the prior art. Each pointer vector comprises as many pointer elements as the text information has single characters, so that every single character of the text information has one first character pointer element and one tail character pointer element corresponding to it. When a named entity is identified in the text information, the first character pointer element corresponding to its first character stores the index tag of the entity type to which the named entity belongs, and likewise the tail character pointer element corresponding to its tail character stores the same index tag. A single first character pointer vector and a single tail character pointer vector can therefore represent all the named entities in the text information in a unified manner: the character string of a named entity begins at the single character pointed to by the first character pointer element storing the index tag of its entity type and ends at the single character pointed to by the tail character pointer element storing the same index tag, and the character string from the first character to the tail character is the named entity, whose mapping to its entity type is determined at the same time.
In addition, the present application has advantages such as high operational efficiency and accurate recognition, so it is particularly suitable for e-commerce platforms, extracting various commodity attribute information from the commodity information of the massive numbers of commodity objects on such platforms. Its advantages at large scale are thereby further realized, such as reduced cost and the rapid digestion of massive data processing tasks.
Referring to fig. 3, in a further embodiment, in order to ground the present application in a more specific application scenario, the step S1100 of acquiring the text information of the named entity to be identified comprises the following steps:
step S1110, acquiring a commodity object of the named entity to be identified:
the application scenario of this embodiment is the e-commerce field in which the applicant is engaged, in the e-commerce field, there are a large number of commodity objects, and for both the whole e-commerce platform and each shop in the e-commerce platform, there are a large number of commodity objects in the corresponding commodity data. Therefore, once the named entity identification needs to be carried out on the commodity objects, the commodity database can be called to obtain the commodity objects of the named entities to be identified, and the commodity objects are used as target commodity objects for carrying out named entity extraction subsequently.
The individual extraction is also allowable, for example, when a user of one merchant instance publishes a newly online commodity, various information related to the commodity is entered, including a commodity title text, a commodity detail text and the like, to form text information, and after the user publishes the commodity, the background server regards the commodity as a commodity object to be stored, and in the process, the commodity object can also be automatically determined as a target commodity object to be identified by a named entity.
Step S1120, extracting the product title text and/or the product detail information of the product object as product information:
for a commodity object, its text-type information needs to be acquired before named entities can be identified in it. As mentioned above, the text-type information of a commodity object usually includes the title text and/or the commodity detail text (commodity detail information). In a simplified embodiment only the title text or the detail text may be selected, while in a fuller embodiment both may be selected, depending on actual needs. This text-type information constitutes the commodity information required for named entity recognition in the present application.
Step S1130, carrying out data cleaning on the commodity information of the commodity object, and constructing the text information after the data cleaning as the text information of the named entity to be identified:
after the commodity information of the commodity object is obtained, data cleaning can be carried out on it using technical means common in the field, such as removing blanks and punctuation marks, so that redundant information is removed and the efficiency of named entity identification is improved.
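A minimal cleaning sketch along these lines is given below; the application does not fix the exact cleaning rules, so the regular expressions used here are assumptions:

```python
import re

def clean_product_text(raw: str) -> str:
    """Remove punctuation marks and blanks from commodity text (assumed rules)."""
    text = re.sub(r"[^\w\u4e00-\u9fff]", " ", raw)  # keep word chars and CJK
    return "".join(text.split())                     # drop all remaining blanks

print(clean_product_text("某某 2021, 夏季! 防紫外线"))  # -> 某某2021夏季防紫外线
```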
When the technical scheme of the present application is applied to an e-commerce platform, a named entity extraction service for the platform's commodity objects can be provided, with the extracted named entities serving as commodity attribute data. This offers practical help for data entry of commodity objects and unifies the organization dimensions of commodity attribute information across the platform. It is especially suited to cross-border e-commerce based on independent stations, where each merchant independently maintains its own site and commodity database: providing an organization and management service for commodity attribute information unifies information management standards while reducing each merchant's cost of organizing and maintaining commodity information.
Referring to fig. 4, in a further embodiment, step S1300 of generating a first word pointer vector and a last word pointer vector according to the text feature vector is implemented by calling a pre-trained pointer generation network model, where the pointer generation network model performs the following steps:
step S2100, inputting the text feature vector as a vector to be processed, performing feature operation on the vector to be processed by using an encoder word by word in response to the text information to generate a first character pointer vector corresponding to the current single character:
in this embodiment, the pointer generation network model shown in fig. 5, fig. 6 and fig. 7 is used to generate the first word pointer vector and the last word pointer vector. Fig. 5 shows the network architecture formed when the pointer generation model of the present application is unrolled word by word in sequence together with the text feature extraction model, while fig. 6 and fig. 7 show the internal structures of the encoder that generates the first word pointer vector and the encoder that generates the last word pointer vector, respectively.
As shown in fig. 5, after the text information is fed into the text feature extraction model (taking the Bert model as an example), the corresponding text feature vectors are extracted, and these text feature vectors are referenced word by word while the pointer generation network model of the present application encodes the text information word by word.
When the pointer generation network model performs the encoding corresponding to the first word pointer vector for each single word in the text information, the text feature vector is taken as input and regarded as the vector to be processed; the encoder shown in fig. 6 is then called to perform a feature operation on it, specifically an operation based on a multi-head attention mechanism, so as to fully reference the context information.
Step S2200, splicing the text feature vector with the first character pointer vector corresponding to the current character to obtain a fusion feature vector corresponding to the current character:
on the other hand, a last word pointer vector must also be generated for each single word in the text information, so encoding business logic is likewise executed for every word. In order to generate the last word pointer vector with reference to the first word pointer vector, as shown in fig. 7, each round of last-word encoding performs feature fusion between the first word pointer vector of the current word and the text feature vector extracted by the text feature extraction model, obtaining the corresponding fusion feature vector. Concretely, the text feature vector and the corresponding first word pointer vector are spliced and then linearly transformed through a linear layer to obtain the fusion feature vector. Expressed as a formula:
Lin_i = Linear(Concat(V_bert^(i), Y_{i,s}))

wherein V_bert^(i) denotes the text feature vector obtained for the i-th word, Y_{i,s} denotes the first word pointer vector generated for the i-th single word, and Lin_i is the fusion feature vector corresponding to the i-th single word.
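In PyTorch terms, the splice-then-linear fusion can be sketched as follows; the hidden size and layer shapes are assumptions, not specified by the application:

```python
import torch
import torch.nn as nn

d = 768                         # assumed hidden size of the Bert features
fuse = nn.Linear(2 * d, d)      # linear layer applied after splicing

v_bert_i = torch.randn(d)       # text feature vector V_bert^(i)
y_is = torch.randn(d)           # first word pointer vector Y_{i,s}

lin_i = fuse(torch.cat([v_bert_i, y_is], dim=-1))  # fusion vector Lin_i
```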
Step S2300, inputting the fusion feature vector as a vector to be processed, adapting to the text information word by word, and performing feature operation on the vector to be processed by adopting an encoder to generate a tail word pointer vector corresponding to the current word:
for the last word pointer vector, the input for each single word is its fusion feature vector, that is, the comprehensive semantic vector fusing the text feature vector and the first word pointer vector of the corresponding single word. When the pointer generation network model performs the encoding corresponding to the last word pointer vector for each single word in the text information, the fusion feature vector is therefore taken as input and regarded as the vector to be processed, and the feature operation shown in fig. 7 is performed on it, specifically an operation based on a multi-head attention mechanism, so as to fully reference the context information.
As shown in fig. 5, the pointer generation network model of the present application has a bidirectional structure. For the generation of both the first word pointer vector and the last word pointer vector, the encoding of each single word memorizes intermediate feature information through a hidden-layer memory vector; this vector is referenced by the encoding business logic of both pointer vectors for each word, and is in turn updated word by word with reference to the two pointer vectors. This will be further explained in the following embodiments, and can be readily implemented by those skilled in the art in view of the principles disclosed herein.
In this embodiment, the pointer generation network model with the bidirectional structure encodes the two pointer vectors word by word over the text information, so both pointer vectors are generated with reference to context information. The semantic understanding of the model as a whole is thus more accurate, accurate first word and last word pointer vectors can be obtained, and a principled foundation is laid for the extraction of named entities.
In an embodiment, the encoder used in the pointer generation network model of the present application is implemented based on the structures shown in fig. 6 and fig. 7. It is easy to see that fig. 6 and fig. 7 share the same structure; the difference lies in the input. In the encoding business logic of the first word pointer vector, the encoder of fig. 6 takes the text feature vector output by the text feature extraction model as its direct input and encodes it as the vector to be processed, whereas in the encoding business logic of the last word pointer vector, the encoder of fig. 7 takes as input the fusion feature vector obtained by splicing the first word pointer vector of the same single word with the text feature vector, and encodes that as the vector to be processed. In other words, relative to the encoder of fig. 6, the encoder of fig. 7 merely adds a splicing layer and a linear layer at the front, as required for encoding the last word pointer vector; the remainder of the two structures is identical and is described together below. It can be understood that the common structure of the encoders of fig. 6 and fig. 7 performs, for each single word in the text information, the following steps as shown in fig. 8 to obtain the corresponding first word pointer vector or last word pointer vector:
step S3100, obtaining the hidden layer memory vector corresponding to the previous single character, wherein the hidden layer memory vector is randomly initialized, referenced, and then updated correspondingly word by word:
when the encoder starts the encoding business logic for a single word in the text information, it first references the hidden-layer memory vector generated for the previous single word. The encoding business logic for the first word in the text information is a slight exception: since no hidden-layer memory vector of a previous word exists, the hidden-layer memory vector can be randomly initialized for reference. When encoding is completed and the first word pointer vector or last word pointer vector has been updated, the hidden-layer memory vector is correspondingly updated; every other single word then references, in sequence, the hidden-layer memory vector updated for its preceding word. The hidden-layer memory vector memorizes the pointer hidden feature information up to the current single word and is constructed by applying a multi-head attention mechanism, as those skilled in the art can understand.
For the two encoders corresponding to the same single word, both reference the hidden-layer memory vector of the previous single word in order to update the corresponding first word pointer vector and last word pointer vector respectively; the updated first word and last word pointer vectors are then used jointly to update the hidden-layer memory vector of the current word, which is passed on to the two encoders of the next single word.
Therefore, the first word pointer vector, the last word pointer vector and the hidden-layer memory vector are updated word by word over each single word in the text information and passed along in sequence, forming a serialized processing capability.
Step S3200, performing a multi-head attention mechanism operation according to the vector to be processed, and calculating a normalized dot product of the vector to be processed and the hidden-layer memory vector corresponding to the previous word as an addressing memory vector corresponding to the current word:
the encoder applies a multi-head attention mechanism: an Attention layer multiplies the vector to be processed with the referenced hidden-layer memory vector of the previous single word and then normalizes the result, obtaining a vector of normalized dot products that serves as the addressing memory vector of the vector to be processed relative to the previous word's hidden-layer memory vector.
Step S3300, add the addressing memory vector corresponding to the current word to the vector to be processed after linear transformation to obtain an intermediate vector:
on the basis of the addressing memory vector, a Linear layer applies a linear transformation to it, an activation layer activates the result, and this is then added to the vector to be processed to obtain the intermediate vector.
Step S3400, linearly converting the intermediate vector and applying a regularization operation to obtain the first word pointer vector or the last word pointer vector corresponding to the current word:
the intermediate vector is further linearly converted through a Linear layer and processed with layer normalization (LN), obtaining the first word pointer vector or last word pointer vector corresponding to the current word.
Step S3500, updating the hidden-layer memory vector corresponding to the current single character for the encoding process of the next single character according to the first character pointer vector, the last character pointer vector and the hidden-layer memory vector corresponding to the previous single character:
according to the first word pointer vector and last word pointer vector generated above and the hidden-layer memory vector of the previous word, the hidden-layer memory vector of the current word can be updated and output for reference by the encoding process of the next word. The specific operation process is described in the next embodiment.
Referring to fig. 9, in a further embodiment, the step S3500 of updating the hidden layer memory vector corresponding to the current word for the encoding process of the next word according to the first word pointer vector, the last word pointer vector and the hidden layer memory vector corresponding to the previous word includes the following steps:
step S3510, add the first word pointer vector corresponding to the current word and the last word pointer vector to obtain a sum vector:
for the two encoders corresponding to the current word, the first word pointer vector and the last word pointer vector respectively produced by them are vector-added to obtain the corresponding summed vector.
Step S3520, the added vector is classified and mapped to obtain classification probability as corresponding smooth weight:
for the summed vector, corresponding weights can be matched and a binary classifier applied to perform classification mapping; a Sigmoid function can be adopted for the classification, so that the classification probability corresponding to the summed vector is obtained and used as the smoothing weight.
Step S3530, taking the smooth weight as a weight parameter, carrying out smooth synthesis on the added vector and the hidden layer memory vector corresponding to the previous single word to obtain the hidden layer memory vector corresponding to the current word:
the smoothing weight is then used as a weight parameter to smoothly synthesize the summed vector and the hidden-layer memory vector of the previous single word, determining how much information each of the two contributes, and finally the hidden-layer memory vector corresponding to the current word is obtained.
Step S3540, the hidden layer memory vector corresponding to the current word is transmitted to the next single word for reference in the coding process:
after the hidden-layer memory vector of the current word is updated, it can be passed to the encoding process of the next single word for reference. It can be understood that the hidden-layer memory vector is updated and passed word by word as the text information unfolds single word by single word, so that the whole pointer generation network model fully references the context information of the entire text.
To make the description of the above two embodiments more formal, the coding business logic of each single word is further described below in conjunction with mathematical formulas:
first, assuming that the title text of the product is X, the output of the text feature extraction model (taking the Bert model as an example) is defined as follows:
V_bert = Bert(X)
the hidden-layer memory vector corresponding to the current word, stored in the pointer memory layer, is defined as Y_{i,c}; it represents the memory element of the first word and last word pointer vectors up to the i-th time step (i.e., the i-th word of the text information) and stores the hidden feature information shared by the two pointer vectors.
Second, a first word pointer vector is generated:
1.1. As shown in the encoder structure of fig. 6, the hidden-layer memory vector Y_{i-1,c} of the previous word and the text feature vector V_bert^(i) are input. First, the normalized dot product Attention_i between the two is calculated, obtaining the addressing memory vector of V_bert^(i) with respect to Y_{i-1,c}:

Attention_i = Softmax(V_bert^(i) · Y_{i-1,c})
1.2. The addressing memory vector is linearly converted and activated, and the input V_bert^(i) is added to obtain the vector-summed intermediate vector:

Add_i = V_bert^(i) + Act(Linear(Attention_i))
1.3. Finally, after one layer of linear conversion and regularization is applied to the intermediate vector, the first word pointer vector of the current word is obtained:

Y_{i,s} = LN(Linear(Add_i))
at this point, the encoding service flow corresponding to the single word in the text message implemented by the encoder shown in fig. 6 is completed.
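A compact PyTorch sketch of this shared encoder step follows. The application specifies multi-head attention but not its exact form, activation function or layer sizes; the single-head element-wise attention, ReLU activation and dimensions below are therefore assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerEncoderStep(nn.Module):
    """Shared per-character step of the Fig. 6 / Fig. 7 encoders (sketch)."""

    def __init__(self, d: int = 768):
        super().__init__()
        self.lin_act = nn.Linear(d, d)   # linear transform before activation (1.2)
        self.lin_out = nn.Linear(d, d)   # final linear conversion (1.3)
        self.norm = nn.LayerNorm(d)      # regularization LN (1.3)

    def forward(self, v: torch.Tensor, y_prev_c: torch.Tensor) -> torch.Tensor:
        # 1.1: normalized dot product with the previous hidden memory vector
        attention = F.softmax(v * y_prev_c, dim=-1)    # addressing memory vector
        # 1.2: linear conversion + activation, then add the input vector
        add = v + torch.relu(self.lin_act(attention))  # intermediate vector Add_i
        # 1.3: linear conversion followed by layer normalization
        return self.norm(self.lin_out(add))            # Y_{i,s} (or Y_{i,e})
```

For the last-word encoder of fig. 7, the same step is applied with the fusion vector Lin_i in place of V_bert^(i), matching the derivation that follows.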
Then, a tail word pointer vector is generated:
2.1. As shown in the encoder structure of fig. 7, the hidden-layer memory vector Y_{i-1,c} of the previous word, the text feature vector V_bert^(i) and the first word pointer vector Y_{i,s} of the current word are input. First, V_bert^(i) and Y_{i,s} are vector-spliced and linearly converted to obtain the fusion feature vector:

Lin_i = Linear(Concat(V_bert^(i), Y_{i,s}))
the following calculation process is consistent with the encoder shown in fig. 6.
2.2. The normalized dot product Attention_i between the fusion feature vector Lin_i and the previous word's hidden-layer memory vector Y_{i-1,c} is calculated, obtaining the addressing memory vector of the fusion feature with respect to Y_{i-1,c}:

Attention_i = Softmax(Lin_i · Y_{i-1,c})
2.3. The addressing memory vector is linearly converted and activated, and the fusion feature vector Lin_i is added to obtain the vector-summed intermediate vector:

Add_i = Lin_i + Act(Linear(Attention_i))
2.4. Finally, after one layer of linear conversion and regularization is applied to the intermediate vector, the last word pointer vector of the current word is obtained:

Y_{i,e} = LN(Linear(Add_i))
at this point, the encoding service flow corresponding to the single word in the text message implemented by the encoder shown in fig. 7 is completed.
Finally, after the two encoding business logics of the current word have respectively obtained its first word pointer vector and last word pointer vector, the hidden-layer memory vector of the current word can be updated. Specifically, the following formulas are applied:
Y_{i,c} = α · Y'_{i,c} + (1 − α) · Y_{i-1,c}

α = Sigmoid(W_i · Y'_{i,c} + b)

Y'_{i,c} = Sum(Y_{i,s}, Y_{i,e})
where α is the smoothing weight, used as a weight parameter; it represents the proportion of the first word and last word pointer vector information retained in the hidden-layer memory vector Y_{i,c} of the current time (current single word), while 1 − α represents the proportion of the memory information of the previous time (previous single word) retained in Y_{i,c}.
As the above formulas show, after the first word pointer vector and last word pointer vector of the current word are added to obtain the summed vector, the summed vector is multiplied by the corresponding weight and offset, the result is mapped through the Sigmoid function to obtain the smoothing weight, and the summed vector of the current word and the hidden-layer memory vector of the previous word are then smoothly synthesized with the smoothing weight as the weight parameter. The hidden-layer memory vector of the current word is thus obtained and can be passed to the encoding business logic of the next word for reference.
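The gated update can be sketched directly from these formulas; the vector shapes and the form of the learned weight are assumptions:

```python
import torch

def update_hidden_memory(y_is, y_ie, y_prev_c, W, b):
    """Y'_{i,c} = Y_{i,s} + Y_{i,e};  alpha = Sigmoid(W·Y'_{i,c} + b);
    Y_{i,c} = alpha*Y'_{i,c} + (1 - alpha)*Y_{i-1,c}  (sketch)."""
    y_sum = y_is + y_ie                     # summed pointer vectors Y'_{i,c}
    alpha = torch.sigmoid(W @ y_sum + b)    # smoothing weight alpha
    return alpha * y_sum + (1 - alpha) * y_prev_c

# usage sketch with assumed dimensions
d = 768
W, b = torch.randn(d, d), torch.randn(d)
y_c = update_hidden_memory(torch.randn(d), torch.randn(d), torch.randn(d), W, b)
```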
As can be seen from the above embodiments, the pointer generation network model of the present application applies technologies such as the multi-head attention mechanism (Attention), full connection (Linear) and layer normalization (Layer Normalization) in the encoders that generate the first word and last word pointer vectors, and computes the pointer vectors from the text feature vectors output by the pre-training model through a stacked, specific flow structure. The hidden-layer memory vector fuses the pointer feature information of the current time (current single word) and the previous time (previous single word) so as to enhance the entity extraction capability at the current time. The pointer generation network model can therefore effectively convert the text feature vectors of the text information into first word and last word pointer vectors. In addition, the pointer memory layer can propagate the feature information of index tags already extracted in the current text information, increasing the information dimensions that subsequent named entity extraction can attend to and thereby strengthening the model's ability to extract named entities.
For example, there may be implicit associations between items of commodity attribute information. When the commodity-related named entity for "brand" is "a certain shoe company" or "a certain clothing brand", the subsequent text of the same commodity information is highly likely to contain clothing- or footwear-related named entities such as "fabric", "model" and "style", and unlikely to contain electronics-related named entities such as "resolution" or "graphics card model". Therefore, when judging whether the current single word is an entity boundary, and of which entity type, adding the feature information of already-identified named entities increases the feature dimensions available for entity recognition on the one hand, and strengthens the model's judgment of entity boundaries and entity types at the current time on the other. That is, adding memory information enhances the model's ability to extract correct named entities and its generalization capability.
Based on the advantages described above, it can be understood that the pointer generation network model of the present application has a particularly significant effect when applied to the extraction of the commodity attribute information from the commodity information in the e-commerce field.
Referring to fig. 10, in a further embodiment, the step S1400 of extracting a character string between single characters in text information pointed by pointer elements of the same index tag in the two pointer vectors as a named entity includes the following steps:
step S1410, obtaining index tags corresponding to the first word pointer element and the last word pointer element of different named entities according to the first word pointer vector and the last word pointer vector:
for example, let the text information be X and the pointer generation network model be M; the model outputs the first word pointer vector Y_s and the last word pointer vector Y_e.
The index tags corresponding to the first character pointer element and the last character pointer element of different named entities are obtained by applying the following formula:
Index_s = Argmax(Y_s)

Index_e = Argmax(Y_e)
still taking fig. 2 as an example for illustration: with the commodity title text X being "a certain brand 2021 summer anti-ultraviolet ice silk loose comfortable windbreaker", the index tag structures obtained for the first word pointer vector and the last word pointer vector are respectively:
Index_s = [1 0 0 0 0 0 5 0 6 0 0 0 4 0 2 0 0 0 3 0]

Index_e = [0 1 0 0 0 0 0 5 0 0 0 6 0 4 0 2 0 0 0 3]
it will be understood that the two pointer vectors store index labels for a plurality of named entities, respectively.
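A decoding sketch implementing the Argmax step above and the pairing of equal index tags used in step S1430 below; the nearest-tail pairing rule is an assumption, since the application only requires that head and tail elements share the same tag:

```python
import torch

def decode_entities(y_s: torch.Tensor, y_e: torch.Tensor):
    """y_s, y_e: (seq_len, num_tags) pointer score matrices.
    Returns (tag, m, n) triples with 1-based inclusive boundaries."""
    index_s = y_s.argmax(dim=-1).tolist()   # Index_s = Argmax(Y_s)
    index_e = y_e.argmax(dim=-1).tolist()   # Index_e = Argmax(Y_e)
    entities = []
    for i, tag in enumerate(index_s):
        if tag == 0:                        # 0 assumed to mean "no boundary"
            continue
        for j in range(i, len(index_e)):    # nearest tail with the same tag
            if index_e[j] == tag:
                entities.append((tag, i + 1, j + 1))
                break
    return entities
```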
Step S1420, querying a preset mapping word list, and restoring the entity type corresponding to the index tag:
referring back to the exemplary embodiment of the present application, the entity type corresponding to each index tag can be obtained by querying the mapping word list illustrated there.
Step S1430, extracting character strings corresponding to the named entities according to boundaries defined by the single characters in the text information pointed by the first character pointer element and the last character pointer element of each named entity, and constructing a corresponding relationship list between entity types and named entities:
acquiring boundary information of different named entities in the text information according to the first character pointer vector and the tail character pointer vector:
Entity_k = {(m, n) | m, n ∈ {1, 2, ..., l}, m < n}

where k ∈ {1, 2, ..., K} denotes the entity type corresponding to a named entity, K denotes the total number of entity types, l denotes the length of the text information X, and (m, n) is boundary information defining the boundary of a named entity in the text information.
On this basis, the character strings in the text information X are intercepted according to the boundary information of each named entity, yielding data pairs formed by entity type and named entity, i.e., mapping relation data between entity types and named entities, and named entity extraction is completed.
For example, assuming the sorting index of the text information starts at 1, the sorting indices of the original commodity title run from 1 to 20, and the data list obtained after restoring the entity types and extracting the boundary information is: {"brand": [1, 2], "applicable season": [7, 8], "function": [9, 12], "fabric": [13, 14], "type": [15, 16], "style": [19, 20]}. Accordingly, the correspondence list between entity types and named entities obtained after entity extraction is: ("brand", "a certain brand"), ("applicable season", "summer"), ("function", "anti-ultraviolet"), ("fabric", "ice silk"), ("type", "loose"), ("style", "windbreaker").
This embodiment further illustrates a specific way of extracting each named entity from text information by means of the first word and last word pointer vectors obtained in the present application, and it makes the following contrast vivid. In the e-commerce field, the traditional keyword method can only enumerate exhaustively and match against a lexicon; for rapidly growing commodity data this has obvious technical limitations, and for newly added commodity data the accuracy of the extraction results is low, because a keyword does not necessarily denote a commodity attribute and may merely be descriptive text that happens to co-occur with other words, semantics the keyword method cannot understand. Methods based on rule templates can improve the accuracy of attribute extraction, but their recall is too low: only results matching the designed patterns can be mined, and defining templates that match all commodity attributes is undoubtedly costly and inefficient. The method of the present application, based on a neural network, a pre-training model and head-and-tail double pointers, lets the neural network learn and understand the semantic information of the whole commodity title text and automatically learn both the boundary segmentation of commodity attribute entities and the discrimination of entity types, improving accuracy and recall while raising the confidence of commodity attribute extraction.
Referring to fig. 11, in an extended embodiment, the pointer generation network model is pre-trained, which is trained as a named entity extraction service for commodity information in the e-commerce domain, so as to use the named entity as commodity attribute data, and accordingly, the training process includes the following steps:
step S4100, obtaining a sample data set, wherein the sample data set comprises a plurality of groups of sample data, and each group of sample data comprises a title text of a commodity object and a first character pointer vector and a tail character pointer vector corresponding to a named entity in the title text:
the sample data set used to train the pointer generation network model of the present application may be obtained from the network and may contain a large number of sets of sample data, each of which includes the title text of a commodity object. Following the example of fig. 2 and the mapping word table of the exemplary embodiment of the present application, the first word pointer vector and last word pointer vector corresponding to each title text are prepared in advance, so that the two pointer vectors give the boundary information of the several named entities in that title text. The data set obtained in this step is represented as:
D = {(X_j, Y_{j,s}, Y_{j,e}) | j ∈ 1, ..., n}

wherein X_j denotes the title text of the j-th commodity and consists of l_j words:

X_j = {x_1, x_2, ..., x_{l_j}}

Y_{j,s} denotes the first-word entity type tag of each word in the j-th sample and consists of l_j tags:

Y_{j,s} = {y_{1,s}, y_{2,s}, ..., y_{l_j,s}}

Y_{j,e} denotes the last-word entity type tag of each word in the j-th text and consists of l_j tags:

Y_{j,e} = {y_{1,e}, y_{2,e}, ..., y_{l_j,e}}
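A sketch of how one set of sample labels could be built from annotated spans; the 1-based inclusive indexing follows the example above, while the helper itself is hypothetical:

```python
def build_pointer_labels(title: str, spans):
    """spans: list of (tag_id, m, n) with 1-based inclusive boundaries.
    Returns the label sequences Y_{j,s} and Y_{j,e} for one title text."""
    y_s = [0] * len(title)
    y_e = [0] * len(title)
    for tag_id, m, n in spans:
        y_s[m - 1] = tag_id   # first word carries the entity type's index tag
        y_e[n - 1] = tag_id   # last word carries the same index tag
    return y_s, y_e

# e.g. brand at [1, 2] and style at [19, 20] in a 20-character title
y_s, y_e = build_pointer_labels("x" * 20, [(1, 1, 2), (3, 19, 20)])
```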
step S4200, each group of sample data is called to train the pointer generation network model, wherein the pointer generation network model predicts a corresponding first character pointer vector and a corresponding tail character pointer vector according to the text feature vectors in the group of sample data:
the pointer generation network model is trained by calling one set of sample data at a time. According to the structure and principle of the pointer generation network model disclosed in the present application, it can be understood that the model predicts the two pointer vectors corresponding to the title text, namely the first word pointer vector and the last word pointer vector, from the text feature vector extracted by the text feature extraction model for that set of sample data.
The extraction process and the transformation formula of the pointer generation network model for the first word pointer vector and the last word pointer vector can refer to the foregoing description, which is not repeated here.
Step S4300, respectively and correspondingly supervising the first character pointer vector and the tail character pointer vector predicted by the pointer generation network model according to the first character pointer vector and the tail character pointer vector in the group of trained sample data, calculating a loss value, and calling the next group of sample data to continue iterative training on the pointer generation network model when the loss value is greater than a preset threshold value:
after the two pointer vectors corresponding to a commodity title text in a set of sample data have been predicted, the prediction loss can be calculated by comparing, in one-to-one correspondence, the two pointer vectors prepared in that set of sample data with the two predicted pointer vectors. The losses of the two pointer vectors are then superposed to obtain the loss value of the whole model, and the weight parameters of the pointer generation network model are corrected by back propagation according to this loss value, achieving a gradient update.
If the loss value of this training step has not reached the expected preset threshold, i.e., the pointer generation network model has not reached a convergence state, another set of sample data is called to continue iterative training of the model. Iteration proceeds in this way until the loss value approaches 0 or reaches the expected preset threshold, at which point the model can be regarded as converged and put into use.
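A training-step sketch follows. The application states only that the two pointer losses are calculated against the prepared vectors and superposed; the cross-entropy loss, the model's output signature and the optimizer below are assumptions:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x_feats, y_s_gold, y_e_gold):
    """One supervised step over a set of sample data (sketch)."""
    pred_s, pred_e = model(x_feats)              # (seq_len, num_tags) each
    loss = (F.cross_entropy(pred_s, y_s_gold)
            + F.cross_entropy(pred_e, y_e_gold)) # superpose the two losses
    optimizer.zero_grad()
    loss.backward()                              # back-propagate the loss value
    optimizer.step()                             # gradient update of the weights
    return loss.item()
```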
This embodiment gives an example of training the pointer generation network model of the present application to serve named entity extraction for commodity information in the e-commerce field, so that those skilled in the art can readily apply the model to similar fields. It can be understood that, simply by using the relevant sample data and mapping word list of the corresponding field, the pointer generation network model can be adapted to serve named entity extraction in that field.
Referring to fig. 12, a named entity recognition apparatus adapted to one of the purposes of the present application is a functional embodiment of the named entity recognition method of the present application, and the apparatus includes: the system comprises a text acquisition module 1100, a feature extraction module 1200, a pointer generation module 1300 and an entity extraction module 1400, wherein the text acquisition module 1100 is used for acquiring text information of a named entity to be identified, and the text information comprises a plurality of single characters; the feature extraction module 1200 is configured to extract deep semantic information of the text information to obtain a text feature vector of the text information; the pointer generating module 1300 is configured to generate a first pointer vector and a last pointer vector according to the text feature vector, where each pointer vector includes pointer elements that point to each single character in the text information in sequence, and the pointer elements in the two pointer vectors corresponding to the first character and the last character of a named entity in the text information store an index tag of an entity type to which the named entity belongs; the entity extracting module 1400 is configured to extract a character string between single characters in the text information pointed by the pointer elements with the same index tag in the two pointer vectors as a named entity.
In a further embodiment, the text acquiring module 1100 includes: the object acquisition submodule is used for acquiring the commodity object of the named entity to be identified; the text extraction submodule is used for extracting the commodity title text and/or the commodity detail information of the commodity object as commodity information; and the text optimization submodule is used for carrying out data cleaning on the commodity information of the commodity object and constructing the text information after the data cleaning as the text information of the named entity to be identified.
In a further embodiment, in the pointer generation module 1300, a pre-trained pointer generation network model implementation is called, where the pointer generation network model includes: the first character pointer network is used for inputting the text characteristic vector as a vector to be processed, adapting to the text information word by word and performing characteristic operation on the vector to be processed by adopting an encoder so as to generate a first character pointer vector corresponding to the current single character; the vector splicing network is used for splicing the text characteristic vector with the first character pointer vector corresponding to the current character to obtain a fusion characteristic vector corresponding to the current character; and the tail word pointer network is used for inputting the fusion characteristic vector serving as a vector to be processed, adapting to the text information word by word and performing characteristic operation on the vector to be processed by adopting an encoder so as to generate a tail word pointer vector corresponding to the current word.
In an embodiment, the encoder invoked in the first pointer network and the last pointer network comprises: the hidden layer acquisition unit is used for acquiring a hidden layer memory vector corresponding to a previous single character, and the hidden layer memory vector is updated correspondingly word by word after being randomly initialized and quoted; the addressing operation unit is used for performing multi-head attention mechanism operation according to the vector to be processed, and calculating the normalized dot product of the vector to be processed and the hidden layer memory vector corresponding to the previous single character to be used as the addressing memory vector corresponding to the current single character; the intermediate processing unit is used for linearly transforming the addressing memory vector corresponding to the current single character and then adding the addressing memory vector to the vector to be processed to obtain an intermediate vector; the pointer generating unit is used for linearly converting, superposing and regularizing the intermediate vector to obtain the first character pointer vector or the tail character pointer vector corresponding to the current character; and the hidden layer updating unit is used for updating the hidden layer memory vector corresponding to the current single character for the encoding process of the next single character to refer to according to the first character pointer vector and the tail character pointer vector corresponding to the current character and the hidden layer memory vector corresponding to the previous single character.
In a further embodiment, the hidden layer update unit includes: the adding processing subunit is used for carrying out vector addition on the first word pointer vector corresponding to the current word and the tail word pointer vector to obtain an adding vector; the weight mapping subunit is used for carrying out classification mapping on the summation vector to obtain a classification probability as a corresponding smooth weight; the smooth synthesis subunit is used for performing smooth synthesis on the added vector and the hidden layer memory vector corresponding to the previous single word by taking the smooth weight as a weight parameter to obtain the hidden layer memory vector corresponding to the current word; and the hidden layer transfer subunit is used for transferring the hidden layer memory vector corresponding to the current word to the coding process reference of the next single word.
In a further embodiment, the entity extraction module 1400 comprises: the character extraction submodule is used for acquiring index tags corresponding to the first character pointer element and the tail character pointer element of different named entities according to the first character pointer vector and the tail character pointer vector; the mapping reduction sub-module is used for inquiring a preset mapping word list and reducing the entity type corresponding to the index tag; and the list construction submodule is used for extracting character strings corresponding to the named entities according to the boundaries limited by the single characters in the text information pointed by the first character pointer element and the tail character pointer element of each named entity, and constructing a corresponding relation list of the entity type and the named entities.
In an extended embodiment, the pointer generation network model is pre-trained in a structure comprising: the system comprises a sample acquisition submodule and a data processing submodule, wherein the sample acquisition submodule is used for acquiring a sample data set, the sample data set comprises a plurality of groups of sample data, and each group of sample data comprises a title text of a commodity object and a first character pointer vector and a tail character pointer vector corresponding to a named entity in the title text; the sample training submodule is used for calling each group of sample data to train the pointer generation network model, wherein the pointer generation network model predicts a corresponding first character pointer vector and a corresponding tail character pointer vector according to the text feature vectors in the group of sample data; and the gradient updating submodule is used for correspondingly monitoring the first character pointer vector and the tail character pointer vector predicted by the pointer generation network model respectively according to the first character pointer vector and the tail character pointer vector in the group of trained sample data, calculating a loss value, and calling the next group of sample data to continue iterative training on the pointer generation network model when the loss value is greater than a preset threshold value.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, whose internal structure is schematically shown in fig. 13. The computer device includes a processor, a computer-readable storage medium, a memory and a network interface connected by a system bus. The computer-readable storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database can store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a named entity identification method. The processor of the computer device provides computing and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the named entity identification method of the present application. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute specific functions of each module and its sub-module in fig. 12, and the memory stores program codes and various data required for executing the modules or the sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data required for executing all modules/sub-modules in the named entity recognition apparatus of the present application, and the server can call the program codes and data of the server to execute the functions of all sub-modules.
The present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the named entity recognition method of any of the embodiments of the present application.
The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the method as described in any of the embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods can be included. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
In summary, after deep semantic extraction is performed on text information to obtain its text feature vector, two pointer vectors respectively indicating the first words and last words of the named entities in the text information, namely the first word pointer vector and the last word pointer vector, are generated uniformly on the basis of that text feature vector. Through these two pointer vectors, the several named entities in the text information can be extracted in batch, improving the recognition efficiency of named entities with higher recall and accuracy. The method is especially suitable for application scenarios that must process massive data, such as extracting commodity attribute information from the commodity information of an e-commerce platform: the method of the present application can process the commodity information of the platform's massive commodity objects in batch, extracting the named entities of each commodity object to construct its commodity attribute information. The technical scheme therefore has a very broad application prospect.
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, several modifications and decorations can be made without departing from the principle of the present application, and these modifications and decorations should also be regarded as the protection scope of the present application.

Claims (10)

1. A named entity recognition method, comprising:
acquiring text information of a named entity to be identified, wherein the text information comprises a plurality of single characters;
extracting deep semantic information of the text information to obtain a text feature vector of the text information;
generating a first word pointer vector and a tail word pointer vector according to the text feature vector, wherein each pointer vector comprises pointer elements pointing to each single word in the text information in sequence, and the pointer elements corresponding to the first word and the tail word of a named entity in the text information in the two pointer vectors store an index tag of the entity type of the named entity;
and extracting character strings between single characters in the text information pointed by the pointer elements of the same index tag in the two pointer vectors as named entities.
2. The named entity recognition method of claim 1, wherein obtaining text information of the named entity to be recognized comprises the steps of:
acquiring a commodity object of a named entity to be identified;
extracting the commodity title text and/or the commodity detail information of the commodity object as commodity information;
and carrying out data cleaning on the commodity information of the commodity object, and constructing the text information after the data cleaning as the text information of the named entity to be identified.
3. The named entity recognition method of claim 1, wherein a first word pointer vector and a last word pointer vector are generated according to the text feature vector, and a pre-trained pointer generation network model is invoked, wherein the pointer generation network model performs the following steps:
inputting the text feature vector serving as a vector to be processed, adapting to the text information word by word, and performing feature operation on the vector to be processed by adopting an encoder to generate a first character pointer vector corresponding to the current single character;
splicing the text feature vector with the first character pointer vector corresponding to the current character to obtain a fusion feature vector corresponding to the current character;
and inputting the fusion characteristic vector serving as a vector to be processed, adapting to the text information word by word, and performing characteristic operation on the vector to be processed by adopting an encoder to generate a tail word pointer vector corresponding to the current word.
4. The named entity recognition method of claim 3, wherein the encoder performs the following steps for each word in the text message to obtain the corresponding first word pointer vector or last word pointer vector:
obtaining a hidden layer memory vector corresponding to a previous single character, wherein the hidden layer memory vector is updated correspondingly word by word after being randomly initialized and quoted;
performing multi-head attention mechanism operation according to the vector to be processed, and calculating the normalized dot product of the vector to be processed and the hidden layer memory vector corresponding to the previous single character to be used as the addressing memory vector corresponding to the current single character;
linearly transforming the addressing memory vector corresponding to the current single character and adding the addressing memory vector to the vector to be processed to obtain an intermediate vector;
linearly converting, superposing and regularizing the intermediate vector to obtain the first character pointer vector or the tail character pointer vector corresponding to the current character;
and updating the hidden layer memory vector corresponding to the current single character for the reference of the coding process of the next single character according to the first character pointer vector and the tail character pointer vector corresponding to the current character and the hidden layer memory vector corresponding to the previous single character.
5. The method as claimed in claim 4, wherein the step of updating the hidden layer memory vector corresponding to the current word for the encoding process of the next word comprises the following steps:
vector addition is carried out on the first word pointer vector and the tail word pointer vector corresponding to the current word to obtain a sum vector;
carrying out classification mapping on the summation vector to obtain a classification probability as a corresponding smooth weight;
taking the smooth weight as a weight parameter, and performing smooth synthesis on the added vector and the hidden layer memory vector corresponding to the previous single word to obtain the hidden layer memory vector corresponding to the current word;
and transmitting the hidden memory vector corresponding to the current word to the coding process reference of the next single word.
6. The named entity recognition method according to any one of claims 1 to 5, wherein extracting a character string between single words in text information pointed to by pointer elements of the same index tag in the two pointer vectors as a named entity comprises the steps of:
acquiring index tags corresponding to the first character pointer element and the last character pointer element of different named entities according to the first character pointer vector and the last character pointer vector;
inquiring a preset mapping word list, and restoring the entity type corresponding to the index tag;
and extracting character strings corresponding to the named entities according to boundaries defined by single characters in the text information pointed by the first character pointer element and the tail character pointer element of each named entity, and constructing a corresponding relation list of the entity type and the named entities.
7. The named entity recognition method of any one of claims 3 to 5, wherein the pointer-generating network model is pre-trained, wherein the training process comprises the following steps:
acquiring a sample data set, wherein the sample data set comprises a plurality of groups of sample data, and each group of sample data comprises a title text of a commodity object and a first character pointer vector and a tail character pointer vector corresponding to a named entity in the title text;
calling each group of sample data to train the pointer generation network model, wherein the pointer generation network model predicts a corresponding first character pointer vector and a corresponding tail character pointer vector according to text feature vectors in the group of sample data;
and respectively and correspondingly supervising the first character pointer vector and the tail character pointer vector predicted by the pointer generation network model according to the first character pointer vector and the tail character pointer vector in the group of the trained sample data, calculating a loss value, and calling the next group of sample data to continue iterative training on the pointer generation network model when the loss value is greater than a preset threshold value.
8. A computer device comprising a central processor and a memory, characterized in that the central processor is adapted to invoke execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 7, which, when invoked by a computer, performs the steps comprised by the corresponding method.
10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 7.
CN202111177567.2A 2021-10-09 2021-10-09 Named entity recognition method and device, equipment, medium and product thereof Pending CN113962224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111177567.2A CN113962224A (en) 2021-10-09 2021-10-09 Named entity recognition method and device, equipment, medium and product thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111177567.2A CN113962224A (en) 2021-10-09 2021-10-09 Named entity recognition method and device, equipment, medium and product thereof

Publications (1)

Publication Number Publication Date
CN113962224A true CN113962224A (en) 2022-01-21

Family

ID=79463322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111177567.2A Pending CN113962224A (en) 2021-10-09 2021-10-09 Named entity recognition method and device, equipment, medium and product thereof

Country Status (1)

Country Link
CN (1) CN113962224A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098617A (en) * 2022-06-10 2022-09-23 杭州未名信科科技有限公司 Method, device and equipment for labeling triple relation extraction task and storage medium
CN114881038A (en) * 2022-07-12 2022-08-09 之江实验室 Chinese entity and relation extraction method and device based on span and attention mechanism
CN114881038B (en) * 2022-07-12 2022-11-11 之江实验室 Chinese entity and relation extraction method and device based on span and attention mechanism
CN116629387A (en) * 2023-07-24 2023-08-22 湖南视觉伟业智能科技有限公司 Text processing method and processing system for training under missing condition
CN116629387B (en) * 2023-07-24 2023-10-27 湖南视觉伟业智能科技有限公司 Text processing method and processing system for training under missing condition

Similar Documents

Publication Publication Date Title
CN113962224A (en) Named entity recognition method and device, equipment, medium and product thereof
CN107251060A (en) For the pre-training and/or transfer learning of sequence label device
CN111368548A (en) Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN111159409B (en) Text classification method, device, equipment and medium based on artificial intelligence
CN113869048A (en) Commodity object searching method and device, equipment, medium and product thereof
CN113850201A (en) Cross-modal commodity classification method and device, equipment, medium and product thereof
CN108829823A (en) A kind of file classification method
CN114186056A (en) Commodity label labeling method and device, equipment, medium and product thereof
CN115731425A (en) Commodity classification method, commodity classification device, commodity classification equipment and commodity classification medium
CN113792786A (en) Automatic commodity object classification method and device, equipment, medium and product thereof
CN111209362A (en) Address data analysis method based on deep learning
CN115545832A (en) Commodity search recommendation method and device, equipment and medium thereof
CN115018549A (en) Method for generating advertisement file, device, equipment, medium and product thereof
CN114238524B (en) Satellite frequency-orbit data information extraction method based on enhanced sample model
CN114626926A (en) Commodity search category identification method and device, equipment, medium and product thereof
CN115062617A (en) Task processing method, device, equipment and medium based on prompt learning
CN114782943A (en) Bill information extraction method and device, equipment, medium and product thereof
CN114863440A (en) Order data processing method and device, equipment, medium and product thereof
CN117313138A (en) Social network privacy sensing system and method based on NLP
CN115563280A (en) Commodity label labeling method and device, equipment and medium thereof
CN113806536B (en) Text classification method and device, equipment, medium and product thereof
CN115936805A (en) Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and commodity recommendation medium
CN115018548A (en) Advertisement case prediction method and device, equipment, medium and product thereof
CN115700579A (en) Advertisement text generation method and device, equipment and medium thereof
CN115292603A (en) Commodity searching method, apparatus, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination