CN114564958B - Text recognition method, device, equipment and medium - Google Patents
- Publication number
- CN114564958B (grant of application CN202210026946.XA / CN202210026946A)
- Authority
- CN
- China
- Prior art keywords
- text
- vector
- model
- problem description
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to artificial intelligence technology and discloses a text recognition method, a device, an electronic device and a storage medium. The method comprises the following steps: inputting a text and a problem description into a recognition network model, where a first coding model and a second coding model in the recognition network model respectively perform semantic coding on the text and the problem description to obtain a text vector and a problem description vector; fusing the text vector and the problem description vector through a fusion model in the recognition network model to obtain text features of the problem of interest; and predicting and outputting, through a decoding model in the recognition network model, all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest. Because the model completes different recognition results according to different problem descriptions and can recognize any relationship or attribute, the scheme can complete entity, relationship and attribute recognition with high precision, is not limited by relationship or attribute types, and can realize open-domain recognition.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text recognition method, a text recognition device, electronic equipment and a storage medium.
Background
The identification of entities, entity relationships and entity attributes is an important technical means for understanding the query text requested by a user, and is also an important method for constructing a knowledge graph.
Currently, common recognition systems for entities, entity relationships and entity attributes fall into a pipeline (linear) form and a unified form. In the pipeline form, the entities in a sentence are recognized first, and the relationships and attributes between entities are then recognized based on the entity recognition result; because the association between the two types of tasks is not considered, errors in entity recognition propagate into the relationship and attribute recognition task. In the unified form, entity recognition and entity relationship/attribute recognition are modeled as a multi-task classification learning problem and learned jointly by sharing the underlying representation; this considers the association between the two types of tasks, but because both are modeled as classification tasks, it is only suitable for closed-domain relationship and attribute recognition (i.e., recognition of pre-specified relationships and attributes) and cannot recognize relationships and attributes in an open domain.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a text recognition method, a text recognition device, an electronic device and a storage medium; the aim is achieved through the following technical scheme.
The first aspect of the present invention proposes a text recognition method, the method comprising:
inputting the text and the problem description into a first coding model and a second coding model respectively for semantic coding to obtain a text vector and a problem description vector;
inputting the text vector and the problem description vector into a fusion model for fusion so as to obtain text features of the problem of interest;
and acquiring the text features of the problem of interest and the text vector through the decoding model, predicting all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest, and outputting the prediction result.
In some embodiments of the present application, the fusion model includes a first self-attention layer, a second self-attention layer, a key-value pair attention layer, and a third self-attention layer; inputting the text vector and the problem description vector into the fusion model for fusion to obtain the text features of the problem of interest includes the following steps:
inputting the text vector into the first self-attention layer to extract features of the text vector, obtaining a first feature representation; inputting the problem description vector into the second self-attention layer to extract features of the problem description vector, obtaining a second feature representation; inputting the first feature representation and the second feature representation into the key-value pair attention layer for fusion, obtaining a feature representation of the problem of interest; and inputting the feature representation of the problem of interest into the third self-attention layer to further extract features, obtaining the text features of the problem of interest.
In some embodiments of the present application, the first coding model includes a first BERT (Bidirectional Encoder Representations from Transformers) layer and a first long short-term memory (LSTM) neural network layer; the first coding model performs semantic coding on the text to obtain a text vector through the following steps:
inputting the text into a first bert layer for semantic coding to obtain a text vector; and inputting the text vector into a first LSTM layer for extracting the context, obtaining the enhanced text vector and outputting the enhanced text vector.
In some embodiments of the present application, the second coding model includes a second bert layer and a second LSTM layer; the second coding model performs semantic coding on the problem description to obtain a problem description vector, and the method comprises the following steps:
inputting the problem description into a second bert layer for semantic coding to obtain a problem description vector; and inputting the problem description vector into a second LSTM layer for extracting context, obtaining the enhanced problem description vector and outputting the enhanced problem description vector.
In some embodiments of the present application, when the problem description is an entity problem description, the recognition network model outputs all entities in the text vector; when the problem description is an entity relationship problem description, the recognition network model outputs the relationships between entities in the text vector; when the problem description is an entity attribute problem description, the recognition network model outputs the attribute relationships between entities in the text vector.
In some embodiments of the present application, the decoding model is a conditional random field CRF model.
A second aspect of the present invention proposes a text recognition device applied to a recognition network model, the recognition network model comprising: a first encoding model, a second encoding model, a fusion model, and a decoding model, the apparatus comprising:
the recognition module is used for inputting the text and the problem description into the first coding model and the second coding model respectively for semantic coding to obtain a text vector and a problem description vector; inputting the text vector and the problem description vector into the fusion model for fusion so as to obtain text features of the problem of interest; and acquiring the text features of the problem of interest and the text vector through the decoding model, and predicting all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest;
the acquisition module is used for acquiring the recognition result output by the recognition network model, wherein the recognition result comprises entities, relationships between entities, or attributes between entities.
A third aspect of the invention proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the steps of the method according to the first aspect described above when said program is executed.
A fourth aspect of the invention proposes a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method according to the first aspect described above.
Based on the text recognition method and the text recognition device according to the first aspect and the second aspect, the invention has at least the following beneficial effects or advantages:
according to the method, the text content and the problem description are semantically coded in parallel by two coding models and then input into the fusion model, and the fusion model adaptively fuses the text content and the problem description according to the problem description so as to capture the relationship between the content and the problem. When the problem description is an entity problem, the fusion model mainly focuses on the entities in the text, and the model recognition result is the entities in the text; when the problem description is an entity relationship problem, the fusion model mainly focuses on the relationship between two entities in the text, and the model recognition result is the relationship between the entities; when the problem description is an entity attribute problem, the fusion model mainly focuses on the attribute relationship between two entities in the text, and the model recognition result is the attribute relationship between the entities. Because the recognition network model completes recognition of different results according to different input problem descriptions, and the model can recognize entities of any relationship or attribute, the scheme of the invention can complete entity recognition, entity relationship recognition and entity attribute recognition with high precision, is not limited by relationship or attribute types, and can realize recognition of open-domain relationships and attributes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart illustrating an embodiment of a text recognition method according to an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram of an identification network model according to an exemplary embodiment of the present invention;
FIG. 3 is a schematic diagram of a structure for identifying fusion models in a network model according to an exemplary embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a text recognition device according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram showing a hardware structure of an electronic device according to an exemplary embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a structure of a storage medium according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In the prior art, closed-domain relationship and attribute recognition in natural language processing means that the relationship types and attribute types to be recognized are preset before the recognition task is performed. In practical applications, however, the numbers of entity types, relationship types and attribute types are large, and preset types cannot completely cover all of them. For example, in the sentence "a certain country's president Zhang San will go to visit the Apple company created by Qiao Busi", there are three triples: ("a certain country", "president", "Zhang San"), ("Qiao Busi", "create", "Apple company") and ("Zhang San", "visit", "Apple company"); each triplet is in the form of (entity 1, relationship, entity 2) or (entity 1, attribute, attribute value), and in closed-domain relationship and attribute recognition, the relationships or attributes "president", "create" and "visit" would need to be defined in advance.
In order to solve the problem that the prior art can only recognize closed-domain relationships and attributes, the application provides a text recognition method: the text and the problem description are respectively input into a first coding model and a second coding model in a recognition network model for semantic coding to obtain a text vector and a problem description vector; the text vector and the problem description vector are input into a fusion model in the recognition network model for fusion to obtain text features of the problem of interest; and finally, all entities in the text vector, relationships between entities, or attributes between entities are predicted according to the text features of the problem of interest through a decoding model in the recognition network model and output.
The technical effects that can be achieved based on the above description are the same as the beneficial effects set forth in the Disclosure of Invention and are not repeated here.
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
Embodiment one:
fig. 1 is a flowchart of an embodiment of a text recognition method according to an exemplary embodiment of the present invention, and fig. 2 is a network structure of a recognition network model used in the present embodiment, where the text recognition method may be applied to a computer device, and the computer device may be a terminal device, a mobile terminal, a PC, a server, etc., and in conjunction with the description of fig. 1 and 2, the text recognition method includes the following steps:
step 101: and respectively inputting the text and the entity problem description into a first coding model and a second coding model in the recognition network model to carry out semantic coding, so as to obtain a text vector and a problem description vector.
Where text refers to sentences or paragraphs to be identified, question descriptions are used to indicate specific questions of interest to the model.
In the embodiment of the application, the problem description includes an entity problem description, an entity relationship problem description and an entity attribute problem description.
In one possible implementation, the first coding model is used for converting the input text content into a vector that the model can process. Referring to the structure of the first coding model shown in fig. 2, the process of semantically coding the text by the first coding model specifically includes: semantically coding the input text through a first BERT (Bidirectional Encoder Representations from Transformers) layer, inputting the text vector obtained by coding into a first LSTM (Long Short-Term Memory) layer, and performing context extraction on the text vector through the first LSTM layer to obtain the enhanced text vector.
Wherein a more comprehensive text vector representation can be obtained by adding a first LSTM layer after the first bert layer to further obtain the context.
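To make the coding process concrete, the following is a minimal sketch of such a two-stage encoder in PyTorch, assuming the Hugging Face transformers package; the pretrained checkpoint name and the LSTM hidden size are illustrative assumptions, not values taken from this patent. The second coding model has the same structure and would encode the problem description in the same way.

```python
# Minimal sketch of the first coding model (bert layer + LSTM layer),
# assuming PyTorch and Hugging Face `transformers`; checkpoint name and
# hidden size are illustrative assumptions.
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class CodingModel(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # A bidirectional LSTM re-reads the bert outputs to further
        # extract context, yielding the "enhanced" vector sequence.
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        token_vecs = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        enhanced, _ = self.lstm(token_vecs)  # (batch, seq_len, 2 * hidden)
        return enhanced

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = CodingModel()
batch = tokenizer(["某国总统张三将访问乔布斯创建的苹果公司"], return_tensors="pt")
text_vector = encoder(batch["input_ids"], batch["attention_mask"])
```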
In another possible implementation, the second coding model is used for converting the input problem description into a vector that the model can process. Referring to the structure of the second coding model shown in fig. 2, the process of semantically coding the problem description by the second coding model specifically includes: semantically coding the input problem description through a second bert layer, inputting the problem description vector obtained by coding into a second LSTM layer, and performing context extraction on the problem description vector through the second LSTM layer to obtain the enhanced problem description vector.
Wherein a more comprehensive problem description vector representation can be obtained by adding a second LSTM layer after the second bert layer to further obtain the context.
Step 102: inputting the text vector and the problem description vector into a fusion model in the recognition network model for fusion, so as to obtain the text features of the problem of interest.
Wherein the fusion model is used to capture the relationship between the text description and the problem description.
In one possible implementation, the fusion model may capture the relationship between the text and the problem description through an attention mechanism, as shown by the fusion model structure in fig. 3. The process of fusing the text vector and the problem description vector by the fusion model is as follows: features of the text vector are extracted through a first self-attention layer in the fusion model to obtain a first feature representation, which is input to a key-value pair attention layer in the fusion model; features of the problem description vector are extracted through a second self-attention layer in the fusion model to obtain a second feature representation, which is also input to the key-value pair attention layer; the key-value pair attention layer then fuses the first feature representation and the second feature representation to obtain a feature representation of the problem of interest, which is input to a third self-attention layer in the fusion model; finally, the third self-attention layer further extracts features from the feature representation of the problem of interest to obtain the text features of the problem of interest, which are output.
That is, the text vector and the problem description vector first pass through respective self-attention layers (self-attention), which extract feature representations favorable for the interaction of the two; both are then input to the key-value pair attention layer (key-value attention), so that the feature representation of the text fully considers the problem; the result is finally input to another self-attention layer (self-attention), which further enhances the problem-aware feature representation before output.
Notably, the length of the text features of the problem of interest output by the fusion model is consistent with the length of the originally input text vector. That is, if the length of the input text vector is M, the length of the problem description vector is L, and both have dimension D, the text vector is an M×D matrix and the problem description vector is an L×D matrix; the matrix dimensions are unchanged after each self-attention layer, the output of the key-value pair attention layer is an M×D matrix, and the text features of the problem of interest obtained after the third self-attention layer are therefore an M×D matrix, i.e., a representation consistent with the length of the text vector.
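A hypothetical PyTorch sketch of this fusion structure is given below; the embedding dimension D and the number of attention heads are assumptions for illustration, and the patent does not prescribe a particular attention implementation. The shapes follow the description above: text M×D, problem description L×D, output M×D.

```python
# Hypothetical sketch of the fusion model: two self-attention layers,
# a key-value pair attention layer, and a third self-attention layer.
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.text_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.question_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.kv_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_vec, question_vec):
        # First and second self-attention layers: extract feature
        # representations favorable for the interaction of the two inputs.
        t, _ = self.text_self_attn(text_vec, text_vec, text_vec)  # (B, M, D)
        q, _ = self.question_self_attn(question_vec, question_vec,
                                       question_vec)              # (B, L, D)
        # Key-value pair attention: text positions query the problem
        # description, so the text features fully consider the question.
        fused, _ = self.kv_attn(t, q, q)                          # (B, M, D)
        # Third self-attention layer: further enhances the question-aware
        # features; the output length matches the text vector (M x D).
        out, _ = self.out_self_attn(fused, fused, fused)
        return out

fusion = FusionModel()
text_vec = torch.randn(1, 20, 768)       # M = 20
question_vec = torch.randn(1, 8, 768)    # L = 8
features = fusion(text_vec, question_vec)
print(features.shape)                    # torch.Size([1, 20, 768])
```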
Step 103: acquiring the text features of the problem of interest and the text vector through a decoding model in the recognition network model, predicting all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest, and outputting the prediction result.
Alternatively, the decoding model may be implemented using a CRF (Conditional Random Field).
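As a sketch of how such a decoding step could look, the snippet below projects the fused features to per-token tag scores and applies a CRF, assuming the third-party pytorch-crf package (pip install pytorch-crf); the tag set mirrors the B/I/O, RB/RI and AB/AI labels used in the examples that follow and is an illustrative assumption.

```python
# Illustrative decoding step: a linear layer maps fused features to
# per-token tag scores, and a CRF decodes the best tag sequence.
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party `pytorch-crf` package (assumption)

TAGS = ["O", "B", "I", "RB", "RI", "AB", "AI"]

emission_proj = nn.Linear(768, len(TAGS))  # tag scores per token
crf = CRF(len(TAGS), batch_first=True)

features = torch.randn(1, 20, 768)  # text features of the problem of interest
emissions = emission_proj(features)

# Training: maximize the log-likelihood of the gold tag sequence.
gold_tags = torch.zeros(1, 20, dtype=torch.long)
loss = -crf(emissions, gold_tags)

# Inference: Viterbi decoding yields the best tag index sequence.
best_path = crf.decode(emissions)
print([TAGS[i] for i in best_path[0]])
```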
Based on the embodiments shown in fig. 1 to 3, when the problem description input to the recognition network model is an entity problem description, the recognition network model outputs all entities in the text vector.
In one example, the entity problem description may specifically be: "which entities are in the text".
In one specific example, the text is: "a certain country's president Zhang San will go to visit the Apple company created by Qiao Busi", and the entity problem description is: "which entities exist in the description". The recognition network model outputs a prediction marked with B/I/O tags, for example O O O O B I I O O O O B I I O O O B I I I, which means that the three entities "Zhang San", "Qiao Busi" and "Apple company" are recognized.
In another specific example, the text is: "Chinese basketball player Yao Ming is 2.26 meters in height", and the entity problem description is: "which entities exist in the description". The prediction result output by the recognition network model is B I O O O O O B I O O B I, which means that the three entities "China", "Yao Ming" and "2.26 meters" are recognized.
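The tag sequences above can be turned back into entity strings with a simple span decoder; the following is a small, self-contained sketch. Note the character-level tag string here is a simplified illustration aligned one-to-one with the characters of the sample sentence, not the exact sequence printed in the examples above.

```python
# Helper sketch: converting a character-level B/I/O tag sequence back
# into entity strings.
def tags_to_spans(tokens, tags):
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":                 # a new entity starts
            if current:
                spans.append("".join(current))
            current = [tok]
        elif tag == "I" and current:   # the current entity continues
            current.append(tok)
        else:                          # "O" (or an orphan "I") ends it
            if current:
                spans.append("".join(current))
                current = []
    if current:
        spans.append("".join(current))
    return spans

tokens = list("某国总统张三将访问乔布斯创建的苹果公司")
tags = "O O O O B I O O O B I I O O O B I I I".split()
print(tags_to_spans(tokens, tags))     # ['张三', '乔布斯', '苹果公司']
```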
When the problem description input to the recognition network model is a relation problem description, the recognition network model outputs the relation among entities in the text vector.
In one example, the entity relationship problem description may specifically be: "what is the relationship between XX and YY". Based on the above description, after the recognition network model has recognized all entities in the text, the relationship question can be asked exhaustively for every pair of entities.
In one specific example, after the three entities "Zhang San", "Qiao Busi" and "Apple company" are recognized, if the input entity relationship problem description is: "what is the relationship between Zhang San and Apple company", with the text still being "a certain country's president Zhang San will go to visit the Apple company created by Qiao Busi", the prediction result output by the recognition network model is O O O O O O O O O RB RI O O O O O O O O O O, where RB RI … RI is the predicted relationship "visit". If the input entity relationship problem description is: "what is the relationship between Zhang San and Qiao Busi", the prediction result output by the recognition network model is O O O O O O O O O O O O O O O O O O O O O, i.e., all O, indicating that no relationship exists between "Zhang San" and "Qiao Busi".
In another specific example, after the recognition network model recognizes the three entities "China", "Yao Ming" and "2.26 meters", if the input entity relationship problem description is: "what is the relationship between China and Yao Ming", with the text still being "Chinese basketball player Yao Ming is 2.26 meters in height", the prediction result output by the recognition network model is O O RB RI RI RI RI O O O O O O, and the recognized relationship is "basketball player".
When the problem description input to the recognition network model is the entity attribute relationship problem description, the recognition network model outputs the attribute relationship among the entities in the text vector.
In one example, the entity attribute problem description may specifically be: "what is the attribute relationship between XX and YY". Based on the above description, after the recognition network model has recognized all entities in the text, the attribute question can be asked exhaustively for every pair of entities.
In one specific example, after the recognition network model recognizes the three entities "Zhang San", "Qiao Busi" and "Apple company", if the input entity attribute problem description is: "what is the attribute relationship between Qiao Busi and Apple company", with the text still being "a certain country's president Zhang San will go to visit the Apple company created by Qiao Busi", the prediction result output by the recognition network model is O O O O O O O O O O O O O O O O O O O O O, i.e., all O, indicating that "Qiao Busi" and "Apple company" have no attribute relationship (the connection between them is the relationship "create", not an attribute).
In another specific example, after the recognition network model recognizes the three entities "China", "Yao Ming" and "2.26 meters", if the input entity attribute problem description is: "what is the attribute relationship between Yao Ming and 2.26 meters", with the text still being "Chinese basketball player Yao Ming is 2.26 meters in height", the prediction result output by the recognition network model is O O O O O O O O O AB AI O O, where AB AI … AI is the predicted attribute "height". If the input entity attribute problem description is: "what is the attribute relationship between China and Yao Ming", the prediction result output by the recognition network model is O O O O O O O O O O O O O, i.e., all O, indicating that no attribute relationship exists between "China" and "Yao Ming".
Therefore, the fusion model in the recognition network model captures the relationship between the text and the entities, relationships or attributes, adaptively fusing the text content and the problem description according to the problem description. When the problem description asks which entities exist, the fusion model mainly focuses on the entities in the text, and the model outputs all entities in the text; when the problem description asks for the relationship between the XX entity and the YY entity, the fusion model mainly focuses on the relationship between those two entities, and the model outputs that relationship; when the problem description asks for the attribute relationship between the XX entity and the YY entity, the fusion model mainly focuses on the attribute relationship between those two entities, and the model outputs that attribute relationship.
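Putting the question types together, a driver loop could enumerate entity pairs and issue relation and attribute questions for each, as sketched below; recognition_network is a hypothetical stand-in for the full model (coding models, fusion model, CRF decoding), and the question templates are illustrative assumptions.

```python
# Hypothetical driver: once the entity question has produced all entities,
# relation and attribute questions are issued for every entity pair.
from itertools import combinations

def recognition_network(text: str, question: str) -> list[str]:
    # Stand-in for the trained model; returns a tag sequence per character.
    return ["O"] * len(text)  # dummy output for illustration

text = "某国总统张三将访问乔布斯创建的苹果公司"
entities = ["张三", "乔布斯", "苹果公司"]  # result of the entity question

for e1, e2 in combinations(entities, 2):
    rel_tags = recognition_network(text, f"{e1}和{e2}的关系是什么")      # relation
    attr_tags = recognition_network(text, f"{e1}和{e2}的属性关系是什么")  # attribute
```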
This completes the recognition flow shown in fig. 1: the text content and the problem description are semantically coded in parallel by the two coding models and then fused according to the problem description, so that the recognition result follows the question asked, as described above. Because the recognition network model completes recognition of different results according to different input problem descriptions, and the model can recognize entities of any relationship or attribute, the scheme of the invention can complete entity recognition, entity relationship recognition and entity attribute recognition with high precision, is not limited by relationship or attribute types, and can realize recognition of open-domain relationships and attributes.
The invention also provides an embodiment of the text recognition device corresponding to the embodiment of the text recognition method.
Fig. 4 is a schematic structural diagram of a text recognition device according to an exemplary embodiment of the present invention, where the text recognition device is configured to perform the text recognition method provided in any one of the foregoing embodiments, and as shown in fig. 4, the text recognition device includes:
the recognition module is used for inputting the text and the problem description into the first coding model and the second coding model respectively for semantic coding to obtain a text vector and a problem description vector; inputting the text vector and the problem description vector into the fusion model for fusion so as to obtain text features of the problem of interest; and acquiring the text features of the problem of interest and the text vector through the decoding model, and predicting all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest;
the acquisition module is used for acquiring the recognition result output by the recognition network model, wherein the recognition result comprises entities, relationships between entities, or attributes between entities.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
The embodiment of the invention also provides electronic equipment corresponding to the text recognition method provided by the embodiment of the invention, so as to execute the text recognition method.
Fig. 5 is a hardware configuration diagram of an electronic device according to an exemplary embodiment of the present invention, the electronic device including: a communication interface 601, a processor 602, a memory 603 and a bus 604; wherein the communication interface 601, the processor 602 and the memory 603 perform communication with each other via a bus 604. The processor 602 may perform the text recognition method described above by reading and executing machine executable instructions in the memory 603 corresponding to the control logic of the text recognition method, the details of which are referred to in the above embodiments and will not be further described herein.
The memory 603 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain stored information, such as executable instructions or data. In particular, the memory 603 may be a RAM (Random Access Memory), a flash memory, a storage drive (e.g., a hard drive), any type of storage disk (e.g., an optical disk or DVD), a similar storage medium, or a combination thereof. The communication connection between the system network element and at least one other network element is achieved through at least one communication interface 601 (which may be wired or wireless); the internet, a wide area network, a local area network, a metropolitan area network, etc. may be used.
Bus 604 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. The memory 603 is configured to store a program, and the processor 602 executes the program after receiving an execution instruction.
The processor 602 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 602. The processor 602 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoding processor, or in a combination of hardware and software modules in a decoding processor.
The electronic device provided by the embodiment of the application and the text recognition method provided by the embodiment of the application are the same in inventive concept, and have the same beneficial effects as the method adopted, operated or implemented by the electronic device.
The present embodiment also provides a computer readable storage medium corresponding to the text recognition method provided in the foregoing embodiment, referring to fig. 6, the computer readable storage medium is shown as an optical disc 30, on which a computer program (i.e. a program product) is stored, where the computer program, when executed by a processor, performs the text recognition method provided in any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
The computer readable storage medium provided by the above embodiments of the present application has the same advantageous effects as the method adopted, operated or implemented by the application program stored therein, for the same inventive concept as the text recognition method provided by the embodiments of the present application.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description is only of preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (6)
1. A text recognition method, applied to a recognition network model, the recognition network model comprising: a first coding model, a second coding model, a fusion model and a decoding model, wherein the text recognition method comprises the following steps:
inputting the text and the problem description into a first coding model and a second coding model respectively for semantic coding to obtain a text vector and a problem description vector;
inputting the text vector and the problem description vector into a fusion model for fusion so as to obtain text features of the problem of interest;
acquiring the text features of the problem of interest and the text vector through the decoding model, predicting all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest, and outputting the prediction result;
wherein the fusion model includes a first self-attention layer, a second self-attention layer, a key-value pair attention layer, and a third self-attention layer; inputting the text vector and the problem description vector into the fusion model for fusion to obtain the text features of the problem of interest includes the following steps:
inputting the text vector into the first self-attention layer to extract features of the text vector, obtaining a first feature representation; inputting the problem description vector into the second self-attention layer to extract features of the problem description vector, obtaining a second feature representation; inputting the first feature representation and the second feature representation into the key-value pair attention layer for fusion, obtaining a feature representation of the problem of interest; inputting the feature representation of the problem of interest into the third self-attention layer to further extract features, obtaining the text features of the problem of interest;
the first coding model comprises a first BERT (Bidirectional Encoder Representations from Transformers) layer and a first long short-term memory (LSTM) neural network layer; the first coding model performs semantic coding on the text to obtain a text vector through the following steps:
inputting the text into a first bert layer for semantic coding to obtain a text vector; inputting the text vector into a first LSTM layer for extracting context, obtaining an enhanced text vector and outputting the enhanced text vector;
the second coding model comprises a second bert layer and a second LSTM layer; the second coding model performs semantic coding on the problem description to obtain a problem description vector, and the method comprises the following steps:
inputting the problem description into a second bert layer for semantic coding to obtain a problem description vector; and inputting the problem description vector into a second LSTM layer for extracting context, obtaining the enhanced problem description vector and outputting the enhanced problem description vector.
2. The method of claim 1, wherein when the problem description is an entity problem description, the recognition network model outputs all entities in the text vector; when the problem description is entity relation problem description, the recognition network model outputs the relation among entities in the text vector; when the problem description is entity attribute problem description, the recognition network model outputs the attribute relationship among the entities in the text vector.
3. The method of claim 1, wherein the decoding model is a conditional random field CRF model.
4. A text recognition device, characterized by being applied to a recognition network model, the recognition network model comprising: a first encoding model, a second encoding model, a fusion model, and a decoding model, the apparatus comprising:
the recognition module is used for inputting the text and the problem description into the first coding model and the second coding model respectively for semantic coding to obtain a text vector and a problem description vector; inputting the text vector and the problem description vector into the fusion model for fusion so as to obtain text features of the problem of interest; and acquiring the text features of the problem of interest and the text vector through the decoding model, and predicting all entities in the text vector, relationships between entities, or attributes between entities according to the text features of the problem of interest;
the acquisition module is used for acquiring the recognition result output by the recognition network model, wherein the recognition result comprises entities, relationships between entities, or attributes between entities;
wherein the fusion model includes a first self-attention layer, a second self-attention layer, a key-value pair attention layer, and a third self-attention layer; the recognition module is specifically configured, in the process of inputting the text vector and the problem description vector into the fusion model for fusion to obtain the text features of the problem of interest, to input the text vector into the first self-attention layer to extract features of the text vector, obtaining a first feature representation; input the problem description vector into the second self-attention layer to extract features of the problem description vector, obtaining a second feature representation; input the first feature representation and the second feature representation into the key-value pair attention layer for fusion, obtaining a feature representation of the problem of interest; and input the feature representation of the problem of interest into the third self-attention layer to further extract features, obtaining the text features of the problem of interest;
the first coding model comprises a first BERT (Bidirectional Encoder Representations from Transformers) layer and a first long short-term memory (LSTM) neural network layer; the recognition module is specifically configured, in the process in which the first coding model performs semantic coding on the text to obtain a text vector, to input the text into the first bert layer for semantic coding to obtain a text vector, and to input the text vector into the first LSTM layer for context extraction, obtaining and outputting the enhanced text vector;
the second coding model comprises a second bert layer and a second LSTM layer; the recognition module is specifically configured to perform semantic coding on the problem description in the second coding model, and input the problem description into a second bert layer for performing semantic coding in the process of obtaining a problem description vector, so as to obtain a problem description vector; and inputting the problem description vector into a second LSTM layer for extracting context, obtaining the enhanced problem description vector and outputting the enhanced problem description vector.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-3 when the program is executed.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026946.XA CN114564958B (en) | 2022-01-11 | 2022-01-11 | Text recognition method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026946.XA CN114564958B (en) | 2022-01-11 | 2022-01-11 | Text recognition method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114564958A CN114564958A (en) | 2022-05-31 |
CN114564958B true CN114564958B (en) | 2023-08-04 |
Family
ID=81711837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210026946.XA Active CN114564958B (en) | 2022-01-11 | 2022-01-11 | Text recognition method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114564958B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115357705A (en) * | 2022-10-24 | 2022-11-18 | 成都晓多科技有限公司 | Method, device and equipment for generating entity attribute in question text and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263912A (en) * | 2019-05-14 | 2019-09-20 | 杭州电子科技大学 | A kind of image answering method based on multiple target association depth reasoning |
CN112016313A (en) * | 2020-09-08 | 2020-12-01 | 迪爱斯信息技术股份有限公司 | Spoken language element identification method and device and alarm situation analysis system |
EP3767516A1 (en) * | 2019-07-18 | 2021-01-20 | Ricoh Company, Ltd. | Named entity recognition method, apparatus, and computer-readable recording medium |
CN112417874A (en) * | 2020-11-16 | 2021-02-26 | 珠海格力电器股份有限公司 | Named entity recognition method and device, storage medium and electronic device |
CN112612898A (en) * | 2021-03-05 | 2021-04-06 | 蚂蚁智信(杭州)信息技术有限公司 | Text classification method and device |
CN113536804A (en) * | 2021-06-29 | 2021-10-22 | 北京理工大学 | Natural language feature extraction method based on keyword enhancement GRU and Kronecker |
-
2022
- 2022-01-11 CN CN202210026946.XA patent/CN114564958B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263912A (en) * | 2019-05-14 | 2019-09-20 | 杭州电子科技大学 | A kind of image answering method based on multiple target association depth reasoning |
EP3767516A1 (en) * | 2019-07-18 | 2021-01-20 | Ricoh Company, Ltd. | Named entity recognition method, apparatus, and computer-readable recording medium |
CN112016313A (en) * | 2020-09-08 | 2020-12-01 | 迪爱斯信息技术股份有限公司 | Spoken language element identification method and device and alarm situation analysis system |
CN112417874A (en) * | 2020-11-16 | 2021-02-26 | 珠海格力电器股份有限公司 | Named entity recognition method and device, storage medium and electronic device |
CN112612898A (en) * | 2021-03-05 | 2021-04-06 | 蚂蚁智信(杭州)信息技术有限公司 | Text classification method and device |
CN113536804A (en) * | 2021-06-29 | 2021-10-22 | 北京理工大学 | Natural language feature extraction method based on keyword enhancement GRU and Kronecker |
Also Published As
Publication number | Publication date |
---|---|
CN114564958A (en) | 2022-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112966074B (en) | Emotion analysis method and device, electronic equipment and storage medium | |
Abdullah et al. | Fake news classification bimodal using convolutional neural network and long short-term memory | |
CN113011186B (en) | Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium | |
CN111967242A (en) | Text information extraction method, device and equipment | |
CN113688313A (en) | Training method of prediction model, information pushing method and device | |
CN113627447A (en) | Label identification method, label identification device, computer equipment, storage medium and program product | |
CN114661861B (en) | Text matching method and device, storage medium and terminal | |
CN116601626A (en) | Personal knowledge graph construction method and device and related equipment | |
CN112364664B (en) | Training of intention recognition model, intention recognition method, device and storage medium | |
CN113761219A (en) | Knowledge graph-based retrieval method and device, electronic equipment and storage medium | |
CN118171149B (en) | Label classification method, apparatus, device, storage medium and computer program product | |
CN112948449A (en) | Information recommendation method and device | |
CN117609479A (en) | Model processing method, device, equipment, medium and product | |
CN116228383A (en) | Risk prediction method and device, storage medium and electronic equipment | |
CN114564958B (en) | Text recognition method, device, equipment and medium | |
CN116150367A (en) | Emotion analysis method and system based on aspects | |
CN113297525B (en) | Webpage classification method, device, electronic equipment and storage medium | |
CN112256841B (en) | Text matching and countermeasure text recognition method, device and equipment | |
CN116484864A (en) | Data identification method and related equipment | |
CN113704466B (en) | Text multi-label classification method and device based on iterative network and electronic equipment | |
CN116467523A (en) | News recommendation method, device, electronic equipment and computer readable storage medium | |
CN115129885A (en) | Entity chain pointing method, device, equipment and storage medium | |
CN116230146A (en) | Data processing method, training method of ICD (ICD coding) model and related equipment | |
CN113590813B (en) | Text classification method, recommendation method, device and electronic equipment | |
CN112668332A (en) | Triple extraction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |