CN112905757A

CN112905757A - Text processing method and device

Info

Publication number: CN112905757A
Application number: CN202110110374.9A
Authority: CN
Inventors: 侯昶宇; 李长亮
Original assignee: Chengdu Kingsoft Interactive Entertainment Technology Co ltd; Beijing Kingsoft Software Co Ltd
Current assignee: Chengdu Kingsoft Interactive Entertainment Technology Co ltd; Beijing Kingsoft Software Co Ltd; Beijing Kingsoft Digital Entertainment Co Ltd
Priority date: 2021-01-27
Filing date: 2021-01-27
Publication date: 2021-06-04

Abstract

The application provides a text processing method and a text processing device. The text processing method comprises the following steps: acquiring a question text submitted by a user for a target field; extracting entities and relationships from the question text, and creating a query path according to the entities and relationships; determining path structure information corresponding to the query path, and determining a limiting condition corresponding to the path structure information according to the entities and relationships; and determining, based on the limiting condition and the query path, an answer text corresponding to the question text in a graph database corresponding to the target field, and feeding the answer text back to the user.

Description

Text processing method and device

Technical Field

The present application relates to the field of text processing technologies, and in particular, to a text processing method and apparatus.

Background

With the development of internet technology, more and more information is being digitized, which makes it easier for users to search for related information. In the prior art, digitized information is usually stored in a tabular database, so when a user needs to query items of information that are related to one another, the user has to run several queries and record the result of each one to obtain the answer. For a multi-hop query, this means calling the database and recording the result once per hop. The process is slow, and when there are many hops or a single hop returns a large result set, query efficiency suffers and extra storage space is needed to record the results of each hop. An effective solution to these problems is therefore needed.

Disclosure of Invention

In view of this, embodiments of the present application provide a text processing method to solve technical defects in the prior art. The embodiment of the application also provides a text processing device, a text processing system, a computing device and a computer readable storage medium.

According to a first aspect of embodiments of the present application, there is provided a text processing method, including:

acquiring a question text submitted by a user for a target field;

extracting entities and relations in the question text, and creating a query path according to the entities and the relations;

determining path structure information corresponding to the query path, and determining a limiting condition corresponding to the path structure information according to the entity and the relationship;

and determining, based on the limiting condition and the query path, an answer text corresponding to the question text in a graph database corresponding to the target field, and feeding the answer text back to the user.

Optionally, the extracting the entities and the relationships in the question text includes:

standardizing the question text according to a preset relation set of the target field to obtain a target question text;

performing word segmentation on the target question text to obtain a plurality of word units, and matching the word units against the reference relations contained in the relation set;

and determining the relationship according to the matching result, and extracting the entities in the question text based on the relationship.

Optionally, the creating a query path according to the entity and the relationship includes:

constructing question features corresponding to the question text based on the entities and the relationships;

inputting the question features into a semantic recognition model for processing to obtain an intention label corresponding to the question text;

extracting a target entity from the entities and a target relationship from the relationships according to the intention label;

creating the query path based on the target entity and the target relationship.

Optionally, the determining the path structure information corresponding to the query path includes:

parsing the query path to obtain the path nodes and path relationships in the query path;

and determining the path structure information according to the path nodes and the path relationships.

Optionally, the determining, according to the entity and the relationship, a limiting condition corresponding to the path structure information includes:

extracting, based on the path structure information, condition entities from the entities and condition relationships from the relationships;

and generating the limiting condition corresponding to the path structure information according to the condition entities or the condition relationships.

Optionally, the generating the limiting condition corresponding to the path structure information according to the condition entity or the condition relationship includes:

inputting the question text into a text recognition model for processing to obtain a question type corresponding to the question text;

generating the limiting condition corresponding to the path structure information according to the condition entity when the question type is the entity question type;

and generating the limiting condition corresponding to the path structure information according to the condition relationship when the question type is the relationship question type.
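The branching above can be pictured with a minimal Python sketch. It assumes the text recognition model has already returned the question type as a string; the labels `"entity"` and `"relation"`, the `LimitingCondition` record, and `generate_limiting_condition` are hypothetical names introduced for illustration, not anything defined in the application.

```python
from dataclasses import dataclass

# Hypothetical question-type labels; the text recognition model that
# produces them is not shown here.
ENTITY_QUESTION = "entity"
RELATION_QUESTION = "relation"


@dataclass(frozen=True)
class LimitingCondition:
    kind: str   # "entity" or "relation": which kind of condition element it came from
    value: str  # the condition entity or condition relationship itself


def generate_limiting_condition(question_type, condition_entities, condition_relations):
    """Build limiting conditions from condition entities or condition relationships."""
    if question_type == ENTITY_QUESTION:
        return [LimitingCondition("entity", e) for e in condition_entities]
    if question_type == RELATION_QUESTION:
        return [LimitingCondition("relation", r) for r in condition_relations]
    raise ValueError(f"unknown question type: {question_type!r}")


# Example: an entity-type question whose condition entity is the year "2019".
print(generate_limiting_condition(ENTITY_QUESTION, ["2019"], ["lease relationship"]))
# [LimitingCondition(kind='entity', value='2019')]
```

Only one of the two inputs is consulted per call, which mirrors the either/or wording of the steps: the question type decides whether the condition entities or the condition relationships drive the limiting condition.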

Optionally, the determining, based on the limiting condition and the query path, an answer text corresponding to the question text in a graph database corresponding to the target field includes:

determining a target entity in the graph database according to the limiting condition and the query path when the question type is the entity question type;

and generating the answer text corresponding to the question text according to the target entity; or

determining a target relationship in the graph database according to the limiting condition and the query path when the question type is the relationship question type;

and generating the answer text corresponding to the question text according to the target relationship; or

updating the query path according to the limiting condition to obtain a target query path;

and determining the answer text corresponding to the question text in the graph database based on the target query path.
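A minimal sketch of the last alternative (updating the query path with the limiting condition), assuming the base query path has already been evaluated to a list of nodes with attributes and that limiting conditions are simple attribute equalities. The node data, the attribute names `year` and `type`, and both helper functions are hypothetical.

```python
# Hypothetical nodes reached by the base query path <Company A><signing><?x>;
# the attribute names "year" and "type" are illustrative only.
REACHED = [
    {"name": "Contract 1", "year": 2019, "type": "lease"},
    {"name": "Contract 2", "year": 2020, "type": "lease"},
    {"name": "Contract 3", "year": 2019, "type": "purchase"},
]


def build_target_query_path(query_path, conditions):
    """Attach limiting conditions to a query path to form the target query path."""
    return {"path": tuple(query_path), "where": dict(conditions)}


def answer_with_target_path(target, reached_nodes):
    """Keep only the reached nodes that satisfy every limiting condition."""
    where = target["where"]
    return [
        node["name"]
        for node in reached_nodes
        if all(node.get(attr) == value for attr, value in where.items())
    ]


target = build_target_query_path(("Company A", "signing", "?x"), {"year": 2019, "type": "lease"})
print(answer_with_target_path(target, REACHED))  # ['Contract 1']
```

Keeping the target query path as "base path plus a `where` clause" separates the two roles: the path decides which nodes are reachable, and the limiting conditions only shrink that set, which matches the description of the limiting condition as narrowing the mapping range of the query path.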

Optionally, the graph database is created by:

acquiring target data corresponding to the target field;

generating triples according to the target data, and constructing the graph database based on the triples.
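One way to picture these two steps is a small in-memory sketch: contract records (standing in for the target data) are flattened into (head entity, relationship, tail entity) triples, which are then indexed in both directions. The record fields, relationship names, and the `TripleGraph` class are hypothetical; a real deployment would load the triples into an actual graph database.

```python
from collections import defaultdict

# Hypothetical contract records standing in for the target data of the contract field.
RECORDS = [
    {"contract": "Lease Contract 001", "party_a": "Company A", "party_b": "Company B", "year": "2019"},
    {"contract": "Purchase Contract 002", "party_a": "Company A", "party_b": "Company C", "year": "2020"},
]


def records_to_triples(records):
    """Flatten each record into (head entity, relationship, tail entity) triples."""
    triples = []
    for record in records:
        contract = record["contract"]
        triples.append((record["party_a"], "signing", contract))
        triples.append((record["party_b"], "signing", contract))
        triples.append((contract, "signing year", record["year"]))
    return triples


class TripleGraph:
    """Minimal in-memory graph: nodes are entities, edges are labelled relationships."""

    def __init__(self, triples):
        self.out_edges = defaultdict(list)  # head -> [(relationship, tail)]
        self.in_edges = defaultdict(list)   # tail -> [(relationship, head)]
        for head, relation, tail in triples:
            self.out_edges[head].append((relation, tail))
            self.in_edges[tail].append((relation, head))

    def tails(self, head, relation):
        """Entities reached from `head` along `relation`."""
        return [t for r, t in self.out_edges.get(head, []) if r == relation]

    def heads(self, relation, tail):
        """Entities that reach `tail` along `relation`."""
        return [h for r, h in self.in_edges.get(tail, []) if r == relation]


graph = TripleGraph(records_to_triples(RECORDS))
print(graph.tails("Company A", "signing"))  # ['Lease Contract 001', 'Purchase Contract 002']
```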

According to a second aspect of embodiments of the present application, there is provided a text processing apparatus including:

the acquisition module is configured to acquire a question text submitted by a user for a target field;

the creating module is configured to extract the entities and relationships in the question text and create a query path according to the entities and relationships;

the determining module is configured to determine path structure information corresponding to the query path, and determine a limiting condition corresponding to the path structure information according to the entities and relationships;

and the feedback module is configured to determine, based on the limiting condition and the query path, an answer text corresponding to the question text in a graph database corresponding to the target field, and feed the answer text back to the user.

According to a third aspect of embodiments of the present application, there is provided a text processing system including:

a client and a server;

the client is configured to receive a question text uploaded by a user and a field selection instruction submitted for the question text, and to send the field selection instruction and the question text to the server;

the server is configured to extract the entities and relationships in the question text and create a query path according to them; determine path structure information corresponding to the query path, and determine a limiting condition corresponding to the path structure information according to the entities and relationships; determine the target field corresponding to the field selection instruction; and determine, based on the limiting condition and the query path, an answer text corresponding to the question text in a graph database corresponding to the target field, and send the answer text to the client;

the client is further configured to create a feedback interface corresponding to the question text according to the answer text and display the feedback interface to the user.

According to a fourth aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is for storing computer-executable instructions that when executed by the processor implement the steps of the text processing method.

According to a fifth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the text processing method.

According to the text processing method provided by the application, after the question text submitted by a user for the target field is obtained, the entities and relationships in the question text are extracted and a query path is created from them; at this point, an initial answer text for the question text could already be determined from the query path. To improve the precision of the answer query, the path structure information of the query path is determined, and a limiting condition corresponding to the path structure information is determined from the entities and relationships. Finally, the answer text corresponding to the question text is extracted from the graph database of the target field according to the limiting condition and the query path, and fed back to the user. The query path improves the efficiency of determining the answer text and the limiting condition improves its accuracy, so the answer text can be fed back to the user quickly and accurately.

Drawings

Fig. 1 is a flowchart of a text processing method according to an embodiment of the present application;

Fig. 2 is a flowchart illustrating a text processing method applied to a contract retrieval scenario according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a display page in a text processing method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;

Fig. 5 is a block diagram of a text processing system according to an embodiment of the present application;

Fig. 6 is a block diagram of a computing device according to an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.

The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application.

First, the terms used in one or more embodiments of the present application are explained.

Entity: an abstraction of an objective individual. It generally refers to an item in text that has a special meaning or a strong referential function, such as a person name, place name, organization name, date and time, or proper noun.

Relationship (Relation): an abstraction of the connection between entities, i.e., some association between two or more entities. For example, the relationships expressed in the sentence "Beijing is the capital, political center, and cultural center of China" can be expressed as (China, capital, Beijing), (China, political center, Beijing), and (China, cultural center, Beijing).

Graph database: a database that gathers the knowledge of a specific field. Because such knowledge is originally expressed in unstructured natural language, it is formalized and simplified as triples for ease of computer processing and understanding; the triples in the database take the form (entity, relationship, entity).

Named entity recognition: Named Entity Recognition (NER) refers to identifying entities that have a specific meaning in text.

BERT model: BERT (Bidirectional Encoder Representations from Transformers) is a Transformer-based bidirectional encoder. Its foundation is the Transformer architecture introduced in the paper "Attention Is All You Need". "Bidirectional" means that when processing a word, the model takes into account the words both before and after it, thereby capturing its contextual meaning.

Contract: an agreement between civil subjects (the parties to a civil legal relationship, who enjoy rights and undertake obligations in it) to establish, change, or terminate a civil legal relationship.

Knowledge graph (Knowledge Graph): in library and information science, a family of graphs that show the development process and structural relationships of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interconnections among knowledge resources and knowledge carriers.

The present application provides a text processing method, and also relates to a text processing apparatus, a text processing system, a computing device, and a computer-readable storage medium, each of which is described in detail in the following embodiments.

Fig. 1 shows a flowchart of a text processing method according to an embodiment of the present application, which specifically includes the following steps:

Step S102, obtaining a question text submitted by a user for the target field.

Specifically, the user is a user with a query need, and the target field is the field to which the content the user wants to query belongs. For example, if the user queries computer knowledge, the target field may be the electrical field; if the user asks about health guidance, the target field may be the medical field. The target field may also be the field of business information that business personnel need to query: if business personnel query the contract information of a company, the target field may be the contract field, and if they query a company's personnel information, the target field may be the personnel field. In other words, the target field is defined differently in different query scenarios. Correspondingly, the question text is the text, submitted by the user, that describes the content to be queried.

In a specific implementation, after a user submits a question text for a target field, if the question is simple, for example a first-degree (single-hop) question, the answer text can be extracted directly from the knowledge base and fed back to the user. If the question is complex, for example a multi-degree (multi-hop) question, the answer text can only be extracted by jumping through the knowledge base several times according to the degree of the question. The result of each jump must be recorded in extra storage space, which greatly reduces the efficiency of extracting the answer text and prevents quick feedback to the user.

In view of this, the text processing method provided by the application extracts a query path for the question text without regard to how many degrees the question spans. To ensure that the answer text is extracted accurately, a limiting condition is built from the entities and relationships in the question text and used to narrow the mapping range of the query path. This yields quick and accurate answers to the user's question and improves the user's query experience.

In this embodiment, the text processing method is described with the contract field as the target field, and the question text submitted by the user is text about the contract information to be queried. Text processing in other fields can refer to the corresponding description of this embodiment and is not described in detail here.

Step S104, extracting the entities and relationships in the question text, and creating a query path according to the entities and relationships.

Specifically, once the question text uploaded by the user for the target field is obtained, a query path for finding the answer can be created for it, in order to improve query efficiency and avoid the recording burden of multi-hop queries. The query path is the path along which the answer to the question text is queried in the graph database.

An entity is an abstract individual in the question text. Different target fields correspond to different graph databases, so different fields have different entities: in the contract field, the entities in a question may be a user name, a company name, or a contract name (a contract itself can be an entity), while in the medical field a disease may be an entity. Correspondingly, a relationship is the association between entities in the question text.

In a specific implementation, the entities and relationships that can be extracted from the question text are those that have counterparts in the graph database of the target field. Because data in the graph database is stored as nodes and relationships, the nodes and relationships that the extracted entities and relationships map to can be located in the graph database and used to determine the answer text for the question text.

On this basis, after the question text is obtained, it is processed to extract the entities and relationships, and the query path is created from them, so that the answer text corresponding to the question text can be extracted from the graph database of the target field. In practice, to make it easy to query answers in the graph database through the query path, the extracted entities and relationships are given the same representation as the entities and relationships in the graph database, which lays the foundation for the subsequent answer extraction. In addition, before the answer is extracted, the entities and relationships in the query path may be converted into a statement that satisfies the query syntax of the graph database, and the answer is then queried with that statement.
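The application does not tie this conversion to any particular graph database product. As an assumption, the sketch below targets Neo4j's Cypher query language and a hypothetical schema in which entities are nodes with a `name` property and relationships are edge types written in upper snake case; `path_to_cypher` is an illustrative helper, not part of any library.

```python
def path_to_cypher(head, relation):
    """Translate a head-entity query path <head><relation><?x> into a Cypher query.

    Assumes a hypothetical schema: entities are nodes with a `name` property,
    relationships are edge types written in upper snake case.
    """
    rel_type = relation.strip().upper().replace(" ", "_")
    # A plain $param cannot stand in for a relationship type in a Cypher
    # pattern, so the type is spliced into the text; reject anything that is
    # not a plain identifier to avoid query injection.
    if not rel_type.replace("_", "").isalnum():
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = f"MATCH (h {{name: $head}})-[:{rel_type}]->(x) RETURN x.name AS answer"
    return query, {"head": head}


query, params = path_to_cypher("Company A", "signing")
print(query)   # MATCH (h {name: $head})-[:SIGNING]->(x) RETURN x.name AS answer
print(params)  # {'head': 'Company A'}
```

The entity name travels as a query parameter, so it never needs escaping; only the relationship type, which comes from the preset relation set, is embedded in the statement text.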

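Further, to query the answer text corresponding to the question text accurately, the entities and relationships must be extracted accurately. So that the text processing method can provide a better query service, the relationships and entities are extracted with the help of a preset relation set. In this embodiment, the specific implementation is as follows:

standardizing the question text according to a preset relation set of the target field to obtain a target question text;

performing word segmentation on the target question text to obtain a plurality of word units, and matching the word units against the reference relations contained in the relation set;

and determining the relationship according to the matching result, and extracting the entities in the question text based on the relationship.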

Specifically, the preset relation set is the set of all relationships that can occur in the target field. Because the information involved in a single target field is relatively narrow, the relationships in the set can be obtained by exhaustive enumeration. Correspondingly, one way to standardize the question text is to remove paragraph symbols from it, which makes subsequent word segmentation easier. A word unit is an element obtained by segmenting the target question text, and a reference relation in the relation set is a relationship established and stored in advance: it is a relationship that can occur in the target field and that exists in the graph database.

On this basis, after the question text is obtained, it can be standardized according to the preset relation set of the target field: the words of the question text are traversed, words that are irrelevant to the relation set are removed, and the target question text is obtained from the result. The target question text is then segmented into a plurality of word units, each word unit is matched against the reference relations in the relation set, the word units that match a reference relation are selected as relationships, and finally the entities are extracted from the question text based on those relationships.
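The standardize, segment, and match steps can be sketched as below. English words stand in for the Chinese text the application targets, so word segmentation is imitated by a two-word forward maximum matching over a tiny phrase dictionary; the filler-word list, the phrase list, and the trigger words attached to each reference relation are all hypothetical.

```python
import re

# Hypothetical preset relation set for the contract field: each reference
# relation is listed with the trigger words that name it in a question.
RELATION_SET = {
    "signing relationship": {"sign", "signed", "signing"},
    "responsibility relationship": {"responsible", "responsibility"},
    "indemnity relationship": {"indemnify", "indemnity"},
}

# Hypothetical filler words removed during standardization.
FILLER = {"how", "many", "related", "to", "the", "did", "and", "in"}

# Hypothetical two-word units that segmentation keeps together.
PHRASES = {("company", "a"), ("company", "b")}


def standardize(question):
    """Lower-case the question, drop punctuation, and remove filler words."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in FILLER]


def segment(words):
    """Forward maximum matching (max length two) over the phrase dictionary."""
    units, i = [], 0
    while i < len(words):
        if tuple(words[i:i + 2]) in PHRASES:
            units.append(" ".join(words[i:i + 2]))
            i += 2
        else:
            units.append(words[i])
            i += 1
    return units


def match_relations(units, relation_set=RELATION_SET):
    """Return (word unit, reference relation) pairs for every trigger-word hit."""
    return [
        (unit, relation)
        for unit in units
        for relation, triggers in relation_set.items()
        if unit in triggers
    ]


question = "How many contracts related to the lease relationship did Company A and Company B sign in 2019?"
units = segment(standardize(question))
print(units)  # ['contracts', 'lease', 'relationship', 'company a', 'company b', 'sign', '2019']
print(match_relations(units))  # [('sign', 'signing relationship')]
```

A production system would replace `standardize` and `segment` with a proper Chinese word segmenter and keep only the final matching step against the exhaustively enumerated relation set.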

In practice, the entities can be extracted from the question text based on the relationship as follows: traverse the target question text starting from its second word, and when the relationship is encountered, select the candidate entities closest to the relationship as the entities.
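A sketch of the nearest-entity rule, assuming the word units of the example that follows, a hypothetical set of known entity names (for instance, node names already in the graph database), and a hypothetical parameter `k` that caps how many entities are taken. `k=2` is chosen because a signing relationship usually has two parties. The sketch scores candidates by token distance rather than reproducing the exact traversal order described above.

```python
def nearest_entities(units, relation_index, known_entities, k=2):
    """Return the k known entities closest to the relation word, in text order."""
    # (distance to the relation word, position) for every known entity in the text
    candidates = sorted(
        (abs(i - relation_index), i)
        for i, unit in enumerate(units)
        if i != relation_index and unit in known_entities
    )
    chosen = sorted(i for _, i in candidates[:k])
    return [units[i] for i in chosen]


units = ["2019", "Company A", "Company B", "signing", "lease", "contract"]
known = {"Company A", "Company B", "Company C"}  # hypothetical entity names
print(nearest_entities(units, units.index("signing"), known))  # ['Company A', 'Company B']
```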

For example, a user needs to query the lease contracts signed by Company A and Company B in 2019, and the question text uploaded by the user is "How many contracts related to the lease relationship did Company A and Company B sign in 2019?". After the question text is obtained, it is standardized according to the preset relation set {signing relationship, responsibility relationship, indemnity relationship, ...} of the contract field, giving the target question text "lease contracts signed by Company A and Company B in 2019". Word segmentation of the target question text then yields the word units {2019, Company A, Company B, signing, lease, contract}.

Further, the word units are matched against the reference relations {signing relationship, responsibility relationship, indemnity relationship, ...} in the relation set. The word unit "signing" matches the reference relation "signing relationship", so "signing" is determined to be the relationship of the question text, and the entities "Company A" and "Company B" are then extracted from the question text based on this relationship for the subsequent answer query.

In summary, standardizing the question text allows the relationships and entities to be extracted accurately, which prepares for the subsequent creation of the query path and improves the accuracy of the answers subsequently determined for the question text.

In addition, the entities and relationships may also be extracted from the question text by weakly supervised, unsupervised, or fuzzy-supervised relation extraction; the specific method can be chosen according to the actual application scenario and is not limited in this embodiment.

Furthermore, because several relationships and entities may be extracted from the question text, several query paths may be created. Querying answers along all of them can return too many answers, i.e., the query accuracy is low. To make the subsequent answers more accurate, the query path can be created with the help of a semantic recognition model. In this embodiment, the specific implementation is as follows:

constructing question features corresponding to the question text based on the entities and the relationships;

inputting the question features into a semantic recognition model for processing to obtain an intention label corresponding to the question text;

extracting a target entity from the entities and a target relationship from the relationships according to the intention label;

creating the query path based on the target entity and the target relationship.

Specifically, the question features are the attribute features of the question text; they are constructed from the entities and relationships, serve as the input features of the semantic recognition model, and may be expressed as vectors. Correspondingly, the intention label expresses the intention of the question text, i.e., it identifies what the user intends by asking the question. The target entity is the entity that best matches the intention label, and the target relationship is the relationship that best matches the intention label. The semantic recognition model may be a pre-trained BERT model, or a model with another structure chosen according to actual business requirements, which is not limited in this embodiment.

On this basis, question features are first constructed from the entities and relationships; the question features are then input into the semantic recognition model to obtain the intention label of the question text; next, the target entity is extracted from the entities and the target relationship from the relationships according to the intention label; finally, the query path is constructed from the target entity and the target relationship.

In practice, the target entity and the target relationship may be determined by computing matching degrees: the matching degree between the intention label and each entity is computed and the entity with the highest matching degree is selected as the target entity; likewise, the matching degree between the intention label and each relationship is computed and the relationship with the highest matching degree is selected as the target relationship.
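The matching-degree step can be sketched with cosine similarity, which is one common choice and an assumption here; the application does not specify the measure. The two-dimensional vectors are hypothetical stand-ins for the embeddings a model such as BERT would produce, and all items tied at the top score are kept, since the example that follows selects two target entities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_best(label_vec, items, tol=1e-9):
    """Return every item whose matching degree with the intention label is (tied) highest."""
    scores = {name: cosine(label_vec, vec) for name, vec in items.items()}
    best = max(scores.values())
    return [name for name, score in scores.items() if best - score <= tol]


# Hypothetical embeddings for the intention label, the entities, and the relationships.
label_s = [1.0, 0.0]
entities = {"Company A": [1.0, 0.0], "Company B": [2.0, 0.0], "lease contract": [0.0, 1.0]}
relations = {"signing": [0.9, 0.1], "lease": [0.1, 0.9]}

print(select_best(label_s, entities))   # ['Company A', 'Company B']
print(select_best(label_s, relations))  # ['signing']
```

Cosine similarity ignores vector length, so "Company B" scores the same as "Company A" even though its vector is twice as long; only direction relative to the intention label matters.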

Further, the query path constructed from the target relationship and the target entity is the path along which the answer is queried, so it can be characterized by the target relationship and the target entity. Its specific expression may be the tail-entity form <?x><relationship><entity>, or the head-entity form <entity><relationship><?x>; or Query path = [Relationship_direction, Entity_attribute]; or {entity.attribute/relationship.direction/?x.html or ?x.doc or ?x.jpg}. In practice, the specific expression of the query path may be chosen according to the content of the question text, which is not limited in this embodiment.
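A minimal sketch of how the head-entity and tail-entity forms could be evaluated against stored triples, assuming a single-hop path written as a (head, relationship, tail) tuple with `?x` marking the unknown. The triples and `match_path` are hypothetical.

```python
# Hypothetical triples from a contract-field graph database.
TRIPLES = [
    ("Company A", "signing", "Lease Contract 001"),
    ("Company B", "signing", "Lease Contract 001"),
    ("Company A", "signing", "Purchase Contract 002"),
]


def match_path(pattern, triples):
    """Bind ?x in a single-hop query path (head, relationship, tail) against triples."""
    head, relation, tail = pattern
    answers = []
    for h, r, t in triples:
        if r != relation:
            continue
        if head == "?x" and t == tail:    # tail-entity form <?x><relationship><entity>
            answers.append(h)
        elif tail == "?x" and h == head:  # head-entity form <entity><relationship><?x>
            answers.append(t)
    return answers


print(match_path(("Company A", "signing", "?x"), TRIPLES))           # ['Lease Contract 001', 'Purchase Contract 002']
print(match_path(("?x", "signing", "Lease Contract 001"), TRIPLES))  # ['Company A', 'Company B']
```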

For example, for the question text "How many contracts related to the lease relationship did Company A and Company B sign in 2019?", the extracted entities are {Company A, Company B, lease contract} and the relationships are {signing relationship, lease relationship}. A question feature features_S corresponding to the question text is constructed from these entities and relationships and input into the semantic recognition model, which outputs the intention label label_S. The label label_S indicates a contract query; combining the intention label with the question text shows that the user intends to query the contracts signed between Company A and Company B.

On this basis, the matching degree between each entity and the intention label label_S is calculated; the entities "Company A" and "Company B" have the highest matching degree and are determined as the target entities. The matching degree between each relationship and label_S is then calculated; the "signing relationship" has the highest matching degree, so "signing" is determined as the target relationship. Finally, the query path constructed from the target entities {Company A, Company B} and the target relationship {signing} is <Company A><signing><?x>, which is used to query the answer to the question text.

In summary, to accurately query the answer text for the question text later, the intention of the question text is determined by semantic recognition, the target entities and target relationships are extracted according to the recognized intention label, and a query path with high query accuracy is constructed from them, so that answer texts meeting the user's query requirement can be determined accurately.

Step S106, determining the path structure information corresponding to the query path, and determining the limiting condition corresponding to the path structure information according to the entity and the relationship.

Specifically, after the query path has been created from the entities and relationships, a limiting condition that plays an auxiliary role can further be determined from the same entities and relationships. The limiting condition narrows the mapping range of the query path and thereby improves the accuracy of the queried answer text.

Here, the path structure information is the structure information of the query path, from which the node information and relationship information in the query path can be determined. Correspondingly, the limiting condition is a condition that helps narrow the mapping range of the query path; that is, it reduces the number of answer texts returned by the subsequent query.

Further, since the query path is the basis for querying the answer text, controlling its mapping range is what ultimately determines the number of answer texts. Because the query path is built from path nodes and path relationships, a limiting condition is in essence a condition on the nodes and relationships in the query path, so the structure information of the query path must be determined before the limiting condition. In this embodiment, the query path is parsed to obtain its path nodes and path relationships, and the path structure information is determined from the path nodes and path relationships.

Specifically, a path node is a node in the query path constructed from an entity or a relationship, and a path relationship is an edge in the query path constructed from a relationship; correspondingly, the path structure information integrates the path nodes and the path relationships.
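Parsing a query path into path nodes and path relationships can be sketched as below. This is a minimal illustration under two assumptions not stated in the text: the query path is written as a string of `<...>` tokens, and a set of known relation names (hypothetical here) decides whether a token is a path relationship or a path node.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PathStructure:
    nodes: List[str] = field(default_factory=list)      # path nodes: entities and the ?x variable
    relations: List[str] = field(default_factory=list)  # path relationships linking the nodes


# One <...> token; surrounding spaces inside the brackets are ignored.
TOKEN = re.compile(r"<\s*([^<>]*?)\s*>")


def parse_query_path(path: str, known_relations: Set[str]) -> PathStructure:
    """Split a path such as '<a> <r> <?x>' into path nodes and path relationships."""
    structure = PathStructure()
    for token in TOKEN.findall(path):
        if token in known_relations:
            structure.relations.append(token)
        else:
            structure.nodes.append(token)
    return structure


# Hypothetical relation vocabulary for the contract field.
info = parse_query_path("<Company A> <sign> <?x>", known_relations={"sign", "lease"})
print(info)
```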

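Further, once the path structure information is determined, a limiting condition corresponding to it can be determined from the entities and relationships to assist the query path in querying the answer text. In this embodiment, conditional entities are extracted from the entities and conditional relationships are extracted from the relationships based on the path structure information, and the limiting condition corresponding to the path structure information is generated from the conditional entity or the conditional relationship.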

Specifically, a conditional entity is an entity that can restrict the mapping range of the query path, and a conditional relationship is a relationship that can restrict the mapping range of the query path.

Because the query path is built from path nodes and path relationships, and these are the target entities and target relationships extracted from the entities and relationships, the conditional entities can be extracted from the entities, and the conditional relationships from the relationships, with reference to the path structure information when the limiting condition is determined.

If a conditional entity is extracted from the entity based on the path structure information, a limiting condition corresponding to the path structure information can be generated according to the conditional entity; if a conditional relationship is extracted from the relationship based on the path structure information, a limiting condition corresponding to the path structure information can be generated according to the conditional relationship, and finally, the subsequent answer query operation can be completed by combining the limiting condition and the query path.

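Further, because the limiting condition serves to narrow the mapping range of the query path, different strategies can be used to create it for different question types. In this embodiment, the question text is input into a text recognition model to obtain its question type; if the question type is an entity question type, the limiting condition is generated from the conditional entity, and if it is a relationship question type, the limiting condition is generated from the conditional relationship.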

Specifically, the entity question type means that the answer to the question text is an entity in the graph database, and the relationship question type means that the answer to the question text is a relationship in the graph database. On this basis, the question text is input into the text recognition model and the question type output by the model is obtained. If the question type is the entity question type, the answer to the question text is an entity in the graph database, so the limiting condition corresponding to the path structure information can be generated from the conditional entity to narrow the mapping range of the query path. If the question type is the relationship question type, the answer is a relationship in the graph database, so the limiting condition can be generated from the conditional relationship to narrow the mapping range of the query path.

In addition, if many entities and relationships are extracted from the question text, a conditional entity and a conditional relationship may both be extracted and combined to determine the limiting condition, which improves the accuracy of the subsequently queried answer text.
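The type-dependent choice of limiting condition can be sketched as follows. This is a minimal illustration, assuming (as in the contract-retrieval walkthrough, where the relationship not used by the query path becomes the conditional relationship) that conditional entities and relationships are simply the extracted items not already used as path nodes or path relationships. The question type would come from the text recognition model; here it is passed in directly, and the inputs are hypothetical.

```python
from enum import Enum
from typing import List, Sequence


class QuestionType(Enum):
    ENTITY = "entity"      # the answer is an entity (node) in the graph database
    RELATION = "relation"  # the answer is a relationship in the graph database


def leftovers(candidates: Sequence[str], used: Sequence[str]) -> List[str]:
    """Conditional items (assumed rule): extracted items the query path does not already use."""
    return [c for c in candidates if c not in used]


def limiting_condition(qtype: QuestionType,
                       entities: Sequence[str], relations: Sequence[str],
                       path_nodes: Sequence[str], path_relations: Sequence[str],
                       combine: bool = False) -> List[str]:
    cond_entities = leftovers(entities, path_nodes)
    cond_relations = leftovers(relations, path_relations)
    if combine:  # many extracted items: use conditional entities and relationships together
        return cond_entities + cond_relations
    return cond_entities if qtype is QuestionType.ENTITY else cond_relations


# Hypothetical inputs loosely modeled on the example above.
ents = ["Company A", "Company B", "2019"]
rels = ["sign", "lease"]
print(limiting_condition(QuestionType.ENTITY, ents, rels, ["Company A", "Company B"], ["sign"]))
```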

For example, suppose the query path created from the entities and relationships is <Company A> <sign> <?x>. Parsing the query path determines that the path nodes are "Company A" and "Company B" and the path relationship is "sign". A conditional entity, "2019", is then extracted from the entities according to the path nodes, and a conditional relationship, "indemnity relationship", is extracted from the relationships according to the path relationship. The question text is then input into the text recognition model; if the question type is the entity question type, the conditional entity is selected to create the limiting condition, i.e., the conditional entity "2019" is used to create a time constraint on the query path <Company A> <sign> <?x> that assists the subsequent query of the answer text.

In summary, creating limiting conditions according to the question type effectively restricts the mapping range of the query path. This keeps the subsequently queried answer text consistent with the question text, so that a more accurate answer text is fed back to the user.

Step S108, determining answer texts corresponding to the question texts in a graph database corresponding to the target field based on the limiting conditions and the query path, and feeding back the answer texts to the user.

Specifically, on the basis of obtaining the limiting conditions, the query path may be assisted by the limiting conditions to determine the answer text, so that the number of answer texts may be reduced, and the accuracy of determining the answer text may be ensured, thereby improving the query experience of the user.

In practical applications, because the user submits the question text for a target field, the graph database must be constructed from the target data of that field. To make subsequent answer queries efficient, the graph database may be constructed in the form of triples. In this embodiment, a specific implementation is as follows:

acquiring target data corresponding to the target field;

Specifically, the target data is all data related to the target field. When the target field is the contract field, the target data may be contract-related data such as the contract type, clause data, Party A data, Party B data, and compensation data. A triple is an element composed of entities and a relationship; the graph database consists of a large number of triples, and the number of triples is determined by the target data.

In addition, the graph database can be expanded as needed: when new target data is generated, it can be uploaded through an interface provided by the graph database, and the system hosting the graph database processes it to obtain new entities and/or relationships. Entities or relationships in the graph database that are associated with the new ones are then detected, and the new entities and/or relationships are connected to them to obtain the expanded graph database. In practical applications, the graph database may be built on Neo4j; this embodiment imposes no limitation in this respect.
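A triple-based store with incremental expansion can be sketched as below. This is a minimal in-memory stand-in for illustration only, not Neo4j and not the patent's storage layer: the class and method names are hypothetical, and "connecting" new data to the existing graph is modeled by nodes with the same name sharing one adjacency entry.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity or value)


class TripleStore:
    """Hypothetical in-memory stand-in for a graph database built from triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.out_edges: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, triples: Iterable[Triple]) -> int:
        """Insert triples and return how many were new.

        A new triple whose head already exists attaches to that node, which is how
        newly uploaded target data is connected to the existing graph here.
        """
        added = 0
        for head, rel, tail in triples:
            if (head, rel, tail) not in self.triples:
                self.triples.add((head, rel, tail))
                self.out_edges[head].add((rel, tail))
                added += 1
        return added

    def neighbors(self, head: str, relation: str) -> List[str]:
        """Tails reachable from `head` over `relation`, sorted for stable output."""
        return sorted(t for r, t in self.out_edges.get(head, ()) if r == relation)


store = TripleStore()
store.add([("Company A", "sign", "printer lease contract"),
           ("printer lease contract", "party B", "Company B")])
# Expansion with new target data: the new contract links to the existing "Company A" node.
store.add([("Company A", "sign", "computer lease contract")])
print(store.neighbors("Company A", "sign"))
```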

Further, when the question type is the entity question type, the answer to the question text is an entity in the graph database, so the entity determined in the graph database according to the limiting condition and the query path can be used as the answer. In this embodiment, a target entity is determined in the graph database according to the limiting condition and the query path, and the answer text corresponding to the question text is generated according to the target entity.

For example, for the question text "How many contracts about a lease relationship did Company A and Company B sign in 2019?", processing the question text determines that the limiting condition is "2019" and the query path is <Company A> <sign> <lease> <?x>. If the target entities determined in the graph database according to the query path and the limiting condition are a printer lease contract, a computer lease contract and a workstation lease contract, the answer text determined from the target entities {printer lease contract, computer lease contract, workstation lease contract} is "?x = 3 lease contracts, namely a printer lease contract, a computer lease contract and a workstation lease contract", and this answer text is fed back to the user.
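An entity-type answer can be sketched as a filter over triples. This is a minimal illustration with hypothetical data: it flattens the chain-style query path into required (relation, value) pairs on each contract node, which is a simplification of the path matching in the text, and it includes one extra 2018 contract so the narrowing effect of the limiting condition is visible.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical graph data: each contract node carries attribute triples.
CONTRACTS: Dict[str, Dict[str, str]] = {
    "printer lease contract":     {"party A": "Company A", "party B": "Company B", "type": "lease", "year": "2019"},
    "computer lease contract":    {"party A": "Company A", "party B": "Company B", "type": "lease", "year": "2019"},
    "workstation lease contract": {"party A": "Company A", "party B": "Company B", "type": "lease", "year": "2019"},
    "server lease contract":      {"party A": "Company A", "party B": "Company B", "type": "lease", "year": "2018"},
}
TRIPLES: Set[Triple] = {(c, r, v) for c, attrs in CONTRACTS.items() for r, v in attrs.items()}


def match_entities(triples: Set[Triple], required: Dict[str, str]) -> List[str]:
    """Return every head entity that carries all required (relation, value) pairs."""
    heads = {h for h, _, _ in triples}
    return sorted(h for h in heads if all((h, r, v) in triples for r, v in required.items()))


path = {"party A": "Company A", "party B": "Company B", "type": "lease"}  # flattened query path
condition = {"year": "2019"}                                              # limiting condition

without_condition = match_entities(TRIPLES, path)
answers = match_entities(TRIPLES, {**path, **condition})
print(len(without_condition), len(answers))
print(f"?x = {len(answers)} lease contracts: {', '.join(answers)}")
```

Without the limiting condition four contracts match; adding "2019" leaves the three from the example.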

Further, when the question type is the relationship question type, the answer to the question text is a relationship in the graph database, so the relationship determined in the graph database according to the limiting condition and the query path can be used as the answer. In this embodiment, a target relationship is determined in the graph database according to the limiting condition and the query path, and the answer text corresponding to the question text is generated according to the target relationship.

For example, for the question text "What liquidated damages are agreed in the printer lease contract that Company A and Company B signed in 2019?", processing the question text determines that the limiting condition is "2019" and the query path is <Company A> <sign> <printer> <lease> <liquidated damages> <?x>. If the target relationship determined in the graph database according to the query path and the limiting condition is 10000 yuan, the answer text determined from the target relationship is "?x = 10000 yuan", and this answer text is fed back to the user.
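A relationship-type answer can be sketched similarly, except that the answer is read from the target relationship of the matched node rather than being the node itself. This is a minimal illustration with hypothetical data: contract IDs such as `C-001` are invented node names so that two printer leases from different years can coexist, and the 2018 contract's amount is made up purely to show the limiting condition at work.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical contracts keyed by invented IDs.
TRIPLES: Set[Triple] = {
    ("C-001", "party A", "Company A"), ("C-001", "party B", "Company B"),
    ("C-001", "subject", "printer"), ("C-001", "type", "lease"),
    ("C-001", "year", "2019"), ("C-001", "liquidated damages", "10000 yuan"),
    ("C-002", "party A", "Company A"), ("C-002", "party B", "Company B"),
    ("C-002", "subject", "printer"), ("C-002", "type", "lease"),
    ("C-002", "year", "2018"), ("C-002", "liquidated damages", "8000 yuan"),
}


def answer_relation_question(triples: Set[Triple], required: Dict[str, str], target: str) -> List[str]:
    """Find nodes matching every required pair, then read the value on the target relationship."""
    heads = {h for h, _, _ in triples if all((h, r, v) in triples for r, v in required.items())}
    return sorted(v for h, r, v in triples if h in heads and r == target)


path = {"party A": "Company A", "party B": "Company B", "subject": "printer", "type": "lease"}
print(answer_relation_question(TRIPLES, path, "liquidated damages"))                    # both years
print(answer_relation_question(TRIPLES, {**path, "year": "2019"}, "liquidated damages"))  # narrowed
```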

In conclusion, the answer texts are determined by combining the limiting conditions, so that the query precision can be improved, the number of the determined answer texts can be reduced, and the answer texts meeting the query requirements can be fed back to the user.

In addition, because the limiting condition narrows the mapping range of the query path, the query path may be updated with the limiting condition when the answer text is determined. In this embodiment, the query path is updated according to the limiting condition to obtain a target query path, and the answer text corresponding to the question text is determined in the graph database based on the target query path.

Specifically, the target query path specifically refers to a new query path generated after the limitation condition is added to the query path, and based on this, after the limitation condition is determined, the query path may be updated according to the limitation condition to obtain the target query path; and finally, determining the answer text corresponding to the question text in the graph database based on the target query path.

For example, if the limiting condition is "2019" and the query path is <Company A> <sign> <lease> <?x>, updating the query path with the limiting condition yields the target query path <Company A> <2019> <sign> <lease> <?x>, and the answer text is finally determined based on this target query path.
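The path update in this example can be sketched as a list insertion. This is a minimal illustration, assuming the query path is held as a list of tokens and that the limiting condition is placed directly after a chosen node (here the head entity, as in the example); the function names are hypothetical.

```python
from typing import List


def insert_condition(path: List[str], condition: str, after: str) -> List[str]:
    """Return a new path with the limiting condition placed right after the given node."""
    i = path.index(after)  # raises ValueError if the node is not on the path
    return path[: i + 1] + [condition] + path[i + 1:]


def render(path: List[str]) -> str:
    return " ".join(f"<{token}>" for token in path)


query_path = ["Company A", "sign", "lease", "?x"]
target_path = insert_condition(query_path, "2019", after="Company A")
print(render(target_path))
```

Returning a new list leaves the original query path untouched, so it can still be used without the condition if needed.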

The text processing method is further described below with reference to fig. 2 by taking an application of the text processing method provided by the present application in a contract retrieval scenario as an example. Fig. 2 shows a processing flow chart of a text processing method applied to a contract retrieval scenario according to an embodiment of the present application, which specifically includes the following steps:

Step S202, a question text submitted by a user is obtained.

In this embodiment, the text processing method is applied to a contract retrieval scenario. Correspondingly, the graph database for the target field consists of data on all contracts signed by Company A and Company B from the start of their cooperation to the present. In this graph database, Company A, Company B and the various contracts are entities, while other contract-related information and information about Company A and Company B are relationships and attributes of those entities.

On this basis, when user a in Company A needs to find the contracts that Company A and Company B signed in the first half of 2019, user a can upload the question text "How many contracts did Company A and Company B sign in the first half of 2019?" through the contract retrieval system.

Step S204, extracting entities and relations in the question text.

Specifically, after the question text "How many contracts did Company A and Company B sign in the first half of 2019?" is received, it is standardized according to a preset relation set to improve the accuracy of the response, and the resulting target question text is "Company A and Company B signed several contracts in the first half of 2019".

On this basis, word segmentation is performed on the target question text to obtain the word units {2019, first half of the year, Company A, Company B, sign, several, contract}, and the word units are matched against the reference relations contained in the relation set. When a word unit matches the relation set, it is taken as a relation, and the nearest word units on its left and right can be selected as entities. This yields the entities Company A and Company B and the relations signing relation and time relation.
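The standardization and relation-matching part of this step can be sketched as below. This is a minimal illustration under stated assumptions: the synonym table and relation set are hypothetical, the word units are assumed to come from an upstream segmenter that is not shown, and the sketch stops at relation matching; the returned positions are where the neighbor-based entity selection described above would start.

```python
from typing import Dict, List, Tuple

# Hypothetical relation set for the contract field: trigger word -> reference relation.
RELATION_SET: Dict[str, str] = {"sign": "signing relation", "first half": "time relation"}
# Hypothetical synonym table used for standardization: variant wording -> canonical word.
SYNONYMS: Dict[str, str] = {"signed": "sign", "concluded": "sign", "H1": "first half"}


def standardize(units: List[str]) -> List[str]:
    """Rewrite variant word units to the canonical wording used by the relation set."""
    return [SYNONYMS.get(u, u) for u in units]


def match_relations(units: List[str]) -> List[Tuple[int, str]]:
    """Return (position, relation) for every word unit found in the relation set."""
    return [(i, RELATION_SET[u]) for i, u in enumerate(units) if u in RELATION_SET]


# Word units are assumed to be produced by a segmenter (not shown).
units = standardize(["Company A", "and", "Company B", "signed", "several",
                     "contracts", "in", "H1", "2019"])
print(match_relations(units))
```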

Step S206, constructing question features corresponding to the question text based on the entities and the relations, and inputting the question features into the semantic recognition model for processing to obtain the intent label corresponding to the question text.

Specifically, after the entities Company A and Company B and the relations (signing relation and time relation) are obtained, the intent of the question text is recognized from them.

On this basis, question features are constructed from the entities {Company A, Company B} and the relations {signing relation, time relation} and input into a pre-trained semantic recognition model for semantic recognition. The intent label obtained for the question text is intent_label_1, which indicates that the user intends to query contract information between Company A and Company B.

Step S208, extracting target entities from the entities according to the intention labels, and extracting target relations from the relations.

Step S210, a query path is created based on the target entity and the target relationship.

Specifically, after the intent label intent_label_1 is obtained, target entities can be extracted from the entities and target relationships from the relationships according to the intent label, and a query path for quickly querying the answer text corresponding to the question text is then created from the target entities and target relationships.

On this basis, the target entities {Company A, Company B} are extracted from the entities {Company A, Company B} according to intent_label_1, and the target relationship {signing relationship} is extracted from the relationships {signing relationship, time relationship}. The target entities and the target relationship are then spliced according to the user's intent "query contract information between Company A and Company B", giving the query path <Company A> <signing relationship> <?x> <signing relationship> <Company B>, which is used to query the answer text corresponding to the question text.

Step S212, determining the path structure information of the query path, and generating a limiting condition corresponding to the path structure information according to the entity and the relationship.

Specifically, the query path is <Company A> <signing relationship> <?x> <signing relationship> <Company B>, the entities are {Company A, Company B}, and the relationships are {signing relationship, time relationship}. To improve the accuracy of the answer to the question text, the range of the answer can be narrowed by a limiting condition.

On this basis, because the entities Company A and Company B and the signing relationship have already been used to create the query path <Company A> <signing relationship> <?x> <signing relationship> <Company B>, the remaining time relationship can be determined as the conditional relationship, and the corresponding limiting condition {2019, first half of the year} can be generated from it for the path nodes and path relationships in the query path.

Step S214, generating a target query path based on the limiting condition and the query path.

Specifically, with the query path <Company A> <signing relationship> <?x> <signing relationship> <Company B> and the limiting condition {first half of 2019}, the query path and the limiting condition can be combined to generate the target query path carrying the limiting condition, <Company A> <signing relationship> <2019> <Jan-Jun> <?x> <signing relationship> <Jan-Jun> <2019> <Company B>, which is used to query the answer text.
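The splicing of the two-entity query path and the attachment of the limiting conditions in Steps S210 to S214 can be sketched as one function. This is a minimal illustration that reproduces the token layout of the example; the function names are hypothetical, and the reverse ordering on the Company B branch is copied from the example rather than derived from any stated rule.

```python
from typing import List, Sequence


def build_target_path(entity_a: str, entity_b: str, relation: str,
                      conditions: Sequence[str] = ()) -> List[str]:
    """Splice <a> <rel> [conditions] <?x> <rel> [reversed conditions] <b>.

    With no conditions this yields the plain query path; with conditions it yields
    the target query path in the layout used by the example.
    """
    left = [entity_a, relation, *conditions]
    right = [relation, *reversed(conditions), entity_b]
    return left + ["?x"] + right


def render(path: Sequence[str]) -> str:
    return " ".join(f"<{t}>" for t in path)


print(render(build_target_path("Company A", "Company B", "signing relationship")))
print(render(build_target_path("Company A", "Company B", "signing relationship", ["2019", "Jan-Jun"])))
```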

Step S216, according to the target query path, determining an answer text corresponding to the question text in a preset graph database, and feeding back the answer text to the user.

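Specifically, after the target query path <Company A> <signing relationship> <2019> <Jan-Jun> <?x> <signing relationship> <Jan-Jun> <2019> <Company B> is determined, it is used to query the graph database, which returns an office appliance purchase contract and a computer purchase contract.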

On this basis, an answer text is generated from the retrieved office appliance purchase contract and computer purchase contract: {office appliance purchase contract, Party A: Company A, Party B: Company B, signing date: 2019-01-15, office appliances: 500 pens, 200 books, contract clauses …} and {computer purchase contract, Party A: Company A, Party B: Company B, signing date: 2019-01-15, computers: 50 S-brand notebook computers, contract clauses …}. To make the answer text easy for the user to read, it can be inserted into a preset display template before being shown to the user; the displayed content is the display page shown in fig. 3.
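Applying the limiting condition {2019, first half of the year} as a date-range filter and formatting the hits as answer text can be sketched as below. This is a minimal illustration with hypothetical records: the `Contract` class stands in for contract nodes in the graph database, mapping "first half" to 1 January through 30 June is an assumption, and a third contract signed in September is included only to show it being filtered out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Contract:
    name: str
    party_a: str
    party_b: str
    signed: date


# Hypothetical records standing in for contract nodes in the graph database.
CONTRACTS = [
    Contract("office appliance purchase contract", "Company A", "Company B", date(2019, 1, 15)),
    Contract("computer purchase contract", "Company A", "Company B", date(2019, 1, 15)),
    Contract("printer lease contract", "Company A", "Company B", date(2019, 9, 2)),
]


def first_half(year: int) -> Tuple[date, date]:
    """Map the limiting condition {<year>, first half} to an inclusive date range (assumed)."""
    return date(year, 1, 1), date(year, 6, 30)


def query(contracts: List[Contract], a: str, b: str, start: date, end: date) -> List[Contract]:
    """Keep contracts between the two parties whose signing date falls in [start, end]."""
    return [c for c in contracts if c.party_a == a and c.party_b == b and start <= c.signed <= end]


def to_answer_text(c: Contract) -> str:
    return f"{{{c.name}, Party A: {c.party_a}, Party B: {c.party_b}, signing date: {c.signed.isoformat()}}}"


start, end = first_half(2019)
hits = query(CONTRACTS, "Company A", "Company B", start, end)
for contract in hits:
    print(to_answer_text(contract))
```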

In summary, in the process of determining the answer text for the question text, the query efficiency of determining the answer text is improved through the query path, and the accuracy of determining the answer text is improved through the limiting conditions, so that the answer text can be fed back to the user quickly and accurately.

Corresponding to the above method embodiment, the present application further provides a text processing apparatus embodiment, and fig. 4 shows a schematic structural diagram of a text processing apparatus provided in an embodiment of the present application. As shown in fig. 4, the apparatus includes:

an obtaining module 402 configured to obtain a question text submitted by a user for a target field;

a creating module 404 configured to extract entities and relations in the question text and create a query path according to the entities and relations;

a determining module 406, configured to determine path structure information corresponding to the query path, and determine a limiting condition corresponding to the path structure information according to the entity and the relationship;

a feedback module 408 configured to determine answer texts corresponding to the question texts in the graph database corresponding to the target field based on the limiting conditions and the query path, and feed the answer texts back to the user.
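The four functional modules can be sketched as a pipeline of pluggable callables. This is a minimal illustration of the data flow between modules 402 to 408, not the apparatus itself: the class name, field names and call signatures are hypothetical, and the stub lambdas in the usage example stand in for the real extraction, recognition and graph-query logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

QueryPath = List[str]


@dataclass
class TextProcessingApparatus:
    """Hypothetical wiring of the four functional modules as pluggable callables."""
    obtain: Callable[[str], str]                                       # obtaining module 402
    create: Callable[[str], Tuple[List[str], List[str], QueryPath]]    # creating module 404
    determine: Callable[[QueryPath, List[str], List[str]], List[str]]  # determining module 406
    feedback: Callable[[QueryPath, List[str]], str]                    # feedback module 408

    def handle(self, raw_request: str) -> str:
        question = self.obtain(raw_request)
        entities, relations, path = self.create(question)
        conditions = self.determine(path, entities, relations)
        return self.feedback(path, conditions)


# Stub modules for illustration only.
apparatus = TextProcessingApparatus(
    obtain=lambda req: req.strip(),
    create=lambda q: (["Company A", "2019"], ["sign"], ["Company A", "sign", "?x"]),
    determine=lambda path, ents, rels: [e for e in ents if e not in path],
    feedback=lambda path, conds: f"path={path} conditions={conds}",
)
print(apparatus.handle("  How many contracts did Company A sign in 2019?  "))
```

Treating each module as a callable mirrors the note below that the modules are functional blocks of a computer program rather than separate hardware.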

In an optional embodiment, the creating module 404 is further configured to:

standardizing the question text according to a preset relation set in the target field to obtain a target question text; performing word segmentation on the target question text to obtain a plurality of word units, and matching the plurality of word units with the reference relations contained in the relation set; and determining the relation according to the matching result, and extracting the entity in the question text based on the relation.

In an optional embodiment, the creating module 404 is further configured to:

constructing question features corresponding to the question texts based on the entities and the relations; inputting the question features into a semantic recognition model for processing to obtain an intention label corresponding to the question text; extracting target entities from the entities according to the intention tags, and extracting target relationships from the relationships; creating the query path based on the target entity and the target relationship.

In an optional embodiment, the determining module 406 is further configured to:

analyzing the query path to obtain path nodes and path relations in the query path; and determining the path structure information according to the path node and the path relation.

In an optional embodiment, the determining module 406 is further configured to:

extracting conditional entities in the entities based on the path structure information and conditional relationships in the relationships; and generating the limiting condition corresponding to the path structure information according to the condition entity or the condition relation.

In an optional embodiment, the determining module 406 is further configured to:

inputting the question text into a text recognition model for processing to obtain a question type corresponding to the question text; generating the limiting condition corresponding to the path structure information according to the conditional entity when the question type is an entity question type; and generating the limiting condition corresponding to the path structure information according to the conditional relation when the question type is a relation question type.

In an optional embodiment, the feedback module 408 is further configured to:

determining a target entity in the graph database according to the limiting condition and the query path when the question type is an entity question type; and generating the answer text corresponding to the question text according to the target entity.

In an optional embodiment, the feedback module 408 is further configured to:

determining a target relationship in the graph database according to the limiting condition and the query path when the question type is a relationship question type; and generating the answer text corresponding to the question text according to the target relationship.

In an optional embodiment, the feedback module 408 is further configured to:

updating the query path according to the limiting condition to obtain a target query path; and determining the answer text corresponding to the question text in the graph database based on the target query path.

In an alternative embodiment, the graph database is created by:

acquiring target data corresponding to the target field;

In the text processing apparatus provided in this embodiment, after a question text submitted by a user for a target field is obtained, the entities and relationships in the question text are extracted and a query path is created from them; at this point, an initial answer text for the question text can be preliminarily determined from the query path. To improve the accuracy of the queried answer text, the path structure information of the query path is determined, and a limiting condition corresponding to the path structure information is determined from the entities and relationships. Finally, the answer text corresponding to the question text is extracted from the graph database of the target field according to the limiting condition and the query path and fed back to the user. In this way, when the answer text is determined for the question text, the query path improves query efficiency and the limiting condition improves accuracy, so the answer text can be fed back to the user quickly and accurately.

The above is a schematic scheme of a text processing apparatus of the present embodiment. It should be noted that the technical solution of the text processing apparatus and the technical solution of the text processing method belong to the same concept, and details that are not described in detail in the technical solution of the text processing apparatus can be referred to the description of the technical solution of the text processing method. Further, the components in the device embodiment should be understood as functional blocks that must be created to implement the steps of the program flow or the steps of the method, and each functional block is not actually divided or separately defined. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.

Corresponding to the above method embodiment, the present application further provides a text processing system embodiment, and fig. 5 shows a schematic structural diagram of a text processing system provided in an embodiment of the present application. As shown in FIG. 5, text processing system 500 includes a client 502 and a server 504;

the client 502 is configured to receive a question text uploaded by a user and a domain selection instruction submitted aiming at the question text; sending the domain selection instruction and the question text to the server 504;

the server 504 is configured to extract the entities and the relations in the question text, and create a query path according to the entities and the relations; determining path structure information corresponding to the query path, and determining a limiting condition corresponding to the path structure information according to the entity and the relationship; determining a target field corresponding to the field selection instruction, determining an answer text corresponding to the question text in a graph database corresponding to the target field based on the limiting condition and the query path, and sending the answer text to the client 502;

the client 502 is further configured to create a feedback interface corresponding to the question text according to the answer text, and display the feedback interface to the user.

Specifically, the client 502 is a terminal device held by the user that provides the function of querying the answer to a question text; this function may be carried by a web page or an application program. In specific implementations, the client includes, but is not limited to, a computer, a mobile phone, an e-reader and similar devices. Correspondingly, the server 504 performs query processing at the client's request: according to the instruction and the question text uploaded by the client, it retrieves the answer to the question text from the graph database of the field indicated by the instruction.

In practical application, after the feedback interface is displayed through the client 502, the user can check the answer corresponding to the question text through the feedback interface; when the user needs to check the specific content of the answer text, the files or resources related to the answer text can be downloaded in a downloading mode. For example, in the contract retrieval scenario, after the contract retrieval result is fed back according to the question text submitted by the user, the user can download the contract copy related in the contract retrieval result in a downloading manner. It should be noted that, since the files or resources related to the answer text may be important relative to the target field, the user may also be authenticated before the user downloads or before the interface is displayed, and the relevant answer text can be downloaded or displayed if the user passes the authentication.

Optionally, the server 504 is further configured to create the query path based on the target entity and the target relationship.

Optionally, the graph database is created by:

acquiring target data corresponding to the target field;

The text processing system provided by the embodiment can improve the query efficiency of determining the answer text through the query path and improve the accuracy of determining the answer text through the limiting condition in the process of determining the answer text for the question text, so that the answer text can be fed back to the user quickly and accurately.

The above is a schematic scheme of a text processing system of the present embodiment. It should be noted that the technical solution of the text processing system and the technical solution of the text processing method belong to the same concept, and details that are not described in detail in the technical solution of the text processing system can be referred to the description of the technical solution of the text processing method.

Fig. 6 illustrates a block diagram of a computing device 600 provided according to an embodiment of the present application. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to store data.

Computing device 600 also includes access device 640, which enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 640 may include one or more network interfaces of any type (e.g., a Network Interface Card (NIC)), wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present application, the above-described components of computing device 600, as well as other components not shown in Fig. 6, may also be connected to each other, for example via a bus. It should be understood that the block diagram of the computing device architecture shown in Fig. 6 is for purposes of example only and does not limit the scope of the present application. Those skilled in the art may add or replace other components as desired.

Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.

The processor 620 is configured to execute the following computer-executable instructions:

acquiring a question text submitted by a user for a target field;

The above is an illustrative solution of the computing device of this embodiment. It should be noted that the technical solution of the computing device and that of the text processing method belong to the same concept; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the text processing method.

An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the following steps:

acquiring a question text submitted by a user for a target field;

The above is an illustrative solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the text processing method belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the text processing method.

Specific embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.

The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content included in the computer-readable medium may be expanded or reduced as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals or telecommunications signals.

It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

The preferred embodiments of the present application disclosed above are intended only to help explain the application. The optional embodiments do not describe all details exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. These embodiments were chosen and described in detail in order to best explain the principles and practical applications of the application, thereby enabling others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims

1. A method of text processing, comprising:

acquiring a question text submitted by a user for a target field;

2. The text processing method of claim 1, wherein the extracting the entities and the relationships in the question text comprises:

preprocessing the question text according to a preset relation set of the target field to obtain a target question text;

performing word segmentation on the target question text to obtain a plurality of word units, and matching the plurality of word units with reference relations contained in the relation set;

3. The text processing method of claim 1, wherein the creating a query path from the entity and the relationship comprises:

creating the query path based on the target entity and the target relationship.

4. The text processing method according to claim 1, wherein the determining the path structure information corresponding to the query path comprises:

5. The text processing method according to claim 1, wherein the determining the limiting condition corresponding to the path structure information according to the entity and the relationship comprises:

6. The text processing method according to claim 5, wherein the generating the limiting condition corresponding to the path structure information according to the condition entity or the condition relationship comprises:

7. The text processing method according to claim 5, wherein the determining the answer text corresponding to the question text in the graph database corresponding to the target field based on the limiting condition and the query path comprises:

8. The text processing method according to claim 5, wherein the determining the answer text corresponding to the question text in the graph database corresponding to the target field based on the limiting condition and the query path comprises:

9. The text processing method according to claim 1, wherein the determining the answer text corresponding to the question text in the graph database corresponding to the target field based on the limiting condition and the query path comprises:

10. The text processing method of claim 1, wherein the graph database is created by:

acquiring target data corresponding to the target field;

11. A text processing apparatus, comprising:

12. A text processing system, comprising:

a client and a server;

13. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of the text processing method according to any one of claims 1 to 10.

14. A computer-readable storage medium storing computer instructions, which when executed by a processor, implement the steps of the text processing method of any one of claims 1 to 10.
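The claims leave the concrete extraction and path-building techniques open. The following is only a minimal sketch of the flow summarized in the abstract and in claims 2 and 3, written under stated assumptions: the relation set, entity vocabulary and in-memory graph are toy data, every function name is hypothetical, preprocessing is assumed to mean rewriting alias phrases to reference relation names, word segmentation is swapped in as a simple greedy longest match, relations are assumed to apply in the order they appear in the question, and the limiting condition is reduced to an optional predicate applied to the final answers. A real implementation would query a graph database rather than a Python dict.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical relation set for one target field: each reference relation
# maps to alias phrases a user might write instead.
RELATION_SET: dict[str, list[str]] = {
    "director": ["directed by"],
    "spouse": ["married to", "wife of", "husband of"],
}

# Hypothetical entity vocabulary and toy graph for the same field:
# (head entity, relation) -> list of tail entities.
ENTITIES = {"Movie A", "Person B", "Person C"}
GRAPH: dict[tuple[str, str], list[str]] = {
    ("Movie A", "director"): ["Person B"],
    ("Person B", "spouse"): ["Person C"],
}


def preprocess(question: str) -> str:
    """Assumed preprocessing: rewrite alias phrases to reference relation names."""
    for relation, aliases in RELATION_SET.items():
        for alias in aliases:
            question = question.replace(alias, relation)
    return question


def segment(text: str, max_n: int = 3) -> list[str]:
    """Greedy longest match over known terms; unknown tokens become single units."""
    tokens = re.findall(r"\w+", text)
    vocab = ENTITIES | set(RELATION_SET)
    units, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram first; a single token is always accepted.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocab:
                units.append(candidate)
                i += n
                break
    return units


@dataclass(frozen=True)
class QueryPath:
    start: str                  # target entity the path starts from
    relations: tuple[str, ...]  # target relations, one per hop


def create_query_path(question: str) -> QueryPath:
    """Match word units against entities and reference relations, then build the path."""
    units = segment(preprocess(question))
    entities = [u for u in units if u in ENTITIES]
    relations = tuple(u for u in units if u in RELATION_SET)
    if not entities or not relations:
        raise ValueError("no entity or relation matched")
    # Assumption: relations are applied in the order they appear in the text.
    return QueryPath(start=entities[0], relations=relations)


def answer(path: QueryPath,
           constraint: Optional[Callable[[str], bool]] = None) -> set[str]:
    """Follow every hop in one traversal, then apply an optional constraint."""
    frontier = {path.start}
    for relation in path.relations:
        frontier = {tail for head in frontier
                    for tail in GRAPH.get((head, relation), [])}
    if constraint is not None:
        frontier = {e for e in frontier if constraint(e)}
    return frontier
```

For example, `create_query_path("Who is Movie A's director married to?")` rewrites "married to" to `spouse`, yields `QueryPath(start='Movie A', relations=('director', 'spouse'))`, and `answer(...)` on that path returns `{'Person C'}`. Following the whole path in a single traversal, instead of issuing and recording one query per hop, is meant to mirror the multi-hop motivation given in the Background; it is an illustration, not the applicant's implementation.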