CN110069631B - Text processing method and device and related equipment - Google Patents

Publication number
CN110069631B
Authority
CN
China
Prior art keywords
target
entity
character string
vector
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910277438.7A
Other languages
Chinese (zh)
Other versions
CN110069631A (en
Inventor
陈曦
赖盛章
曹行
张淳
乔倩倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910277438.7A priority Critical patent/CN110069631B/en
Publication of CN110069631A publication Critical patent/CN110069631A/en
Application granted granted Critical
Publication of CN110069631B publication Critical patent/CN110069631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The embodiment of the invention discloses a text processing method, a text processing device and related equipment, wherein the method comprises the following steps: acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string; searching a target entity character string matched with the target text in the entity character strings, and extracting a target service attribute character string corresponding to the target entity character string; identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string; and determining a target intention character string associated with the target intention type from the target entity character string, and generating recommended service data according to the target intention character string. By adopting the invention, the efficiency of acquiring the service data can be improved.

Description

Text processing method and device and related equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a text processing method and apparatus, and a related device.
Background
With the rapid development of information technology, users can register for hospital appointments remotely over the Internet without leaving home. However, most users decide which department to register with, which examinations to undergo, and so on, based on past experience or by looking up information related to their symptoms. After this self-diagnosis, the user finds the corresponding hospital and department on the Internet and books a registration appointment.
However, because users lack professional knowledge during self-diagnosis, the department they register with may not match their symptoms. The user may then have to register several times and see a doctor several times before finding the department that best matches the symptoms, which reduces the efficiency with which the user finds the correct department.
Disclosure of Invention
The embodiment of the invention provides a text processing method, a text processing device and related equipment, which can improve the efficiency of acquiring service data.
An embodiment of the present invention provides a text processing method, including:
acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string;
searching a target entity character string matched with the target text in the entity character strings, and extracting a target service attribute character string corresponding to the target entity character string;
identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string;
and determining a target intention character string associated with the target intention type from the target entity character string, and generating recommended service data according to the target intention character string.
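The four steps above can be sketched end to end as follows. This is a minimal illustration only: the toy knowledge graph, the business data, the substring matching, and the rule used in place of the trained intention recognition model are all hypothetical stand-ins, not the patented implementation.

```python
# Toy knowledge graph (hypothetical data): entity string -> service attribute string.
KNOWLEDGE_GRAPH = {
    "nausea": "symptom",
    "bloating": "symptom",
    "gastroenterology": "department",
}

# Hypothetical business data keyed by entity string.
BUSINESS_DATA = {
    "gastroenterology": "gastroenterology registration link",
    "nausea": "departments commonly associated with nausea",
}


def find_target_entities(target_text, graph):
    """Step 2: entity strings that match the text, with their service attributes."""
    return {entity: attr for entity, attr in graph.items() if entity in target_text}


def identify_intention_type(targets):
    """Step 3 stand-in: a trivial rule replaces the trained intention model."""
    attrs = set(targets.values())
    return "department" if "department" in attrs else "symptom"


def recommend(target_text, graph=KNOWLEDGE_GRAPH, data=BUSINESS_DATA):
    targets = find_target_entities(target_text, graph)
    if not targets:
        return []
    intention = identify_intention_type(targets)
    # Step 4: the target intention strings are the target entity strings whose
    # service attribute is the one represented by the intention type.
    intention_strings = [e for e, attr in targets.items() if attr == intention]
    return [data[e] for e in intention_strings if e in data]
```

Here the intention type is represented directly by a service attribute name, which keeps the selection of the target intention string to a single comparison.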
Wherein the searching for the target entity string matching the target text from the plurality of entity strings comprises:
extracting target keywords from the target text;
mapping the target key words into map labeling entity character strings;
and searching the entity character strings which are the same as the map labeling entity character strings from the entity character strings to serve as target entity character strings matched with the target text.
Wherein, the extracting the target key words from the target text comprises:
dividing the target text into a plurality of target unit characters, and converting each target unit character into a target unit character vector;
based on a coding layer in a first recurrent neural network model, performing bidirectional recurrent encoding on the plurality of target unit character vectors to obtain a forward coding matrix and a reverse coding matrix;
splicing the forward coding matrix and the reverse coding matrix into a hidden state matrix;
performing sequence labeling on the hidden state matrix based on the conditional random field in the first recurrent neural network, and determining part-of-speech tags corresponding to each target unit character respectively;
and determining the target keyword according to the part-of-speech tag corresponding to each target unit character.
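The sequence labeling step can be illustrated with Viterbi decoding, which is the usual way to find the best tag sequence in a linear-chain conditional random field. The sketch below assumes that per-character emission scores have already been computed from the hidden state matrix; the function name, the score dictionaries, and the tag names are hypothetical, and the source does not state which decoding algorithm is used.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   list with one dict per character, mapping tag -> score
    transitions: dict mapping (previous_tag, current_tag) -> score
    """
    # best[t][tag] = (best score of a path ending in tag at position t, backpointer)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for t in range(1, len(emissions)):
        column = {}
        for cur in tags:
            prev, score = max(
                ((p, best[t - 1][p][0] + transitions[(p, cur)]) for p in tags),
                key=lambda item: item[1],
            )
            column[cur] = (score + emissions[t][cur], prev)
        best.append(column)
    # Follow the backpointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag][0])
    path = [last]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]
```

Unlike choosing the best tag for each character independently, the transition scores let the decoder reject invalid sequences, such as an inside tag that directly follows a non-entity tag.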
Wherein the mapping the target keyword to a graph labeling entity character string comprises:
dividing the target keyword into a plurality of key unit characters, and converting each key unit character into a key unit character vector;
coding a plurality of key unit character vectors based on a coding layer in a second recurrent neural network model to obtain a context vector of the target keyword;
decoding the context vector based on a decoding layer in the second recurrent neural network model to obtain a hidden state vector of the context vector;
and identifying the hidden state vector to obtain a character sequence corresponding to the hidden state vector, and determining the character sequence as the map labeling entity character string.
Identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string, wherein the identifying comprises:
converting the target entity character string into a target entity word vector, and converting the target service attribute character string into a target service attribute word vector;
combining the target entity word vector and the target service attribute word vector into an input vector, and performing convolution and pooling on the input vector based on a convolution layer and a pooling layer in an intention identification model to obtain an intention feature vector of the target text;
based on a classifier in the intention recognition model, matching probabilities between the intention feature vector and a plurality of intention types in the classifier are recognized, and an intention type corresponding to the maximum matching probability is determined as the target intention type from the plurality of matching probabilities.
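A minimal sketch of the convolution, pooling, and classification steps is given below in plain Python. The kernels, classifier weights, and intention type names are hypothetical, the input is treated as a one-dimensional sequence for brevity, and a softmax classifier is assumed since the source does not name the classifier.

```python
import math


def conv1d(sequence, kernel):
    """Valid 1-D convolution (cross-correlation) of a scalar sequence."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def softmax(scores):
    """Turn raw scores into matching probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_intention(input_vector, kernels, class_weights, intention_types):
    # Max pooling keeps one feature per kernel; together they form the
    # intention feature vector.
    features = [max(conv1d(input_vector, kernel)) for kernel in kernels]
    # One linear score per intention type, then softmax (assumed classifier).
    scores = [sum(w * f for w, f in zip(weights, features)) for weights in class_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return intention_types[best], probs
```

The returned label is the intention type with the maximum matching probability, as in the last step above.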
Wherein the converting the target entity character string into a target entity word vector comprises:
searching the entity word bag for the one-hot code corresponding to the target entity character string to serve as a first vector; the entity word bag comprises the entity character strings in the knowledge graph and the one-hot codes respectively corresponding to the entity character strings;
and performing dimension reduction on the first vector based on a hidden layer in an entity word vector conversion model to obtain the target entity word vector.
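The one-hot lookup and the dimension reduction can be sketched as follows. The entity strings and the hidden-layer weight values are hypothetical; in practice the weights would come from the trained entity word vector conversion model.

```python
def build_entity_bag(entity_strings):
    """Assign each entity string a one-hot code (the entity word bag)."""
    size = len(entity_strings)
    return {
        entity: [1.0 if i == j else 0.0 for j in range(size)]
        for i, entity in enumerate(entity_strings)
    }


def reduce_dimension(one_hot, hidden_weights):
    """Multiply a one-hot row vector by the hidden-layer weight matrix.

    hidden_weights has one row per entity string and one column per output
    dimension, so the product is a short dense vector.
    """
    dims = len(hidden_weights[0])
    return [
        sum(one_hot[i] * hidden_weights[i][d] for i in range(len(one_hot)))
        for d in range(dims)
    ]
```

Because the first vector is one-hot, the product simply selects the row of the weight matrix that belongs to the target entity string, which is why the result has far fewer dimensions than the one-hot code.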
Wherein the determining a target intention character string associated with the target intention type from the target entity character string and generating recommended service data according to the target intention character string includes:
acquiring a service attribute represented by the target intention type as an intention service attribute;
taking the target entity character string with the intention service attribute as the target intention character string;
and searching business data associated with the target intention character string, and taking the business data associated with the target intention character string as the recommended business data.
Wherein the searching for the business data associated with the target intention character string and taking the business data associated with the target intention character string as the recommended business data comprises:
when the target intention type belongs to a semantic reasoning type, extracting a plurality of first entity character strings from a plurality of entity character strings of the knowledge graph; the first entity string is an entity string of the plurality of entity strings of the knowledge-graph other than the entity string having the intent-service attribute;
determining a target similarity coefficient between the target intention character string and each first entity character string in the knowledge graph, and selecting a target first entity character string matched with the target intention type from the plurality of first entity character strings according to a plurality of target similarity coefficients;
and searching the business data associated with the target first entity character string, and taking the business data associated with the target first entity character string as the recommended business data.
Wherein the knowledge-graph further comprises association matching coefficients between a plurality of entity strings; the association matching coefficients are obtained through statistics on a data set related to the knowledge graph;
the determining, in the knowledge-graph, a target similarity coefficient between the target intent string and each first entity string includes:
extracting a test first entity string for polling from the plurality of first entity strings;
taking the association matching coefficient between the target intention character string and the test first entity character string as a target similarity coefficient between the target intention character string and the test first entity character string;
when each first entity string is determined to be a test first entity string, polling is stopped.
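The polling procedure above can be sketched as a loop over the first entity strings that reads the stored matching coefficient for each pair. The coefficient table is hypothetical, and two details are assumptions not stated in the source: a missing pair is given a coefficient of 0.0, and the first entity string with the largest coefficient is the one selected.

```python
def target_similarity_by_polling(target_intention, first_entities, coefficients):
    """Poll every first entity string once and record its similarity coefficient."""
    similarities = {}
    for test_entity in first_entities:
        # The stored matching coefficient is used directly as the similarity.
        similarities[test_entity] = coefficients.get((target_intention, test_entity), 0.0)
    # Polling stops once every first entity string has been the test string.
    return similarities


def select_target_first_entity(similarities):
    """Assumed selection rule: keep the entity string with the largest coefficient."""
    return max(similarities, key=similarities.get)
```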
Wherein the determining, in the knowledge-graph, a target similarity coefficient between the target intent string and each first entity string comprises:
taking a plurality of entity character strings with the intention service attribute in the knowledge graph as second entity character strings;
acquiring a first graph vector of each first entity character string, and acquiring a second word vector and a second graph vector of each second entity character string;
extracting a second word vector corresponding to the target intention character string from the plurality of second word vectors to be used as a target word vector, and extracting a second graph vector corresponding to the target intention character string from the plurality of second graph vectors to be used as a target graph vector;
and determining a target similarity coefficient between the target intention character string and each first entity character string according to the target entity word vector, the target graph vector, the plurality of first graph vectors and the plurality of second word vectors.
Wherein the knowledge-graph further comprises association matching coefficients between the plurality of entity strings;
the obtaining a first graph vector of each first entity string includes:
initializing an original graph vector corresponding to each entity character string in the knowledge graph;
sampling a plurality of sample entity character strings from the knowledge graph, acquiring an original graph vector corresponding to each sample entity character string as a sample graph vector, respectively updating each sample graph vector by adopting a gradient descent rule according to the association matching coefficient among the plurality of sample entity character strings to obtain an adjusted graph vector, and determining the adjusted graph vector as the original graph vector;
and when the number of sampling rounds reaches a count threshold, taking the updated original graph vector corresponding to each first entity character string as the first graph vector.
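The iterative update of the graph vectors can be sketched as stochastic gradient descent. The source does not give the loss function in this passage, so the sketch assumes a squared-error objective that pulls the dot product of two sampled graph vectors toward their matching coefficient; the function names, hyperparameters, and data are hypothetical.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def train_graph_vectors(entities, coefficients, dim=2, rounds=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Initialize an original graph vector for every entity string.
    vectors = {e: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for e in entities}
    pairs = list(coefficients)
    for _ in range(rounds):  # stop when the sampling count reaches the threshold
        a, b = rng.choice(pairs)  # sample a pair of entity strings
        va, vb = vectors[a], vectors[b]
        error = dot(va, vb) - coefficients[(a, b)]
        # Gradient of 0.5 * error**2 with respect to va is error * vb, and
        # with respect to vb is error * va (assumed objective).
        new_va = [x - lr * error * y for x, y in zip(va, vb)]
        new_vb = [y - lr * error * x for x, y in zip(va, vb)]
        # The adjusted vectors become the original vectors for the next round.
        vectors[a], vectors[b] = new_va, new_vb
    return vectors
```

After training, entity strings with a larger matching coefficient end up with graph vectors whose dot product is larger.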
Wherein the determining a target similarity coefficient between the target intent character string and each first entity character string according to the target entity word vector, the target graph vector, the plurality of first graph vectors, and the plurality of second word vectors comprises:
extracting a test first entity character string for polling from the plurality of first entity character strings, and taking a first graph vector corresponding to the test first entity character string as a test graph vector;
determining a first similarity coefficient between a second word vector of a second entity character string adjacent to the test first entity character string in the knowledge graph and the target word vector according to the association matching coefficient;
determining a second similarity coefficient between the test graph vector and the target graph vector according to the association matching coefficient;
generating a target similarity coefficient between the target intention character string and the test first entity character string according to the first similarity coefficient and the second similarity coefficient;
when each first entity string is determined to be a test first entity string, polling is stopped.
Another aspect of an embodiment of the present invention provides a text processing apparatus, including:
the acquisition module is used for acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string;
the searching module is used for searching a target entity character string matched with the target text in the entity character strings;
the acquisition module is also used for extracting a target service attribute character string corresponding to the target entity character string;
the identification module is used for identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string;
and the generation module is used for determining a target intention character string associated with the target intention type from the target entity character string and generating recommended service data according to the target intention character string.
Wherein, the searching module comprises:
an extracting unit configured to extract a target keyword from the target text;
the mapping unit is used for mapping the target key words into map labeling entity character strings;
and the searching unit is used for searching the entity character strings which are the same as the map labeling entity character strings from the entity character strings to be used as target entity character strings matched with the target text.
Wherein the extraction unit includes:
the dividing subunit is used for dividing the target text into a plurality of target unit characters and converting each target unit character into a target unit character vector;
the coding subunit is used for performing bidirectional recurrent encoding on the target unit character vectors based on a coding layer in the first recurrent neural network model to obtain a forward coding matrix and a reverse coding matrix;
the splicing subunit is used for splicing the forward coding matrix and the reverse coding matrix into a hidden state matrix;
the splicing subunit is further configured to perform sequence tagging on the hidden state matrix based on the conditional random field in the first recurrent neural network, and determine part-of-speech tags corresponding to each target unit character respectively;
and the splicing subunit is further configured to determine the target keyword according to the part-of-speech tag corresponding to each target unit character.
Wherein the mapping unit includes:
a conversion subunit, configured to divide the target keyword into a plurality of key unit characters, and convert each key unit character into a key unit character vector;
the conversion subunit is further configured to encode the plurality of key unit character vectors based on an encoding layer in a second recurrent neural network model to obtain a context vector of the target keyword;
a decoding subunit, configured to decode the context vector based on a decoding layer in the second recurrent neural network model to obtain a hidden state vector of the context vector;
the decoding subunit is further configured to identify the hidden state vector, obtain a character sequence corresponding to the hidden state vector, and determine the character sequence as the map labeling entity character string.
Wherein the identification module comprises:
the conversion unit is used for converting the target entity character string into a target entity word vector;
the combination unit is used for converting the target service attribute character string into a target service attribute word vector;
the combination unit is further configured to combine the target entity word vector and the target service attribute word vector into an input vector, and perform convolution and pooling on the input vector based on a convolution layer and a pooling layer in an intention recognition model to obtain an intention feature vector of the target text;
and the identification unit is used for identifying the matching probability between the intention feature vector and a plurality of intention types in the classifier based on the classifier in the intention identification model, and determining the intention type corresponding to the maximum matching probability from the plurality of matching probabilities as the target intention type.
Wherein the conversion unit includes:
the searching subunit is used for searching the entity word bag for the one-hot code corresponding to the target entity character string as a first vector; the entity word bag comprises the entity character strings in the knowledge graph and the one-hot codes respectively corresponding to the entity character strings;
and the dimension reduction subunit is used for performing dimension reduction on the first vector based on a hidden layer in an entity word vector conversion model to obtain the target entity word vector.
Wherein the generating module comprises:
the acquisition unit is used for acquiring the service attribute represented by the target intention type as an intention service attribute;
a determining unit, configured to use a target entity string having the intention service attribute as the target intention string;
and the generating unit is used for searching the business data associated with the target intention character string and taking the business data associated with the target intention character string as the recommended business data.
Wherein the generating unit includes:
a first extraction subunit, configured to extract, when the target intention type belongs to a semantic reasoning type, a plurality of first entity character strings from a plurality of entity character strings of the knowledge graph; the first entity string is an entity string of the plurality of entity strings of the knowledge-graph other than the entity string having the intent-service attribute;
a selecting subunit, configured to determine, in the knowledge-graph, a target similarity coefficient between the target intent character string and each first entity character string;
the first extraction subunit is further configured to select, according to a plurality of target similarity coefficients, a target first entity string that matches the target intention type from the plurality of first entity strings;
and the determining subunit is used for searching the service data associated with the target first entity character string and taking the service data associated with the target first entity character string as the recommended service data.
Wherein the knowledge-graph further comprises association matching coefficients between a plurality of entity strings; the association matching coefficients are obtained through statistics on a data set related to the knowledge graph;
the selection subunit includes:
a start subunit operable to extract a test first entity string for polling from the plurality of first entity strings;
the starting subunit is further configured to use an association matching coefficient between the target intention string and the test first entity string as a target similarity coefficient between the target intention string and the test first entity string;
a stopping subunit configured to stop the polling when each of the first entity character strings is determined as a test first entity character string.
Wherein the selection subunit includes:
a first obtaining subunit, configured to use a plurality of entity character strings with the intended service attribute in the knowledge graph as second entity character strings;
the second acquiring subunit is used for acquiring a first graph vector of each first entity character string;
the first obtaining subunit is further configured to obtain a second word vector and a second graph vector of each second entity character string;
a second extraction subunit, configured to extract a second word vector corresponding to the target intention character string from the plurality of second word vectors as a target word vector, and extract a second graph vector corresponding to the target intention character string from the plurality of second graph vectors as a target graph vector;
and the third extraction subunit is used for determining a target similarity coefficient between the target intention character string and each first entity character string according to the target entity word vector, the target graph vector, the first graph vectors and the second word vectors.
Wherein the knowledge-graph further comprises association matching coefficients between the plurality of entity strings;
the second acquisition subunit includes:
the initialization subunit is used for initializing an original graph vector corresponding to each entity character string in the knowledge graph;
a sampling subunit, configured to sample multiple sample entity character strings from the knowledge graph, obtain an original graph vector corresponding to each sample entity character string, as a sample graph vector, respectively update each sample graph vector by using a gradient descent rule according to an association matching coefficient between the multiple sample entity character strings, obtain an adjusted graph vector, and determine the adjusted graph vector as the original graph vector;
the sampling subunit is further configured to, when the number of sampling rounds reaches a count threshold, use the updated original graph vector corresponding to each first entity character string as the first graph vector.
Wherein the third extraction subunit includes:
a first generating subunit, configured to extract a test first entity string used for polling from the plurality of first entity strings, and use a first graph vector corresponding to the test first entity string as a test graph vector;
the first generating subunit is further configured to determine, according to the association matching coefficient, a first similarity coefficient between a second word vector of a second entity character string adjacent to the test first entity character string in the knowledge graph and the target word vector;
the first generating subunit is further configured to determine, according to the association matching coefficient, a second similarity coefficient between the test graph vector and the target graph vector;
a second generating subunit, configured to generate a target similarity coefficient between the target intention string and the test first entity string according to the first similarity coefficient and the second similarity coefficient;
the second generating subunit is further configured to stop polling when each of the first entity character strings is determined as a test first entity character string.
Another aspect of an embodiment of the present invention provides an electronic device, including: a processor and a memory;
the processor is connected to a memory, wherein the memory is used for storing program codes, and the processor is used for calling the program codes to execute the method in one aspect of the embodiment of the invention.
Another aspect of the embodiments of the present invention provides a computer storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, perform a method as in one aspect of the embodiments of the present invention.
In the embodiments of the present invention, a target text and a knowledge graph containing a plurality of entity character strings and service attribute character strings are acquired; the target entity character string and the target service attribute character string are determined in the knowledge graph; a target intention type is then determined according to the target entity character string and the target service attribute character string; the target intention character string is determined according to the target intention type; and the recommended service data is determined from it. Compared with determining service data by manually searching for information and manually performing multiple service operations, automatically generating service data that meets the user's expectations saves the time the user would spend on those operations and thus improves the efficiency of acquiring service data.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a system architecture diagram of text processing provided by an embodiment of the present invention;
FIGS. 2a-2b are schematic diagrams of a text processing scenario provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a text processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a knowledge graph provided by an embodiment of the present invention;
FIG. 5 is a schematic interface diagram of a text processing method according to an embodiment of the present invention;
FIG. 6 is a schematic view of a scenario for determining recommended service data according to an embodiment of the present invention;
FIG. 7 is a schematic view of another scenario for determining recommended service data according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating another text processing method according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating another text recognition method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another knowledge graph provided by an embodiment of the present invention;
FIG. 11 is a comparison chart of registration conversion rates provided by an embodiment of the present invention;
FIG. 12 is a system block diagram of a text processing system according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a system architecture diagram of text processing according to an embodiment of the present invention. The server 10f establishes a connection with a user terminal cluster through the switch 10e and the communication bus 10d, and the user terminal cluster may include: user terminal 10a, user terminal 10b, ..., user terminal 10c.
Taking the user terminal 10a as an example, when the user terminal 10a acquires a target text input by the user, the user terminal 10a sends the target text to the server 10f through the switch 10e and the communication bus 10d. The database 10g corresponding to the server 10f stores a knowledge graph, which includes a plurality of entity character strings and a service attribute character string corresponding to each entity character string. The server 10f determines the intention type of the user according to the target text and the knowledge graph, and then determines the corresponding recommended service data according to the intention type; for example, the recommended service data may be doctors' schedule information, department registration links, and the like. The server 10f may transmit the determined recommended service data to the user terminal 10a, and the user terminal 10a may then display the recommended service data on its screen.
Of course, if the user terminal 10a stores the knowledge graph locally, the user terminal 10a may also determine the intention type of the user directly according to the knowledge graph and the target text, and likewise determine the recommended service data according to the intention type. The following description takes as an example how the user terminal 10a determines the intention type and the recommended service data. The user terminal 10a, the user terminal 10b, the user terminal 10c, and the like shown in FIG. 1 may include a mobile phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a smart band, and the like), and the like.
Please refer to fig. 2 a-2 b, which are schematic views of a text processing scenario according to an embodiment of the present invention. In the following description, taking the medical guidance system as an example, as shown in an intelligent guidance response page 20a in fig. 2a, the user inputs "drummed" and "nausea" on the intelligent guidance response page 20a, and the user terminal 10a sets "drummed, nausea" as a target text 20b.
The user terminal 10a may extract the keyword set 20c in the target text 20b based on the entity recognition model: "belly drumming" and "nausea". The entity recognition model may be obtained by training according to the LSTM (Long Short-Term Memory network) algorithm + CRF (Conditional Random Field) algorithm.
The specific process of extracting the keyword set 20c is as follows. The user terminal 10a first splits the target text 20b into individual characters and encodes each character into a character vector by one-hot encoding. The user terminal 10a then inputs the character vectors, in their order in the text, into the coding layer of the entity recognition model; that is, an LSTM performs forward encoding and reverse encoding on the character vectors to obtain a forward encoding matrix and a reverse encoding matrix. In each matrix, the number of rows equals the number of characters, and the number of columns equals the dimension of the vector obtained after a character is encoded by the LSTM. The forward encoding matrix and the reverse encoding matrix are spliced into a hidden state matrix, and each row of the hidden state matrix is identified to determine the corresponding part-of-speech tag, that is, the part-of-speech tag of each character in the target text 20b. The part-of-speech tags may include: B-Person (beginning part of a person name), I-Person (middle part of a person name), B-Organization (beginning part of an organization name), I-Organization (middle part of an organization name), and O (non-entity information).
After the user terminal 10a determines the part-of-speech tag of each character in the target text 20b, a character sequence that starts with a character tagged "B-Person" and continues through the following characters tagged "I-Person" may be used as a keyword, and likewise a character sequence tagged "B-Organization" followed by "I-Organization" may be used as a keyword. A character tagged "O" is a non-entity character and may be ignored directly.
Since the keywords in the keyword set 20c are expressed in colloquial natural language, the user terminal 10a also needs to convert them into professional written vocabulary based on a conversion model. That is, "belly drumming" in the keyword set 20c is converted into "abdominal distension", while "nausea" is already professional written vocabulary and need not be converted. The converted "abdominal distension" and "nausea" are combined into the standard entity string set 20d.
The conversion process is similar to machine translation, for example, translating Chinese into English with the same semantics. The conversion model can be obtained by training an LSTM on a large corpus; to improve the accuracy of subsequent matching with the knowledge graph, the corpus used for constructing the knowledge graph can be the same as the corpus used for training the conversion model.
Of course, a plurality of mapping relationships X → Y may be constructed, where X is a spoken keyword and Y is a written term synonymous with the keyword, and the keyword set 20c may be converted into the standard entity string set 20d by finding the corresponding mapping relationship.
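The mapping-table alternative can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the table name `SPOKEN_TO_WRITTEN`, the function name, and the choice to leave unmapped keywords unchanged are assumptions, and the entries are simply the examples used in this description.

```python
# Hypothetical mapping relationships X -> Y (spoken keyword -> written term).
# The entries are illustrative and taken from the examples in this description.
SPOKEN_TO_WRITTEN = {
    "belly drumming": "abdominal distension",
    "pull the belly": "diarrhea",
}


def to_standard_entities(keywords):
    """Map each keyword to its written synonym; keep it unchanged if no mapping exists."""
    return [SPOKEN_TO_WRITTEN.get(keyword, keyword) for keyword in keywords]


standard = to_standard_entities(["belly drumming", "nausea"])
print(standard)
```

A lookup table of this kind is cheap and predictable, whereas the trained conversion model can also handle keywords that were never listed explicitly.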
The user terminal 10a acquires a knowledge graph 20e, where the knowledge graph 20e includes a plurality of entity character strings and a service attribute character string corresponding to each entity character string. Since the current scenario is a medical scenario, the knowledge graph 20e is a medical knowledge graph, that is, the entity character strings in the knowledge graph are character strings for doctor names, symptom names, disease names, and medicine names. The service attribute character string identifies the service domain of the entity corresponding to the entity character string. For example, in the knowledge graph 20e, the service attribute character string of the entity character string "diarrhea" is "symptom"; the service attribute character string of the entity character string "digestive department" is "department"; and the service attribute character string of the entity character string "Zhang San" is "doctor".
The user terminal 10a searches the plurality of entity character strings included in the knowledge graph 20e for entity character strings that are the same as the standard entity character strings in the set 20d, takes them as target entity character strings, and looks up the service attribute character string corresponding to each target entity character string in the knowledge graph 20e. That is, in the knowledge graph 20e, the service attribute character string corresponding to the target entity character string "abdominal distension" is "symptom", and the service attribute character string corresponding to the target entity character string "nausea" is also "symptom". The target entity character strings and the corresponding service attribute character strings may be combined into a text 20f.
The user terminal 10a first converts the character string "abdominal distension" in the text 20f into an entity word vector 20g, converts the character string "symptom" in the text 20f into an attribute word vector 20h, and converts the character string "nausea" in the text 20f into an entity word vector 20m. The entity word vector 20g and the attribute word vector 20h are concatenated into a first input vector, which is the word vector for "abdominal distension, symptom" in the text 20f; the entity word vector 20m and the attribute word vector 20h are concatenated into a second input vector, which is the word vector for "nausea, symptom" in the text 20f.
The user terminal 10a combines the first input vector and the second input vector into an input matrix, and obtains an intention recognition model 20k, which may recognize an intention type of the input matrix, where the intention type may include: doctor finding, department finding, registration department recommending, inspection and examination, flow correlation and the like. The intention recognition model 20k includes a convolution layer, a pooling layer and a classifier, wherein the convolution layer and the pooling layer are used for performing convolution operation and pooling operation on the input matrix, and after the convolution operation and the pooling operation, the intention feature vector of the target text can be extracted.
The classifier in the intention recognition model 20k can identify the correlation between the extracted intention feature vector and a plurality of intention types, that is, the matching probability between the target text and each intention type. As shown in FIG. 2a, the matching probability between the target text 20b and the intention type "finding doctor" is 0.05; the matching probability between the target text 20b and the intention type "finding department" is 0.03; the matching probability between the target text 20b and the intention type "registration department recommendation" is 0.75; the matching probability between the target text 20b and the intention type "check" is 0.07; and the matching probability between the target text 20b and the intention type "other" is 0.07.
From the plurality of matching probabilities, the user terminal 10a may determine that the intention type matching the target text 20b is: registration department recommendation.
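Selecting the intention type from the matching probabilities amounts to taking the entry with the largest value. A minimal sketch using the probabilities listed in the example; the dictionary name and the label spellings are illustrative assumptions.

```python
# Matching probabilities from the example (intention type -> probability).
match_prob = {
    "finding doctor": 0.05,
    "finding department": 0.03,
    "registration department recommendation": 0.75,
    "check": 0.07,
    "other": 0.07,
}

# The target intention type is the one with the maximum matching probability.
target_intent = max(match_prob, key=match_prob.get)
print(target_intent)
```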
After the user terminal 10a determines the intention type, the recommended service data is determined according to the intention type "registration department recommendation".
The specific process for determining the recommended service data is as follows. The user terminal 10a obtains matching coefficients between the entity character strings in the knowledge graph 20e, where a matching coefficient represents the co-occurrence probability of two entity character strings; the higher the co-occurrence probability, the better the entities corresponding to the two entity character strings match. In the knowledge graph 20e, the user terminal 10a calculates the matching coefficients between each entity character string and the target entity character string "abdominal distension" and between each entity character string and the target entity character string "nausea". Among the entity character strings whose service attribute character string is "department", the one with the highest matching coefficient with the target entity character strings "abdominal distension" and "nausea" is taken as the recommended entity character string, that is, "digestive department" is the recommended entity character string. The user terminal 10a then searches the database for the registration link of the recommended entity character string "digestive department" and takes the registration link as the recommended service data.
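The selection of the recommended entity character string can be sketched as follows. The coefficient values, the extra graph entries, and the function name are hypothetical. The description does not state how coefficients for several target entity character strings are combined, so summing them is an assumption made only for this sketch.

```python
# Hypothetical knowledge-graph fragment: entity string -> service attribute string.
attributes = {
    "digestive department": "department",
    "ophthalmology": "department",
    "diarrhea": "symptom",
}

# Hypothetical matching coefficients (co-occurrence probabilities) between a
# candidate entity string and a target entity string.
matching = {
    ("digestive department", "abdominal distension"): 0.8,
    ("digestive department", "nausea"): 0.6,
    ("ophthalmology", "abdominal distension"): 0.01,
    ("ophthalmology", "nausea"): 0.05,
    ("diarrhea", "abdominal distension"): 0.9,
    ("diarrhea", "nausea"): 0.9,
}


def recommend(targets, wanted_attribute):
    """Return the entity with the wanted attribute and the highest combined coefficient."""
    candidates = [e for e, a in attributes.items() if a == wanted_attribute]
    # Assumption: coefficients for several target strings are combined by summing.
    return max(
        candidates,
        key=lambda e: sum(matching.get((e, t), 0.0) for t in targets),
    )


recommended = recommend(["abdominal distension", "nausea"], "department")
print(recommended)
```

Filtering by service attribute first keeps strongly co-occurring symptoms such as "diarrhea" from being returned when a department is wanted.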
As shown in the interface 20x of fig. 2b, a preset animation may be played on the screen when the user terminal 10a determines the type of intention and searches for recommended service data. As shown in the interface 20y in fig. 2b, when the user terminal 10a determines the recommended service data, the recommended entity character string "digestive department" may be displayed on the screen, and the recommended service data, i.e., the registration link of "digestive department" may be displayed.
Referring to fig. 3, which is a flowchart illustrating a text processing method according to an embodiment of the present invention, as shown in fig. 3, the method may include:
Step S101, acquiring a target text and acquiring a knowledge graph; the knowledge graph includes a plurality of entity character strings and a service attribute character string corresponding to each entity character string.
Specifically, the terminal device (e.g., the user terminal 10a in the embodiment corresponding to FIG. 2a) acquires the target text (e.g., the target text 20b in the embodiment corresponding to FIG. 2a). The target text may be a text that the user manually inputs into the terminal device; alternatively, the terminal device collects the user's voice, converts the voice into a text, and takes the converted text as the target text.
The terminal device obtains a knowledge graph (such as the knowledge graph 20e in the corresponding embodiment of fig. 2 a), wherein the knowledge graph uses a visualization technology to describe knowledge resources and carriers thereof, and mines, analyzes, constructs, draws and displays knowledge and mutual relations between the knowledge resources and the carriers. The knowledge graph comprises entity character strings corresponding to a plurality of entities and service attribute character strings of service attributes corresponding to each entity character string; wherein the service attribute indicates a service domain in which the corresponding entity is located.
For example, in the knowledge graph, the service attribute character string corresponding to the entity character string "fatigue" is "symptom"; that of "ophthalmology" is "department"; that of "rash" is "disease"; that of "cefaclor" is "drug"; that of "Magnetic Resonance Imaging (MRI)" is "examination"; that of "Zhang San" is "doctor"; that of "appendectomy" is "surgery"; and that of "review" is "process".
Please refer to FIG. 4, which is a schematic diagram of a knowledge graph according to an embodiment of the present invention. Taking the construction of a knowledge graph for the disease "rash" as an example, the entity character strings associated with the disease "rash", such as department names and ages, can come from the entities involved in electronic medical records (EHRs). Since constructing the knowledge graph for "rash" involves a great deal of professional knowledge, using EHRs is more reliable than crawling from the Internet. The connecting line between every two entity character strings represents the corresponding logical relationship, that is, it identifies the respective service attributes of the two entity character strings. The knowledge graph of the disease "rash" corresponding to FIG. 4 contains 1 disease: rash; 4 departments: pediatric medicine, dermatology, pediatric emergency, and isolation clinic; 2 sexes: male and female; and 7 ages: 0, 1, 2, 3, 4, 5, and 9 years old. Therefore, the number of logical relationships between disease and department is 4, the number of logical relationships between disease and sex is 2, and the number of logical relationships between disease and age is 7.
Optionally, user information (referred to as login user information) of the user who inputs the target text (referred to as the login user) is acquired, and the login user information may include age information and sex information. When the terminal device obtains the target text and detects that the target text also includes age information or sex information, it extracts that information from the target text. If the extracted age information or sex information is inconsistent with the age information or sex information in the login user information, the terminal device generates a prompt message prompting the login user to adjust either the login user information or the age information or sex information in the target text.
This is because in a medical scenario, diagnosis of a part of diseases depends on age and gender to a great extent, and therefore, determining the age and gender of a patient in advance can improve the accuracy of subsequent disease diagnosis, and thus improve the accuracy of recommending business data to a user.
Please refer to FIG. 5, which is a schematic view of a text processing interface according to an embodiment of the present invention. As shown in the interface 30a, the login user information set by the login user on the intelligent diagnosis guidance login page is: age: 29 years old; sex: male. The target text input by the login user on the intelligent diagnosis guidance response page is: "the child is uncomfortable". The terminal device recognizes that the age information corresponding to "child" in the target text is 0-10 years old, so the age information in the target text does not match the age information in the login user information, and the terminal device can generate a prompt message: "The personal information of the current consultant does not match the input personal information; please reselect the consultant or input again." The login user resets the login user information of the consultant to: 1 year old, male. The terminal device can subsequently receive a new target text and determine the target intention type on the basis of the new login user information.
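The consistency check in this example can be sketched as a simple range test. The function name and the representation of the age implied by the text as a `(low, high)` pair are assumptions made for illustration.

```python
def needs_prompt(profile_age, text_age_range):
    """Return True when the profile age lies outside the age range found in the text."""
    low, high = text_age_range
    return not (low <= profile_age <= high)


# Example from the interface: profile age 29, while "child" implies 0-10 years old.
mismatch_before = needs_prompt(29, (0, 10))
# After the login user resets the consultant's age to 1 year old.
mismatch_after = needs_prompt(1, (0, 10))
print(mismatch_before, mismatch_after)
```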
Step S102, searching a target entity character string matched with the target text in the entity character strings, and extracting a target service attribute character string corresponding to the target entity character string.
Specifically, the terminal device extracts a keyword (referred to as a target keyword, for example, a keyword in the keyword set 20c in the embodiment corresponding to FIG. 2a) from the target text. The characters in the target text other than the target keyword are irrelevant to the user's intention and can be discarded, which improves the accuracy of subsequently identifying the user's intention. For example, if the target text is "yesterday started to pull the belly in the middle of the night", the target keyword is "pull the belly" (a colloquial expression for diarrhea).
Since the target keyword is input by the user, it may be non-standard or non-professional, such as "pull the belly" in the foregoing example. To improve the accuracy of subsequent intention recognition, the terminal device further needs to map the target keyword to a professional, standard synonym; the mapped synonym may be referred to as a graph identification entity character string (e.g., a standard entity character string in the standard entity character string set 20d in the embodiment corresponding to FIG. 2a). The terminal device searches the entity character strings of the knowledge graph for the entity character string that is the same as the graph identification entity character string and takes it as the target entity character string.
For example, the knowledge graph includes the entity character strings "diarrhea", "early satiety", "digestive department", and "blood routine", and the graph identification entity character strings are "diarrhea" and "Zhang San". Among the above 4 entity character strings, the only one that is the same as one of the graph identification entity character strings "diarrhea" and "Zhang San" is "diarrhea", that is, the target entity character string is "diarrhea".
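The lookup in this example can be sketched as an exact-match filter over the entity character strings of the knowledge graph. The dictionary layout and the function name are illustrative assumptions, and the attributes given for "early satiety" and "blood routine" are assumed rather than stated in the description.

```python
# Entity strings in the knowledge graph and their service attribute strings.
# The attributes of "early satiety" and "blood routine" are assumptions.
knowledge_graph = {
    "diarrhea": "symptom",
    "early satiety": "symptom",
    "digestive department": "department",
    "blood routine": "examination",
}


def find_targets(graph_id_strings):
    """Keep only strings that appear in the knowledge graph, with their attributes."""
    return {s: knowledge_graph[s] for s in graph_id_strings if s in knowledge_graph}


targets = find_targets(["diarrhea", "Zhang San"])
print(targets)
```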
The following describes in detail how to extract the target keyword from the target text. The terminal device divides the target text into a plurality of unit characters, each of which is called a target unit character. The terminal device may convert each target unit character into a one-dimensional vector by one-hot encoding, referred to as a target unit character vector, and obtains a first recurrent neural network model. The first recurrent neural network model can label each character with a part-of-speech tag, and the part-of-speech tags may include: B-Person (beginning part of a person name), I-Person (middle part of a person name), B-Organization (beginning part of an organization name), I-Organization (middle part of an organization name), and O (non-entity information). The coding layer in the first recurrent neural network model may be trained based on an RNN (Recurrent Neural Network) or an LSTM, and the conditional random field in the first recurrent neural network model is trained based on a CRF.
First, all target unit character vectors are encoded in the forward direction. The terminal device initializes a hidden state vector h10. At time t11, the target unit character vector x1 located first in the target text and the hidden state vector h10 are input into the coding layer of the first recurrent neural network model, and the hidden state vector h11 at time t11 is calculated according to formula (1):
[Formula (1): equation image BDA0002020504050000161, not reproduced in this text]
where σ(·) is the sigmoid function, and i, f, and o represent the input gate, the forget gate, and the output gate, respectively. Each W represents a weight matrix between two gates. During the encoding process, the parameters of the whole first recurrent neural network model are shared, that is, the parameters do not change when the hidden state vector at each time is calculated.
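The image of formula (1) is not available in this text. For orientation only, a commonly used LSTM cell formulation that is consistent with the symbols described here (σ, the gates i, f, o, and weight matrices W) is given below. The bias terms b, the cell state c, the candidate state, and the exact subscripts are assumptions and may differ from the patent's formula (1).

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

In this form the hidden state at time t depends only on the input at time t and the states at time t-1, which matches the iteration described for the coding layer.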
At time t12, the target unit character vector x2 located second in the target text and the hidden state vector h11 at time t11 are input into the coding layer, and the hidden state vector h12 at time t12 is calculated according to formula (1); at time t13, the hidden state vector h13 at time t13 is likewise calculated according to formula (1). In other words, the hidden state vector ht at time t is determined by the hidden state vector h(t-1) at time t-1 and the target unit character vector xt at time t. The iteration continues until the hidden state vector h1n is obtained in the last iteration, and the hidden state vectors h11, h12, ..., h1n are combined into a forward coding matrix. The size of the forward coding matrix can be expressed as n × m, where n represents the number of target unit characters and m represents the dimension of the hidden state vector after each target unit character is forward encoded.
Then all target unit character vectors are encoded in the reverse direction. The terminal device initializes a hidden state vector h20. At time t21, the target unit character vector xn located last in the target text and the hidden state vector h20 are input into the coding layer of the first recurrent neural network model, and the hidden state vector h21 at time t21 is calculated according to formula (1); at time t22, the target unit character vector x(n-1) located second to last in the target text and the hidden state vector h21 at time t21 are input into the coding layer, and the hidden state vector h22 at time t22 is calculated according to formula (1). The hidden state vector ht at time t is determined by the hidden state vector h(t-1) at time t-1 and the target unit character vector x(n+1-t) at time t. The iteration continues until the hidden state vector h2n is obtained in the last iteration, and the hidden state vectors h21, h22, ..., h2n are combined into a reverse coding matrix. The size of the reverse coding matrix can be expressed as n × m, where n represents the number of target unit characters and m represents the dimension of the hidden state vector after each target unit character is reverse encoded.
Forward encoding and reverse encoding are identical except for the order in which the target unit character vectors are input (forward encoding inputs them from front to back, and reverse encoding inputs them from back to front). Encoding in both directions ensures that, even if the target text contains a large number of target unit characters, the semantic information and timing information of the target unit characters are not weakened by their position in the sequence.
The terminal device splices the forward coding matrix and the reverse coding matrix into a hidden state matrix, whose size is therefore n × 2m. Based on the conditional random field in the first recurrent neural network model, sequence labeling is performed on each row of the hidden state matrix, that is, the similarity between each row and the various part-of-speech tags is identified, and the part-of-speech tag with the maximum similarity is taken as the part-of-speech tag of the corresponding target unit character.
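The forward and reverse encoding and the splicing into an n × 2m hidden state matrix can be sketched as follows. This is an illustration under stated assumptions, not the trained model: a plain tanh RNN cell stands in for the coding layer (the description allows RNN or LSTM), the weights are random rather than trained, and the reverse states are re-ordered before splicing so that each row refers to one character, a detail the description does not spell out. All function names are hypothetical.

```python
import math
import random


def rnn_encode(vectors, w_x, w_h, m):
    """Run a simple tanh RNN over `vectors`; return one hidden state per input."""
    h = [0.0] * m  # initial hidden state (h10 or h20 in the text)
    states = []
    for x in vectors:
        # The same weights are reused at every time step (shared parameters).
        h = [
            math.tanh(
                sum(w_x[j][k] * x[k] for k in range(len(x)))
                + sum(w_h[j][k] * h[k] for k in range(m))
            )
            for j in range(m)
        ]
        states.append(h)
    return states


def bidirectional_encode(vectors, m, seed=0):
    """Return the n x 2m hidden state matrix from forward and reverse encoding."""
    rng = random.Random(seed)
    d = len(vectors[0])

    def init(rows, cols):
        # Random weights stand in for trained parameters (assumption).
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    forward = rnn_encode(vectors, init(m, d), init(m, m), m)
    reverse = rnn_encode(vectors[::-1], init(m, d), init(m, m), m)
    # Assumption: re-order the reverse states so row t of both matrices
    # describes the same character before splicing them side by side.
    reverse_aligned = reverse[::-1]
    return [f + b for f, b in zip(forward, reverse_aligned)]


# Three one-hot character vectors over a hypothetical 4-character vocabulary.
one_hot_chars = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
hidden = bidirectional_encode(one_hot_chars, m=5)
print(len(hidden), len(hidden[0]))
```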
The terminal device may use a character sequence that starts with a character tagged "B-Person" and continues through the following characters tagged "I-Person" as a keyword, and likewise a character sequence tagged "B-Organization" followed by "I-Organization" as a keyword. A character tagged "O" is a non-entity character and can be ignored directly.
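Grouping tagged characters into keywords can be sketched as follows. The function name and the sample characters and tags are hypothetical; the tag scheme is the B-/I-/O scheme listed in the description.

```python
def extract_keywords(chars, tags):
    """Group characters tagged B-X followed by I-X into keywords; skip 'O' characters."""
    keywords, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that is still open.
            if current:
                keywords.append("".join(current))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            # 'O' or an I- tag that does not continue the open entity.
            if current:
                keywords.append("".join(current))
            current, current_type = [], None
    if current:
        keywords.append("".join(current))
    return keywords


# Hypothetical character sequence and tags, for illustration only.
chars = ["A", "n", "n", " ", "a", "t", " ", "I", "B", "M"]
tags = [
    "B-Person", "I-Person", "I-Person", "O", "O", "O", "O",
    "B-Organization", "I-Organization", "I-Organization",
]
keywords = extract_keywords(chars, tags)
print(keywords)
```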
The following describes how the terminal device maps a target keyword to a graph identification entity character string. If there are a plurality of target keywords, the graph identification entity character string corresponding to each of them can be determined by the same processing procedure.
The terminal device divides the target keyword into a plurality of unit characters, called key unit characters, converts each key unit character into a one-dimensional vector by one-hot encoding, called a key unit character vector, and obtains a second recurrent neural network model. The second recurrent neural network model can translate the target keyword into a standardized graph identification entity character string. The second recurrent neural network model includes a coding layer and a decoding layer, both of which are trained based on an RNN or an LSTM.
According to the front-to-back positions of the key unit character vectors in the target keyword, the coding layer in the second recurrent neural network model encodes each key unit character vector in turn; for the coding process, reference may be made to the forward coding process described above. The terminal device may use the hidden state vector determined by the key unit character vector located last in the target keyword as the context vector of the target keyword, that is, the hidden state vector obtained by the last operation of the coding layer is used as the context vector. The context vector is decoded based on the decoding layer of the second recurrent neural network model. In the decoding process, the output $y_t$ at time t is determined by the hidden state vector h(t-1) at time t-1 and the output $y_{t-1}$ at time t-1. The terminal device takes the decoded character sequence as the graph identification entity character string.
The process of converting the target keyword into the graph identification entity character string is similar to the process in which a translation model translates a Chinese character sequence into a synonymous English character sequence.
Step S103, identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string.
Specifically, the terminal device looks up the one-hot code (referred to as a first vector) corresponding to the target entity character string in the entity word bag. The entity word bag includes the plurality of entity character strings in the knowledge graph and the one-hot code corresponding to each entity character string, where a one-hot code is a vector that contains exactly one 1 and whose remaining elements are all 0.
For example, the knowledge graph includes 4 entity character strings: digestive department, abdominal distension, early satiety, and Zhang San. In the entity word bag, the one-hot code of the entity character string "digestive department" is [1,0,0,0]; the one-hot code of the entity character string "abdominal distension" is [0,1,0,0]; the one-hot code of the entity character string "early satiety" is [0,0,1,0]; and the one-hot code of the entity character string "Zhang San" is [0,0,0,1]. It can be seen that the number of entity character strings contained in the knowledge graph is equal to the dimension of each one-hot code.
The terminal device obtains an entity word vector conversion model, which can reduce a first vector with a high dimension to a word vector with a low dimension, and performs matrix multiplication on the first vector and a hidden matrix based on the hidden matrix corresponding to a hidden layer in the entity word vector conversion model, and the vector obtained after the matrix multiplication is called a target entity word vector (for example, the entity word vector 20g in the embodiment corresponding to fig. 2 a), wherein the number of rows of the hidden matrix is equal to the dimension of the first vector, and the number of columns of the hidden matrix is equal to the dimension of the target entity word vector. For example, the size of the first vector is: 1 × 1000, the size of the hidden matrix is: 1000 x 100, then the size of the target entity word vector is: 1 × 100.
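Because the first vector is one-hot, multiplying it by the hidden matrix simply selects one row of that matrix. A minimal sketch with a 4 × 3 hidden matrix; the matrix values and the helper names are hypothetical, since real values would come from training.

```python
def one_hot(index, size):
    """Return a vector of length `size` with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def matmul_row(vec, matrix):
    """Multiply a 1 x R row vector by an R x C matrix, giving a 1 x C row vector."""
    cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(vec))) for c in range(cols)]


# Entity word bag with the four entity strings from the example.
entity_bag = ["digestive department", "abdominal distension", "early satiety", "Zhang San"]

# Hypothetical 4 x 3 hidden matrix (rows = one-hot dimension, columns = word vector dimension).
hidden_matrix = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

first_vector = one_hot(entity_bag.index("abdominal distension"), len(entity_bag))
entity_word_vector = matmul_row(first_vector, hidden_matrix)
print(first_vector, entity_word_vector)
```

In practice the multiplication is usually replaced by a direct row lookup, which gives the same result without touching the zero entries.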
The entity word vector conversion model can be trained based on fasttext (fast text classifier) and word2vec (word vector conversion model). During training, a classifier needs to be added after the hidden layer to calculate the classification error, and the classification error is passed back to the hidden layer through back propagation to update the parameter values of the hidden matrix in the hidden layer.
If there are multiple target entity character strings, each target entity character string may be converted into its corresponding target entity word vector based on the entity word bag and the entity word vector conversion model, and the t target entity word vectors may be represented as $\{x_1, x_2, \ldots, x_t\}$, where $x_n$ denotes the nth target entity word vector of size 1 × p, 1 ≤ n ≤ t, and p may equal 256.
The terminal device looks up the one-hot code (referred to as a second vector) corresponding to the target service attribute character string in the service attribute word bag. The service attribute word bag includes the plurality of service attribute character strings corresponding to the knowledge graph and the one-hot code corresponding to each service attribute character string.
The terminal device obtains a service attribute word vector conversion model, which can reduce a high-dimensional second vector into a low-dimensional word vector, and performs matrix multiplication on the second vector and a hidden matrix based on the hidden matrix corresponding to the hidden layer in the service attribute word vector conversion model, so as to obtain a vector called a target service attribute word vector (for example, the attribute word vector 20h in the embodiment corresponding to fig. 2 a).
The service attribute word vector conversion model can be trained based on fasttext and word2vec. During training, a classifier needs to be added after the hidden layer to calculate the classification error, and the classification error is passed back to the hidden layer through back propagation to update the parameter values of the hidden matrix in the hidden layer.
If there are multiple target service attribute character strings, each target service attribute character string may be converted into its corresponding target service attribute word vector based on the service attribute word bag and the service attribute word vector conversion model, and the t target service attribute word vectors may be represented as $\{g_1, g_2, \ldots, g_t\}$, where $g_n$ denotes the nth target service attribute word vector of size 1 × p, 1 ≤ n ≤ t, and p may equal 256.
The terminal device combines each target entity word vector and the corresponding target service attribute word vector into an input vector, and the input vectors corresponding to the t target entity character strings may be represented as $\{d_1, d_2, \ldots, d_t\}$, where $d_n$ denotes the nth input vector of size 1 × 2p, 1 ≤ n ≤ t.
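Building the input vectors is a pairwise concatenation of the nth entity word vector with the nth attribute word vector. A minimal sketch with a small p; the function name and the vector values are hypothetical.

```python
def build_input_matrix(entity_vectors, attribute_vectors):
    """Concatenate the n-th entity and attribute word vectors into the n-th input vector."""
    if len(entity_vectors) != len(attribute_vectors):
        raise ValueError("need one attribute word vector per entity word vector")
    return [x + g for x, g in zip(entity_vectors, attribute_vectors)]


p = 4  # small p for illustration; the text mentions that p may equal 256
x = [[0.1] * p, [0.2] * p]  # hypothetical target entity word vectors
g = [[0.9] * p, [0.9] * p]  # hypothetical attribute word vectors (same attribute twice)
d = build_input_matrix(x, g)
print(len(d), len(d[0]))
```

Stacking the t input vectors row by row gives the t × 2p input matrix that is passed to the intention recognition model.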
The terminal device obtains an intention recognition model (e.g., the intention recognition model 20k in the embodiment corresponding to FIG. 2a) that can recognize the intention type corresponding to the input vectors; the intention recognition model includes a convolution layer, a pooling layer, and a classifier. The convolution layer corresponds to one or more convolution kernels (kernel, also called filter or receptive field). In the convolution operation, the convolution kernel performs a matrix multiplication with sub-matrices at different positions of the matrix formed by the input vectors. The number of rows $H_{out}$ and the number of columns $W_{out}$ of the output matrix after the convolution operation are determined by the size of the input, the size of the convolution kernel, the stride, and the boundary padding, that is, $H_{out} = (H_{in} - H_{kernel} + 2 \cdot padding)/stride + 1$ and $W_{out} = (W_{in} - W_{kernel} + 2 \cdot padding)/stride + 1$, where $H_{in}$ and $H_{kernel}$ respectively represent the number of input vectors and the number of rows of the convolution kernel, and $W_{in}$ and $W_{kernel}$ respectively represent the dimension of each input vector and the number of columns of the convolution kernel.
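The output-size formula can be checked with a small helper. The function name and the sample sizes are hypothetical, and the use of floor division for sizes that do not divide evenly by the stride is an assumption that the description does not address.

```python
def conv_output_size(size_in, size_kernel, stride=1, padding=0):
    """Compute (size_in - size_kernel + 2 * padding) / stride + 1.

    Floor division is an assumption for sizes that do not divide evenly.
    """
    return (size_in - size_kernel + 2 * padding) // stride + 1


# Hypothetical case: 2 input vectors of dimension 512 and a 2 x 512 kernel.
h_out = conv_output_size(2, 2)
w_out = conv_output_size(512, 512)

# Hypothetical case with stride 2 and padding 1.
strided = conv_output_size(5, 3, stride=2, padding=1)
print(h_out, w_out, strided)
```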
After convolution, pooling operation needs to be performed on the output matrix based on a pooling layer, the pooling operation refers to aggregation statistics performed on the extracted output matrix, and the pooling operation may include average pooling operation and maximum pooling operation. The average pooling operation method is that an average value is calculated in each row (or column) of the output matrix to represent the row (or column); the maximum pooling operation is to extract the maximum value in each row (or column) of the output matrix to represent the row (or column). Through convolution operation and pooling operation, the most significant intention features of the target text can be extracted, and the intention features can be called intention feature vectors.
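The two pooling operations can be illustrated on a small matrix. This is a minimal sketch with row-wise pooling; the helper names and the toy feature map are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_rows(matrix: Matrix) -> List[float]:
    """Maximum pooling: keep the largest value of each row."""
    return [max(row) for row in matrix]


def avg_pool_rows(matrix: Matrix) -> List[float]:
    """Average pooling: keep the mean value of each row."""
    return [sum(row) / len(row) for row in matrix]


# Hypothetical convolution output with 2 rows and 3 columns.
feature_map = [[1.0, 3.0, 2.0], [4.0, 0.0, 2.0]]
max_features = max_pool_rows(feature_map)
avg_features = avg_pool_rows(feature_map)
```

Either result is a fixed-length vector regardless of the number of columns, which is what allows it to be passed to the classifier as an intention feature vector.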
Based on the classifier in the intention recognition model, the matching probabilities between the intention feature vector and the multiple intention types in the classifier are identified, and the intention type corresponding to the maximum matching probability among the multiple matching probabilities is taken as the target intention type.
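Selecting the intention type with the maximum matching probability can be sketched as follows. The text does not say which classifier is used; a softmax over raw scores is assumed here purely for illustration, and the scores themselves are made up.

```python
import math
from typing import List, Tuple

INTENT_TYPES = [
    "doctor finding", "department finding", "disease diagnosis",
    "examination", "process-related", "others",
]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw classifier scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_intent(scores: List[float]) -> Tuple[str, float]:
    """Return the intention type with the largest matching probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENT_TYPES[best], probs[best]


# Hypothetical scores, one per intention type.
intent, prob = pick_intent([2.0, 0.5, 0.1, -1.0, 0.0, -0.5])
```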
If the current business scenario is a medical guidance scenario, the intention types may include: doctor finding, department finding, disease diagnosis, examination, process-related, and others. Doctor finding, department finding, examination and the like are cases in which the user already knows the specific business object; for example, the user knows that he or she wants to see Doctor Zhang San, wants to register with the digestive department, or wants a routine blood test. In contrast, for disease diagnosis the terminal device needs to reason from the user's intention to determine the business object.
Step S104, determining a target intention character string associated with the target intention type from the target entity character string, and generating recommended service data according to the target intention character string.
Specifically, an entity character string associated with the target intention type is extracted from the target entity character strings as the target intention character string. The terminal device searches for the service data corresponding to the target intention character string and uses it as the recommended service data.
By understanding the user's intention (namely, the target intention type), user data streams with different intentions can be distributed to different business processing modules, and each business processing module can search for the corresponding recommended service data according to the target intention character string, which improves the accuracy and efficiency of recommendation. Dividing the user data streams by recognized intention also gives the whole system a reasonable hierarchical structure.
For example, the target text is: "my belly hurts and I feel nauseous; I want to see Doctor Zhang San". The target entity character strings determined from the target text are "abdominal pain", "nausea", and "Zhang San", and the recognized target intention type of the target text is: doctor finding. Therefore, among the above 3 target entity character strings, the target intention character string associated with the target intention type "doctor finding" is "Zhang San". The terminal device can search the database for Doctor Zhang San's schedule, personal profile, registration link and the like, package these data into recommended service data, and display the recommended service data on the screen. The diseases that each doctor is good at treating can be stored in the database as labels; when the corresponding doctor is found, these diseases can also be packaged into the recommended service data, so that the best personalized answer is fed back to the user.
Referring also to fig. 6, which is a schematic diagram of a scenario for determining recommended service data according to an embodiment of the present invention. As shown in fig. 6, in an intelligent diagnosis guidance response page, the target text input by the user is: "I want to find Zhang San". After obtaining the target text, the terminal device extracts the target entity character string "Zhang San" and determines the target service attribute character string "doctor". According to the target entity character string and the target service attribute character string, the target intention type is determined to be doctor finding, and the target intention character string is "Zhang San". The terminal device then searches the database for Doctor Zhang San's schedule and registration link, packages these data into recommended service data, and displays the recommended service data to the user.
Referring to fig. 7, which is a schematic diagram of another scenario for determining recommended service data according to an embodiment of the present invention. As shown in fig. 7, the user inputs a target text in the intelligent diagnosis guidance response page. After acquiring the target text, the terminal device determines that the target entity character string is "trauma orthopedics" and that the target service attribute character string is "department". According to the target entity character string and the target service attribute character string, the target intention type is determined to be department finding, and the target intention character string is "trauma orthopedics". The terminal device then searches the database for the department introduction and the registration link of the trauma orthopedics department, packages these data into recommended service data, and displays the recommended service data to the user. Subsequently, a doctor recommendation function can also be offered to the user: when the user chooses to have a doctor of the department recommended, doctors belonging to the trauma orthopedics department can be looked up in the knowledge graph, a recommended doctor list is generated according to the profiles of the doctors found, and the list is displayed on the screen for the user to choose from.
Referring to fig. 8, which is a flowchart of another text processing method according to an embodiment of the present invention. The target text 40a input by the user is: "my belly has felt bloated recently, uncomfortable". The target keyword 40b in the target text 40a, "belly bloated", is recognized by an entity recognition algorithm; for the specific process of extracting the target keyword, reference may be made to step S102 in the embodiment corresponding to fig. 3. The target keyword 40b is mapped to the target entity character string 40c, "abdominal distension", by entity linking, and the target service attribute character string of the target entity character string 40c is looked up in the knowledge graph; for the specific process of determining the target entity character string and the target service attribute character string, reference may be made to step S102 in the embodiment corresponding to fig. 3. The target intention type of the target text is recognized by the intention recognition model; the intention types may include: doctor finding, department finding, disease diagnosis, examination, process-related, and others. When the target intention type is disease diagnosis, the terminal device reasons over the target entity character string 40c, "abdominal distension", based on the medical knowledge graph to determine the recommended service data matching the target text 40a; the recommended service data may be diseases, departments, drugs, symptoms, and so on.
Compared with determining service data by manually searching for information and manually performing multiple business actions, automatically generating service data that meets the user's expectations saves the time the user would spend performing those actions and thereby improves the efficiency of obtaining service data.
Referring to fig. 9, which is a flowchart illustrating another text recognition method according to an embodiment of the present invention, the text recognition method may include the following steps S201 to S206:
step S201, acquiring a target text and acquiring a knowledge graph; the knowledge-graph includes a plurality of entity strings, and a service attribute string corresponding to each entity string.
Step S202, searching a target entity character string matched with the target text in the entity character strings, and extracting a target service attribute character string corresponding to the target entity character string.
Step S203, identifying the target intention type matched with the target text according to the target entity character string and the target service attribute character string.
The specific processes of step S201 to step S203 may refer to step S101 to step S103 in the corresponding embodiment of fig. 3, and are not described in detail here.
Step S204, obtaining the service attribute represented by the target intention type, and determining the service attribute as the intention service attribute.
Specifically, the service attribute represented by the target intention type is obtained and referred to as the intention service attribute. For example, the service attribute represented by the target intention type "doctor finding" is: doctor; the service attribute represented by the target intention type "department finding" is: department; the service attribute represented by the target intention type "examination" is: examination; the service attribute represented by the target intention type "process-related" is: process; the service attribute represented by the target intention type "disease diagnosis" is: symptom.
Step S205, using the target entity character string with the intention service attribute as the target intention character string.
The terminal device takes the target entity character string with the intention service attribute as a target intention character string, so that the character string most relevant to the intention of the user is further screened out, and the service data searched based on the character string is the service data meeting the psychological expectation of the user or the service data meeting the intention of the user.
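Steps S204 and S205 can be sketched as a lookup followed by a filter. The mapping below follows the examples given for step S204; the function name `select_intent_strings` and the service attributes assigned to the toy entity strings are assumptions made for the example.

```python
from typing import Dict, List

# Intention type -> intention service attribute (examples from step S204).
INTENT_TO_ATTRIBUTE: Dict[str, str] = {
    "doctor finding": "doctor",
    "department finding": "department",
    "examination": "examination",
    "process-related": "process",
    "disease diagnosis": "symptom",
}


def select_intent_strings(entity_attrs: Dict[str, str], intent_type: str) -> List[str]:
    """Keep the target entity strings whose service attribute matches the intention."""
    wanted = INTENT_TO_ATTRIBUTE[intent_type]
    return [entity for entity, attr in entity_attrs.items() if attr == wanted]


# Target entity strings with their (assumed) target service attribute strings.
entity_attrs = {"abdominal pain": "symptom", "nausea": "symptom", "Zhang San": "doctor"}
doctor_strings = select_intent_strings(entity_attrs, "doctor finding")
symptom_strings = select_intent_strings(entity_attrs, "disease diagnosis")
```

The same target text therefore yields different target intention strings depending on the recognized intention type.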
Step S206, searching the business data associated with the target intention character string, and taking the business data associated with the target intention character string as the recommended business data.
Specifically, when the target intention type is a semantic determination type, business data associated with the target intention character string is searched for in the database, and the business data found is used as the recommended business data. A target intention type of the semantic determination type means that the user already knows the specific business object; for example, doctor finding, department finding, examination, and process-related all belong to the semantic determination type.
For example, when the target intention string is "digestive department", a digestive department registration link and a profile belonging to a digestive doctor may be searched in the database and packaged as recommended business data; when the target intention character string is 'blood routine', the notice of 'blood routine' examination, the reservation link of 'blood routine' examination and the like can be searched in the database, and the data are packaged into the recommended service data; when the target intention character string is 'review', the flow introduction of the review and the notice of the review can be searched in the database and packaged into the recommended service data.
When the target intent type is a semantic reasoning type, the terminal device may divide all entity character strings in the knowledge graph into a plurality of first entity character strings and a plurality of second entity character strings, wherein an entity character string without an intent service attribute is referred to as a first entity character string, and an entity character string with an intent service attribute is referred to as a second entity character string. The target intention type belonging to the semantic reasoning type means that a terminal device needs to further reason out a service object according to the current service scene and the target text.
For example, the target intention type "disease diagnosis" belongs to the semantic reasoning type, and the service attribute represented by this target intention type is: symptom. Among the plurality of entity character strings in the knowledge graph, those without the "symptom" service attribute are used as first entity character strings, and those with the "symptom" service attribute are used as second entity character strings.
In the knowledge graph, a similarity coefficient (referred to as a target similarity coefficient) between the target intention character string and each first entity character string is determined. Since the target intention character string also belongs to the knowledge graph, this amounts to determining the correlation between the node of the target intention character string and the node of each first entity character string in the knowledge graph.
From the plurality of target similarity coefficients, the terminal device may take the first entity character strings whose target similarity coefficient is greater than a coefficient threshold as target first entity character strings matching the target intention type; or take the first entity character string with the largest target similarity coefficient as the target first entity character string; or select the target first entity character string matching the target intention type from the plurality of first entity character strings according to the current service scenario and the plurality of target similarity coefficients. In addition, the login user information of the logged-in user can be used as an auxiliary condition when determining the target first entity character string in any of the above ways. For example, if the first entity character strings include "adult eczema" and "infantile eczema", then according to the age information "1 year old" in the login user information, the target first entity character string corresponding to the login user information is selected: "infantile eczema".
For example, if the current business scenario is a disease inquiry system and the intention service attribute is "symptom", the first entity character string with the highest target similarity coefficient may be selected, from among the first entity character strings whose service attribute is "disease", as the target first entity character string; if the current business scenario is a pharmacy drug recommendation system, the first entity character string with the highest target similarity coefficient may be selected, from among the first entity character strings whose service attribute is "drug", as the target first entity character string.
Please refer to fig. 10, which is a schematic diagram of another knowledge graph according to an embodiment of the present invention. Fig. 10 shows a knowledge graph built from the symptom "facial rash", the departments "dermatology" and "pediatrics", and the diseases "infantile eczema", "infantile acne", "eczema" and "rash"; the knowledge graph further includes the association matching coefficients between the symptom "facial rash" and the 2 departments and between the symptom "facial rash" and the 4 diseases.
If the target intention type is disease diagnosis, the target intention type belongs to the semantic reasoning type; if the target intention character string is "facial rash", then "dermatology", "pediatrics", "infantile eczema", "infantile acne", "eczema" and "rash" in the knowledge graph of fig. 10 all belong to the first entity character strings, and each association matching coefficient may be taken as the target similarity coefficient between the target intention character string "facial rash" and the corresponding first entity character string.
From the above 6 target similarity coefficients, the first entity character strings "dermatology" and "eczema", whose target similarity coefficients are greater than the coefficient threshold 0.4, may be taken as target first entity character strings; or the first entity character string "dermatology", which has the largest of the 6 target similarity coefficients, may be selected as the target first entity character string. If the current scenario is a disease inquiry system, the first entity character string "eczema", which has the highest target similarity coefficient among the first entity character strings whose service attribute is "disease", may be selected as the target first entity character string; if the current scenario is a department recommendation system, the first entity character string "dermatology", which has the highest target similarity coefficient among the first entity character strings whose service attribute is "department", may be selected as the target first entity character string.
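The three selection strategies can be sketched on the entities of fig. 10. The coefficient values below are hypothetical, since the numbers in fig. 10 are not reproduced in this text; they were chosen only to be consistent with the outcomes described above. The helper names are likewise assumptions.

```python
from typing import Dict, List, Tuple

# First entity string -> (service attribute, target similarity coefficient).
# The coefficients are HYPOTHETICAL; fig. 10 is not available here.
CANDIDATES: Dict[str, Tuple[str, float]] = {
    "dermatology": ("department", 0.6),
    "pediatrics": ("department", 0.3),
    "infantile eczema": ("disease", 0.2),
    "infantile acne": ("disease", 0.1),
    "eczema": ("disease", 0.5),
    "rash": ("disease", 0.35),
}


def above_threshold(cands: Dict[str, Tuple[str, float]], threshold: float) -> List[str]:
    """Strategy 1: every first entity string whose coefficient exceeds the threshold."""
    return [name for name, (_, coef) in cands.items() if coef > threshold]


def best_overall(cands: Dict[str, Tuple[str, float]]) -> str:
    """Strategy 2: the single first entity string with the largest coefficient."""
    return max(cands, key=lambda name: cands[name][1])


def best_with_attribute(cands: Dict[str, Tuple[str, float]], attribute: str) -> str:
    """Strategy 3: the best first entity string among those with a given service attribute."""
    matching = [name for name, (attr, _) in cands.items() if attr == attribute]
    if not matching:
        raise ValueError(f"no candidate has service attribute {attribute!r}")
    return max(matching, key=lambda name: cands[name][1])


over_threshold = above_threshold(CANDIDATES, 0.4)
top = best_overall(CANDIDATES)
top_disease = best_with_attribute(CANDIDATES, "disease")
top_department = best_with_attribute(CANDIDATES, "department")
```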
There are two ways to determine the target similarity coefficient between the target intention string and the first entity string in the knowledge-graph, one of which is described in detail below:
The knowledge graph further includes association matching coefficients between the entity character strings, which are obtained from statistics over a data set associated with the knowledge graph. One first entity character string is extracted from the plurality of first entity character strings as the test first entity character string for the current poll. If there is more than one target intention character string, the sum of the association matching coefficients between each target intention character string and the test first entity character string is used as the target similarity coefficient between the target intention character strings and the test first entity character string. A new test first entity character string is then extracted from the remaining first entity character strings and its target similarity coefficient is determined in the same way. This loop continues until every first entity character string has been used as the test first entity character string, at which point the target similarity coefficient between the target intention character string and each first entity character string has been determined.
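The polling procedure of this first way can be sketched as a loop over the first entity strings. The dictionary layout, the function name `sum_association_scores`, the symptom "itching", and all coefficient values are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def sum_association_scores(
    assoc: Dict[Tuple[str, str], float],
    intent_strings: List[str],
    first_entities: List[str],
) -> Dict[str, float]:
    """For each first entity string, sum its coefficients with all target intention strings."""
    scores: Dict[str, float] = {}
    for candidate in first_entities:  # each one takes a turn as the test first entity string
        scores[candidate] = sum(assoc.get((s, candidate), 0.0) for s in intent_strings)
    return scores


# Hypothetical association matching coefficients keyed by (intention string, first entity string).
assoc = {
    ("facial rash", "eczema"): 0.5,
    ("itching", "eczema"): 0.25,
    ("facial rash", "dermatology"): 0.5,
}
scores = sum_association_scores(assoc, ["facial rash", "itching"], ["eczema", "dermatology"])
```

A missing pair is treated as a coefficient of 0 here, which matches the later convention that a coefficient of 0 means no connecting edge.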
For example, medical guidance is taken below as an example to illustrate how to determine the association matching coefficients between the entity character strings in the knowledge graph. The knowledge graph may be constructed from a large number of electronic health records (EHRs). First, entries meeting the data requirements are screened from the EHRs; for example, each data entry must include the patient's gender, age, treatment department, symptoms, medication, doctor, disease, and so on. Then, entries that use different drug names for the same drug are merged under a unified drug name, and entries that use different department names for the same department are merged under a unified department name. Taking symptoms, medications, doctors, and diseases as centers, the co-occurrence probability between each center and the remaining columns is calculated.
Taking disease-gender as an example, assuming that the set of diseases is {d_1, d_2, ..., d_n}, the set of genders is {s_1, s_2}, and T is a threshold, the co-occurrence probability between a disease and a gender is calculated according to formula (2):
Figure BDA0002020504050000251
That is, for any disease and gender, the number of times they occur together is counted; if it is greater than the threshold T, the co-occurrence count is retained and recorded as r_pop. The co-occurrence probability r_rat is obtained by dividing the co-occurrence count by the number of occurrences of the center.
Similarly, co-occurrence coefficients and co-occurrence probabilities of co-occurrence pairs of disease-gender, disease-symptom, symptom-department, doctor-disease, disease-department, and the like can be calculated.
The terminal device may use a character string corresponding to an entity in the EHR as an entity character string in the knowledge graph, and use a co-occurrence probability between any two entities as an association matching coefficient between the corresponding entity character strings; the relevance between any two nodes (entity character strings) in the knowledge graph is determined in a statistical mode, and the relevance is used for providing scientific data support for subsequent reasoning on the knowledge graph.
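The counting described for formula (2) can be sketched as follows. Because the formula itself appears only as an image, this is a reading of the surrounding description rather than a reproduction; the function name, the record layout (one center value and one other value per entry), and the toy records are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def cooccurrence_probabilities(records: Iterable[Pair], threshold: int) -> Dict[Pair, float]:
    """Keep pairs seen more than `threshold` times; divide by the count of the center."""
    records = list(records)
    pair_counts = Counter(records)                             # co-occurrence counts
    center_counts = Counter(center for center, _ in records)   # occurrences of each center
    return {
        pair: count / center_counts[pair[0]]
        for pair, count in pair_counts.items()
        if count > threshold
    }


# Hypothetical (disease, gender) entries extracted from EHRs.
records = [("eczema", "female")] * 3 + [("eczema", "male")] + [("rash", "male")] * 2
probs = cooccurrence_probabilities(records, threshold=1)
```

In this toy data the pair ("eczema", "male") occurs only once, so it falls at or below the threshold and receives no coefficient.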
Another way to determine the target similarity coefficient between the target intention character string and a first entity character string in the knowledge graph is described in detail below. This way generates a graph vector for each entity character string by random walk plus skip-gram (a language model). An entity character string in the knowledge graph without the intention service attribute is a first entity character string, and an entity character string with the intention service attribute is a second entity character string. The graph vector of each entity character string in the knowledge graph is obtained; the graph vector of a first entity character string is referred to as a first graph vector, and the graph vector of a second entity character string is referred to as a second graph vector. The word vector of each second entity character string is also obtained and referred to as a second word vector; for the specific process of obtaining word vectors, reference may be made to the description of obtaining the target entity word vector and the target service attribute word vector in step S103 in the embodiment corresponding to fig. 3.
The second word vector of the target intention character string is extracted from the plurality of second word vectors as the target word vector; since the target intention character string belongs to the target entity character strings, the corresponding target word vector also belongs to the target entity word vectors. The second graph vector of the target intention character string is extracted from the plurality of second graph vectors as the target graph vector. One first entity character string is extracted from the plurality of first entity character strings for the current poll as the test first entity character string, and the first graph vector corresponding to the test first entity character string is used as the test graph vector.
A first similarity coefficient score1(d_t) between the target intention character string and the test first entity character string is calculated according to formula (3):
Figure BDA0002020504050000261
Where n denotes the number of target intention character strings; d_t denotes the test first entity character string; T_i denotes a target word vector; T_tj denotes the second word vector of a second entity character string adjacent to the test first entity character string in the knowledge graph, with j greater than or equal to 0 and less than or equal to m, where m denotes the number of second entity character strings adjacent to the test first entity character string; w_ti denotes the association matching coefficient between the target intention character string and the test first entity character string; and sim() denotes cosine similarity.
A second similarity coefficient score2(d_t) between the target intention character string and the test first entity character string is calculated according to formula (4):
Figure BDA0002020504050000262
Where n denotes the number of target intention character strings; G_i denotes a target graph vector; G(d_t) denotes the graph vector of the test first entity character string (i.e., the test graph vector); w_ti denotes the association matching coefficient between the target intention character string and the test first entity character string; and sim() denotes cosine similarity.
The sum of the first similarity coefficient score1(d_t) and the second similarity coefficient score2(d_t) is used as the target similarity coefficient between the target intention character string and the test first entity character string:
score(d_t) = score1(d_t) + score2(d_t)    (5)
A new test first entity character string is then extracted from the remaining first entity character strings, and the target similarity coefficient between the target intention character string and the new test first entity character string is determined according to formula (3), formula (4) and formula (5). This loop continues until every first entity character string has been used as the test first entity character string, at which point the target similarity coefficient between the target intention character string and each first entity character string has been determined.
Formula (3) can be regarded as calculating the similarity between all the symptoms input by the user and all the symptoms corresponding to a disease, department, medicine, or operation. Formula (4) can be regarded as using the knowledge graph to calculate the similarity between all the input symptoms and the disease, department, medicine, or operation itself. The former examines the relevance of the symptoms to diseases, departments, medicines, operations and the like indirectly, by comparing the input symptoms with the symptoms corresponding to them; the latter uses the knowledge graph to calculate this similarity directly. The two complement each other, so that the data of the knowledge graph is fully utilized and the flexibility of the system is enhanced.
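The combination of the two scores can be sketched as follows. Formulas (3) and (4) appear only as images in this text, so the forms of `score1` and `score2` below are assumptions chosen to be consistent with the symbol definitions (a weighted sum over the n target strings of cosine similarities); only the cosine similarity and the sum in formula (5) are taken directly from the description. All vectors and weights are toy values.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score1(target_word_vecs: List[Vector], neighbor_word_vecs: List[Vector],
           weights: List[float]) -> float:
    """ASSUMED form: weighted sum over i of the best match among adjacent symptom vectors."""
    if not neighbor_word_vecs:
        return 0.0
    return sum(
        w * max(cosine(t, nb) for nb in neighbor_word_vecs)
        for t, w in zip(target_word_vecs, weights)
    )


def score2(target_graph_vecs: List[Vector], test_graph_vec: Vector,
           weights: List[float]) -> float:
    """ASSUMED form: weighted sum over i of sim(G_i, G(d_t))."""
    return sum(w * cosine(g, test_graph_vec) for g, w in zip(target_graph_vecs, weights))


# Toy inputs: n = 2 target intention strings, one adjacent second entity string.
weights = [0.5, 0.5]
s1 = score1([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]], weights)
s2 = score2([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], weights)
total = s1 + s2  # formula (5)
```

Whatever the exact forms of formulas (3) and (4), the final step is the plain sum of the word-vector score and the graph-vector score.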
The following is a detailed description of how to determine a graph vector for each entity string in the knowledge-graph: the knowledge graph can be regarded as a topological graph (graph), and each entity character string in the knowledge graph is a vertex of the topological graph; if the correlation matching coefficient between the two entity character strings is larger than 0, a connecting edge exists between the two entity character strings, and if the correlation matching coefficient between the two entity character strings is equal to 0, the connecting edge does not exist between the two entity character strings. Initializing a graph vector of each entity character string in the knowledge graph to serve as an original graph vector, sampling a plurality of entity character strings from all entity character strings contained in the knowledge graph in a random walk mode to serve as sample entity character strings, and acquiring the original graph vector of each sample entity character string to serve as a sample graph vector respectively.
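The graph construction and sampling described above can be sketched as follows. The text does not specify how the next vertex is chosen during a walk; a uniform choice among neighbors and undirected edges are assumed here, and the coefficients and function names are made up for the example.

```python
import random
from typing import Dict, List, Tuple


def build_adjacency(assoc: Dict[Tuple[str, str], float]) -> Dict[str, List[str]]:
    """Create an undirected edge for every pair whose coefficient is greater than 0."""
    adjacency: Dict[str, List[str]] = {}
    for (a, b), coef in assoc.items():
        adjacency.setdefault(a, [])
        adjacency.setdefault(b, [])
        if coef > 0:
            adjacency[a].append(b)
            adjacency[b].append(a)
    return adjacency


def random_walk(adjacency: Dict[str, List[str]], start: str, length: int,
                rng: random.Random) -> List[str]:
    """Sample up to `length` vertices, moving to a uniformly chosen neighbor each step."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # isolated vertex: the walk cannot continue
        walk.append(rng.choice(neighbors))
    return walk


# Hypothetical association matching coefficients; a coefficient of 0 means no edge.
assoc = {
    ("facial rash", "eczema"): 0.5,
    ("facial rash", "dermatology"): 0.6,
    ("eczema", "dermatology"): 0.4,
    ("facial rash", "fracture"): 0.0,
}
adjacency = build_adjacency(assoc)
walk = random_walk(adjacency, "facial rash", length=5, rng=random.Random(0))
```

Each sampled walk plays the role of a sentence for the skip-gram step, with the entity strings on the walk serving as the sample entity strings.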
Determining a loss value according to equation (6):
Figure BDA0002020504050000271
Where Φ(v_i) denotes the graph vector representation function of vertex i, and P(v_j | v_i) denotes the probability that the j-th entity character string and the i-th entity character string appear together in the training corpus. The graph vector representation function is adjusted according to the loss value and the gradient descent rule; a new graph vector of each sample entity character string, referred to as an adjusted graph vector, is then obtained from the new graph vector representation function, and the original graph vector of the sample entity character string is replaced with the adjusted graph vector. Sampling from all entity character strings is then repeated: new adjusted graph vectors are determined according to formula (6) and the gradient descent rule and are taken as the original graph vectors, so that the original graph vector of each entity character string is continuously updated. When the number of sampling rounds reaches a count threshold, the most recently determined original graph vectors of all entity character strings are used as the graph vectors of the entity character strings.
Please refer to fig. 11, which is a comparison graph of registration conversion rates according to an embodiment of the present invention. As can be seen from fig. 11, during the period from December 1 to December 9, the registration conversion rate is between 4.10% and 6.30% when the recommended service data is determined after identifying the target intention type, and between 3.00% and 4.10% when the recommended service data is determined directly from the target text without identifying the target intention type. Therefore, after the target intention type is identified, the registration conversion rate is improved by more than 1 percentage point, which corresponds to a performance improvement of more than 20%.
Fig. 12 is a system structure diagram of text processing according to an embodiment of the present invention. When text processing is applied in a medical scenario, the text processing system is divided into three parts: data, algorithm support, and applications. The data mainly comes from three sources: patient dictation, authoritative disease knowledge, and EHRs. The patient dictation is used for generating the corpora used to train the first recurrent neural network model, the second recurrent neural network model and the intention recognition model; the EHRs provide a data source for automatically and rapidly constructing the knowledge graph; the authoritative disease knowledge is used to adjust the knowledge graph and the results of the system, and provides guidance from professionals and books.
The algorithm support part consists of a keyword extraction module, a knowledge graph module, an intention recognition module and a matching module. When the system detects a target text input by a user, the target entity character strings in the target text are first determined in the keyword extraction module through Chinese word segmentation, entity recognition, part-of-speech tagging, entity modification, syntactic analysis and entity relationships; meanwhile, whether the logged-in user who input the target text is the patient can be judged according to the age information and the gender information in the target text.
The target service attribute character string corresponding to each target entity character string is extracted through the knowledge graph module; each target entity character string has a corresponding service attribute, and the service attribute character strings are, for example, "disease", "symptom", "examination", "treatment", and the like. Finally, the target entity character strings and the target service attribute character strings are input into the intention recognition module for intention type classification, and the target intention type is determined. After the target intention type is obtained, the best recommended service data is determined using the target intention type and the target entity character strings, and the recommended service data is output. For example, a disease can be diagnosed according to the target intention type and the target text, and the diagnosed disease can serve as the recommended service data; whether the disease diagnosis is classified as a rough diagnosis, a medium diagnosis or a fine diagnosis is determined according to the target similarity coefficient between the diagnosed disease and the target entity character string.
In the application aspect, not only the intention understanding of the user and the reference to the disease classification system are considered, but also the doctor seeing situation is analyzed to obtain the seeing picture of each doctor, and then when the doctor needs to be recommended to the user (namely, the recommended service data is generated), a recommended doctor list can be generated according to the intention of the user, the disease classification system and the seeing picture of the doctor so as to recommend a proper doctor to the user, meanwhile, each doctor can be embedded into the database as a label with good illness as well, and when the recommended service data is determined, the best personalized answer is returned to the question of the user.
Compared with determining service data by manually searching for information and manually performing multiple service actions, automatically generating service data that meets the user's expectations saves the time the user would spend on those actions and thus improves the efficiency of acquiring service data.
Further, please refer to fig. 13, which is a schematic structural diagram of a text processing apparatus according to an embodiment of the present invention. As shown in fig. 13, the text processing apparatus 1 may be applied to the terminal device in the above-described embodiments corresponding to fig. 3 to 12, and the text processing apparatus 1 may include: the device comprises an acquisition module 11, a search module 12, an identification module 13 and a generation module 14.
The acquisition module 11 is used for acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string;
a searching module 12, configured to search the entity character strings in the plurality of entity character strings for a target entity character string matching the target text;
the obtaining module 11 is further configured to extract a target service attribute character string corresponding to the target entity character string;
the identification module 13 is configured to identify a target intention type matched with the target text according to the target entity character string and the target service attribute character string;
and the generating module 14 is configured to determine a target intention character string associated with the target intention type from the target entity character string, and generate recommended service data according to the target intention character string.
For specific functional implementation manners of the obtaining module 11, the searching module 12, the identifying module 13, and the generating module 14, reference may be made to steps S101 to S104 in the embodiment corresponding to fig. 3, which is not described herein again.
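The division of work among the four modules can be sketched as a small pipeline. This is only an illustrative sketch: the dictionary-based knowledge graph, the substring matching, the rule used in place of the trained intention recognition model, and all names and sample entries (`KNOWLEDGE_GRAPH`, `SERVICE_DATA`, `recommend`, and so on) are assumptions made for the example and do not come from the embodiment.

```python
# Hypothetical toy knowledge graph: entity string -> service attribute string.
KNOWLEDGE_GRAPH = {
    "headache": "symptoms",
    "migraine": "disease",
    "blood test": "examination",
}

# Hypothetical service data keyed by entity string.
SERVICE_DATA = {
    "headache": "Neurology outpatient guidance",
    "migraine": "Migraine overview and specialists",
    "blood test": "Laboratory booking",
}


def find_target_entities(target_text, graph):
    """Searching module stand-in: keep entity strings that occur in the text."""
    lowered = target_text.lower()
    return [entity for entity in graph if entity in lowered]


def identify_intent_attribute(attributes):
    """Identification module stand-in: a trivial rule, not a trained model.

    Returns the service attribute represented by the recognized intention type.
    """
    if "disease" in attributes:
        return "disease"
    return attributes[0] if attributes else None


def recommend(target_text, graph=KNOWLEDGE_GRAPH, service_data=SERVICE_DATA):
    """Run the four stages in order and return the recommended service data."""
    entities = find_target_entities(target_text, graph)       # target entity strings
    attributes = [graph[entity] for entity in entities]       # target service attributes
    intent_attribute = identify_intent_attribute(attributes)  # intention service attribute
    intent_strings = [e for e in entities if graph[e] == intent_attribute]
    return [service_data[e] for e in intent_strings]          # recommended service data


result = recommend("I have a headache, could it be migraine?")
```

In this sketch the entity string whose service attribute equals the attribute represented by the intention type plays the role of the target intention character string.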
Referring to fig. 13, the searching module 12 may include: an extracting unit 121, a mapping unit 122, and a searching unit 123.
An extracting unit 121 configured to extract a target keyword from the target text;
a mapping unit 122, configured to map the target keyword into a map labeling entity character string;
a searching unit 123, configured to search, from the entity character strings, an entity character string that is the same as the map labeling entity character string as a target entity character string matching the target text.
For specific functional implementation manners of the extracting unit 121, the mapping unit 122, and the searching unit 123, reference may be made to step S102 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 13, the extracting unit 121 may include: a dividing subunit 1211, an encoding subunit 1212 and a splicing subunit 1213.
A dividing subunit 1211 configured to divide the target text into a plurality of target unit characters and convert each of the target unit characters into a target unit character vector;
a coding subunit 1212, configured to perform bidirectional cyclic coding on the multiple target unit character vectors based on a coding layer in the first cyclic neural network model, to obtain a forward coding matrix and a reverse coding matrix;
a splicing subunit 1213, configured to splice the forward encoding matrix and the reverse encoding matrix into a hidden state matrix;
the splicing subunit 1213 is further configured to perform sequence tagging on the hidden state matrix based on the conditional random field in the first recurrent neural network, and determine part-of-speech tags corresponding to each target unit character;
the splicing subunit 1213 is further configured to determine the target keyword according to the part-of-speech tag corresponding to each target unit character.
The specific functional implementation manners of the dividing subunit 1211, the encoding subunit 1212 and the splicing subunit 1213 may refer to step S102 in the embodiment corresponding to fig. 3, which is not described herein again.
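The splicing, sequence labeling, and keyword determination steps can be illustrated as follows. This is a hedged sketch, not the embodiment's model: the forward and reverse encodings are supplied by hand rather than produced by a recurrent network, the B/I/O tag set stands in for the part-of-speech tags, and the emission and transition scores are made-up numbers. The function names (`splice`, `viterbi`, `extract_keywords`) are hypothetical.

```python
def splice(forward_matrix, reverse_matrix):
    """Concatenate the forward and reverse encodings of each character row-wise."""
    return [f + r for f, r in zip(forward_matrix, reverse_matrix)]


def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence, as a CRF layer does at decoding time.

    emissions[t][k]: score of tag k for character t.
    transitions[j][k]: score of moving from tag j to tag k.
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        step_scores, step_back = [], []
        for k in range(n_tags):
            best_prev = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            step_back.append(best_prev)
            step_scores.append(score[best_prev] + transitions[best_prev][k] + emission[k])
        score = step_scores
        backpointers.append(step_back)
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for step_back in reversed(backpointers):  # follow back-pointers to the first character
        best = step_back[best]
        path.append(best)
    path.reverse()
    return [tags[k] for k in path]


def extract_keywords(characters, labels):
    """Join characters tagged B (begin) and I (inside) into keywords."""
    keywords, current = [], ""
    for ch, label in zip(characters, labels):
        if label == "B":
            if current:
                keywords.append(current)
            current = ch
        elif label == "I" and current:
            current += ch
        else:
            if current:
                keywords.append(current)
            current = ""
    if current:
        keywords.append(current)
    return keywords


TAGS = ["O", "B", "I"]
characters = ["我", "头", "痛"]  # target unit characters
hidden = splice([[0.1], [0.2], [0.3]], [[0.3], [0.2], [0.1]])  # 3 x 2 hidden state matrix
emissions = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]          # hypothetical per-character tag scores
transitions = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]      # O -> I is penalized
labels = viterbi(emissions, transitions, TAGS)
keywords = extract_keywords(characters, labels)
```

In a trained model the emission scores would be computed from the hidden state matrix; here they are fixed by hand so that the decoding step can be followed in isolation.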
Referring to fig. 13, the mapping unit 122 may include: a conversion subunit 1221, a decoding subunit 1222.
A conversion subunit 1221, configured to divide the target keyword into a plurality of key unit characters, and convert each key unit character into a key unit character vector;
the converting subunit 1221 is further configured to encode, based on an encoding layer in the second recurrent neural network model, multiple key unit character vectors to obtain a context vector of the target keyword;
a decoding subunit 1222, configured to decode the context vector based on a decoding layer in the second recurrent neural network model to obtain a hidden state vector of the context vector;
the decoding subunit 1222 is further configured to identify the hidden state vector, obtain a character sequence corresponding to the hidden state vector, and determine the character sequence as the map annotation entity character string.
The specific functional implementation manners of the converting subunit 1221 and the decoding subunit 1222 may refer to step S102 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 13, the identification module 13 may include: a conversion unit 131, a combination unit 132, and a recognition unit 133.
A converting unit 131, configured to convert the target entity character string into a target entity word vector;
a combining unit 132, configured to convert the target service attribute character string into a target service attribute word vector;
the combining unit 132 is further configured to combine the target entity word vector and the target service attribute word vector into an input vector, and perform convolution and pooling on the input vector based on a convolution layer and a pooling layer in an intention identification model to obtain an intention feature vector of the target text;
an identifying unit 133, configured to identify, based on the classifier in the intention identification model, matching probabilities between the intention feature vector and multiple intention types in the classifier, and determine, from the multiple matching probabilities, an intention type corresponding to a maximum matching probability as the target intention type.
For specific functional implementation manners of the converting unit 131, the combining unit 132, and the identifying unit 133, reference may be made to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.
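The combine, convolve, pool, and classify sequence can be sketched in a few lines. The sketch assumes a one-dimensional convolution, max pooling, and a linear layer followed by softmax as the classifier; the kernels, classifier weights, vectors, and intention type names are hypothetical fixed values rather than trained parameters.

```python
import math


def conv1d(vector, kernel):
    """Slide the kernel over the vector (valid positions only, stride 1)."""
    width = len(kernel)
    return [sum(vector[i + j] * kernel[j] for j in range(width))
            for i in range(len(vector) - width + 1)]


def softmax(scores):
    """Turn raw scores into matching probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify_intent(entity_vec, attribute_vec, kernels, class_weights, intent_types):
    """Return the intention type with the maximum matching probability."""
    input_vec = entity_vec + attribute_vec                         # combine into one input vector
    features = [max(conv1d(input_vec, k)) for k in kernels]        # convolution, then max pooling
    scores = [sum(w * f for w, f in zip(row, features)) for row in class_weights]
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return intent_types[best], probabilities


intent, probabilities = identify_intent(
    entity_vec=[1.0, 0.0],
    attribute_vec=[0.0, 1.0],
    kernels=[[1.0, 0.0], [0.0, 1.0]],
    class_weights=[[1.0, 1.0], [0.0, 0.0]],
    intent_types=["disease inquiry", "examination inquiry"],
)
```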
Referring to fig. 13, the converting unit 131 may include: a lookup subunit 1311, and a dimension reduction subunit 1312.
A searching subunit 1311, configured to search, from an entity word bag, a one-hot code corresponding to the target entity character string as a first vector; the entity word bag comprises the entity character strings in the knowledge graph and the one-hot codes respectively corresponding to the entity character strings;
a dimension reduction subunit 1312, configured to perform dimension reduction on the first vector based on a hidden layer in an entity word vector conversion model to obtain the target entity word vector.
The specific functional implementation manners of the searching subunit 1311 and the dimension reducing subunit 1312 may refer to step S103 in the embodiment corresponding to fig. 3.
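A minimal sketch of the lookup and dimension reduction follows, assuming the hidden layer is a plain weight matrix with no bias or activation, as in a word2vec-style embedding. The entity strings and weights are made-up values, and the function names are hypothetical.

```python
def build_entity_word_bag(entity_strings):
    """Map every entity string to a one-hot vector of length len(entity_strings)."""
    size = len(entity_strings)
    return {entity: [1.0 if i == index else 0.0 for i in range(size)]
            for index, entity in enumerate(entity_strings)}


def reduce_dimension(one_hot, hidden_weights):
    """Multiply a 1 x V one-hot vector by a V x D hidden-layer weight matrix."""
    dims = len(hidden_weights[0])
    return [sum(one_hot[v] * hidden_weights[v][d] for v in range(len(one_hot)))
            for d in range(dims)]


word_bag = build_entity_word_bag(["headache", "migraine", "blood test"])
hidden_weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # hypothetical 3 x 2 weights
first_vector = word_bag["migraine"]                     # [0.0, 1.0, 0.0]
entity_word_vector = reduce_dimension(first_vector, hidden_weights)
```

Because the first vector has a single nonzero entry, the multiplication simply selects one row of the weight matrix, which is why the result has the hidden layer's lower dimension.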
Referring to fig. 13, the generating module 14 may include: an acquisition unit 141, a determination unit 142, and a generation unit 143.
An obtaining unit 141, configured to obtain a service attribute represented by the target intention type as an intention service attribute;
a determining unit 142, configured to use a target entity string having the intention service attribute as the target intention string;
a generating unit 143, configured to search for the service data associated with the target intention string, and use the service data associated with the target intention string as the recommended service data.
For specific functional implementation manners of the obtaining unit 141, the determining unit 142, and the generating unit 143, reference may be made to step S204 to step S206 in the embodiment corresponding to fig. 9, which is not described herein again.
Referring to fig. 13, the generating unit 143 may include: a first extraction subunit 1431, a selection subunit 1432, a determination subunit 1433.
A first extraction subunit 1431, configured to, when the target intent type belongs to a semantic reasoning type, extract a plurality of first entity character strings from a plurality of entity character strings of the knowledge graph; the first entity string is an entity string of the plurality of entity strings of the knowledge-graph other than the entity string having the intent-service attribute;
a selecting subunit 1432, configured to determine, in the knowledge-graph, a target similarity coefficient between the target intent string and each first entity string;
the first extracting subunit 1431, further configured to select, according to a plurality of target similarity coefficients, a target first entity character string matching the target intent type from the plurality of first entity character strings;
the determining subunit 1433 is configured to find the service data associated with the target first entity string, and use the service data associated with the target first entity string as the recommended service data.
The specific functional implementation manners of the first extracting sub-unit 1431, the selecting sub-unit 1432, and the determining sub-unit 1433 may refer to step S206 in the embodiment corresponding to fig. 9, which is not described herein again.
Referring to fig. 13, the knowledge-graph further includes correlation matching coefficients between a plurality of entity strings; the correlation matching coefficient is obtained through data set statistics related to the knowledge graph;
the selection subunit 1432 may include: start subunit 14321, stop subunit 14322.
A start subunit 14321 configured to extract a test first entity string for polling from the plurality of first entity strings;
the start subunit 14321, further configured to take the association matching coefficient between the target intention string and the test first entity string as the target similarity coefficient between the target intention string and the test first entity string;
a stop subunit 14322 for stopping polling when each first entity string is determined as a test first entity string.
The specific implementation manner of the functions of the start subunit 14321 and the stop subunit 14322 may refer to step S206 in the embodiment corresponding to fig. 9, which is not described herein again.
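The polling procedure can be sketched as a single pass over the first entity strings. The storage of coefficients under unordered pairs, the default of 0.0 for pairs without a recorded coefficient, and the choice of the highest coefficient as the selection rule are all assumptions of this sketch; the names and sample values are hypothetical.

```python
def poll_similarities(target_intent_string, first_entity_strings, coefficients):
    """Visit each first entity string once and record its target similarity coefficient."""
    similarities = {}
    for test_entity in first_entity_strings:   # each string becomes the test string once
        pair = frozenset((target_intent_string, test_entity))
        similarities[test_entity] = coefficients.get(pair, 0.0)
    return similarities                        # polling stops when every string was tested


def select_target_first_entity(similarities):
    """Pick the first entity string with the highest target similarity coefficient."""
    return max(similarities, key=similarities.get)


# Hypothetical association matching coefficients obtained from data set statistics.
coefficients = {
    frozenset(("headache", "migraine")): 0.8,
    frozenset(("headache", "influenza")): 0.3,
}
similarities = poll_similarities("headache", ["migraine", "influenza", "fracture"], coefficients)
selected = select_target_first_entity(similarities)
```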
Referring to fig. 13, the selection subunit 1432 may include: a first acquisition sub-unit 14323, a second acquisition sub-unit 14324, a second extraction sub-unit 14325, and a third extraction sub-unit 14326.
A first obtaining subunit 14323, configured to use a plurality of entity character strings in the knowledge graph and having the intended service attribute as second entity character strings;
a second obtaining subunit 14324, configured to obtain a first graph vector of each first entity string;
the first obtaining subunit 14323 is further configured to obtain a second word vector and a second graph vector of each second entity character string;
a second extracting subunit 14325, configured to extract, from the multiple second word vectors, a second word vector corresponding to the target intention character string as a target word vector, and extract, from the multiple second graph vectors, a second graph vector corresponding to the target intention character string as a target graph vector;
a third extracting subunit 14326, configured to determine a target similarity coefficient between the target intent character string and each first entity character string according to the target entity word vector, the target graph vector, the plurality of first graph vectors, and the plurality of second word vectors.
The specific functional implementation manners of the first obtaining subunit 14323, the second obtaining subunit 14324, the second extracting subunit 14325, and the third extracting subunit 14326 may refer to step S206 in the embodiment corresponding to fig. 9, which is not described herein again.
Referring to fig. 13, the knowledge-graph further includes correlation matching coefficients between the plurality of entity strings;
the second acquisition subunit 14324 may include: an initialization subunit 143241 and a sampling subunit 143242.
An initialization subunit 143241, configured to initialize an original graph vector corresponding to each entity character string in the knowledge graph;
a sampling subunit 143242, configured to sample multiple sample entity character strings from the knowledge graph, obtain an original graph vector corresponding to each sample entity character string, as a sample graph vector, respectively update each sample graph vector by using a gradient descent rule according to an association matching coefficient between the multiple sample entity character strings, obtain an adjusted graph vector, and determine the adjusted graph vector as the original graph vector;
the sampling subunit 143242 is further configured to, when the sampling frequency reaches a frequency threshold, use the updated original graph vector corresponding to each first entity string as the first graph vector.
The specific functional implementation manner of the initialization subunit 143241 and the sampling subunit 143242 may refer to step S206 in the embodiment corresponding to fig. 9, which is not described herein again.
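The initialize, sample, and update loop can be sketched as stochastic gradient descent. The embodiment does not state the loss function here, so this sketch assumes a squared error between the dot product of two graph vectors and their association matching coefficient; the vector dimension, learning rate, step count, and sample coefficients are likewise hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def total_loss(vectors, coefficients):
    """Sum of squared differences between dot products and coefficients."""
    return sum((dot(vectors[a], vectors[b]) - c) ** 2
               for (a, b), c in coefficients.items())


def train_graph_vectors(entities, coefficients, dim=4, steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Initialize an original graph vector for every entity string.
    vectors = {e: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for e in entities}
    pairs = list(coefficients)
    for _ in range(steps):                      # stop once the sampling count threshold is reached
        a, b = rng.choice(pairs)                # sample entity strings from the graph
        u, v = vectors[a], vectors[b]
        error = dot(u, v) - coefficients[(a, b)]
        # Gradient descent step on (u . v - coefficient)^2 for both sampled vectors.
        vectors[a] = [x - lr * 2 * error * y for x, y in zip(u, v)]
        vectors[b] = [y - lr * 2 * error * x for x, y in zip(u, v)]
    return vectors


entities = ["headache", "migraine", "influenza"]
coefficients = {
    ("headache", "migraine"): 0.9,
    ("headache", "influenza"): 0.1,
    ("migraine", "influenza"): 0.2,
}
initial_loss = total_loss(train_graph_vectors(entities, coefficients, steps=0), coefficients)
final_loss = total_loss(train_graph_vectors(entities, coefficients), coefficients)
```

Calling the function with `steps=0` returns the freshly initialized vectors, which makes it possible to compare the loss before and after training.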
Referring to fig. 13, the third extraction subunit 14326 may include: a first generation subunit 143261 and a second generation subunit 143262.
A first generating subunit 143261, configured to extract a test first entity string for polling from the plurality of first entity strings, and use a first graph vector corresponding to the test first entity string as a test graph vector;
the first generating subunit 143261 is further configured to determine, according to the association matching coefficient, a first similarity coefficient between a second word vector of a second entity character string adjacent to the test first entity character string in the knowledge graph and the target word vector;
the first generating subunit 143261 is further configured to determine a second similarity coefficient between the test graph vector and the target graph vector according to the association matching coefficient;
a second generating subunit 143262, configured to generate a target similarity coefficient between the target intention string and the test first entity string according to the first similarity coefficient and the second similarity coefficient;
the second generating subunit 143262 is further configured to stop polling when each of the first entity character strings is determined as a test first entity character string.
The specific functional implementation manner of the first generating subunit 143261 and the second generating subunit 143262 may refer to step S206 in the embodiment corresponding to fig. 9, which is not described herein again.
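One possible way to combine the two similarity coefficients is sketched below. The formulas are assumptions, since this passage does not give them: the first coefficient is taken as a coefficient-weighted average of cosine similarities between the target word vector and the word vectors of adjacent second entity strings, the second as the cosine similarity of the two graph vectors, and the final value as a weighted sum controlled by a hypothetical parameter `alpha`. The sketch does not model how the association matching coefficient enters the second coefficient.

```python
import math


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def target_similarity(test_graph_vec, target_graph_vec, neighbours,
                      target_word_vec, alpha=0.5):
    """Combine word-level and graph-level similarity for one test first entity string.

    neighbours: list of (association_coefficient, second_word_vector) pairs for the
    second entity strings adjacent to the test first entity string.
    """
    weight_sum = sum(coef for coef, _ in neighbours)
    first = (sum(coef * cosine(vec, target_word_vec) for coef, vec in neighbours) / weight_sum
             if weight_sum else 0.0)                        # first similarity coefficient
    second = cosine(test_graph_vec, target_graph_vec)       # second similarity coefficient
    return alpha * first + (1 - alpha) * second


matching = target_similarity([1.0, 0.0], [1.0, 0.0], [(0.8, [0.0, 1.0])], [0.0, 1.0])
unrelated = target_similarity([1.0, 0.0], [0.0, 1.0], [(0.8, [1.0, 0.0])], [0.0, 1.0])
```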
Further, please refer to fig. 14, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The terminal device in the embodiments corresponding to fig. 3 to fig. 11 may be an electronic device 1000, and as shown in fig. 14, the electronic device 1000 may include: a user interface 1002, a processor 1004, an encoder 1006, and a memory 1008. Signal receiver 1016 is used to receive or transmit data via cellular interface 1010, WIFI interface 1012. The encoder 1006 encodes the received data into a computer-processed data format. The memory 1008 has stored therein a computer program by which the processor 1004 is arranged to perform the steps of any of the method embodiments described above. The memory 1008 may include volatile memory (e.g., dynamic random access memory DRAM) and may also include non-volatile memory (e.g., one time programmable read only memory OTPROM). In some examples, the memory 1008 can further include memory located remotely from the processor 1004, which can be connected to the electronic device 1000 via a network. The user interface 1002 may include: a keyboard 1018, and a display 1020.
In the electronic device 1000 shown in fig. 14, the processor 1004 may be configured to call the computer program stored in the memory 1008 to implement:
acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string;
searching a target entity character string matched with the target text in the entity character strings, and extracting a target service attribute character string corresponding to the target entity character string;
identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string;
and determining a target intention character string associated with the target intention type from the target entity character string, and generating recommended service data according to the target intention character string.
It should be understood that the electronic device 1000 described in the embodiment of the present invention may perform the text processing method described in the embodiments corresponding to fig. 3 to fig. 12, and may also implement the text processing apparatus 1 described in the embodiment corresponding to fig. 13, which is not described herein again. The beneficial effects, being the same as those of the method, are likewise not described again.
It should further be noted that an embodiment of the present invention also provides a computer storage medium. The computer storage medium stores the computer program executed by the text processing apparatus 1 mentioned above, and the computer program includes program instructions. When the processor executes the program instructions, the text processing method described in the embodiments corresponding to fig. 3 to 12 can be performed, and therefore the details are not repeated here. The beneficial effects, being the same as those of the method, are likewise not described again. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of the claims of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the invention.

Claims (14)

1. A method of text processing, comprising:
acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string;
searching a target entity character string matched with the target text in the entity character strings, and extracting a target service attribute character string corresponding to the target entity character string;
identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string;
determining a target intention character string associated with the target intention type from the target entity character string, and generating recommended service data according to the target intention character string;
wherein the determining a target intention character string associated with the target intention type from the target entity character string and generating recommended service data according to the target intention character string includes:
acquiring a service attribute represented by the target intention type as an intention service attribute;
taking the target entity character string with the intention service attribute as the target intention character string;
and searching business data associated with the target intention character string, and taking the business data associated with the target intention character string as the recommended business data.
2. The method of claim 1, wherein the finding a target entity string in the plurality of entity strings that matches the target text comprises:
extracting target keywords from the target text;
mapping the target key words into map labeling entity character strings;
and searching the entity character strings which are the same as the map labeling entity character strings from the entity character strings to serve as target entity character strings matched with the target text.
3. The method of claim 2, wherein extracting the target keyword from the target text comprises:
dividing the target text into a plurality of target unit characters, and converting each target unit character into a target unit character vector;
based on a coding layer in a first cyclic neural network model, performing bidirectional cyclic coding on a plurality of target unit character vectors to obtain a forward coding matrix and a reverse coding matrix;
splicing the forward coding matrix and the reverse coding matrix into a hidden state matrix;
performing sequence labeling on the hidden state matrix based on the conditional random field in the first recurrent neural network, and determining part-of-speech tags corresponding to each target unit character respectively;
and determining the target keywords according to the part-of-speech labels respectively corresponding to each target unit character.
4. The method of claim 2, wherein mapping the target keyword to a graph labeled entity string comprises:
dividing the target keyword into a plurality of key unit characters, and converting each key unit character into a key unit character vector;
coding a plurality of key unit character vectors based on a coding layer in a second recurrent neural network model to obtain a context vector of the target keyword;
decoding the context vector based on a decoding layer in the second recurrent neural network model to obtain a hidden state vector of the context vector;
and identifying the hidden state vector to obtain a character sequence corresponding to the hidden state vector, and determining the character sequence as the map labeling entity character string.
5. The method of claim 1, wherein the identifying a target intent type matching the target text according to the target entity string and the target business attribute string comprises:
converting the target entity character string into a target entity word vector, and converting the target service attribute character string into a target service attribute word vector;
combining the target entity word vector and the target service attribute word vector into an input vector, and performing convolution and pooling on the input vector based on a convolution layer and a pooling layer in an intention identification model to obtain an intention feature vector of the target text;
based on a classifier in the intention recognition model, matching probabilities between the intention feature vector and a plurality of intention types in the classifier are recognized, and an intention type corresponding to the maximum matching probability is determined as the target intention type from the plurality of matching probabilities.
6. The method of claim 5, wherein converting the target entity string into a target entity word vector comprises:
searching a one-hot code corresponding to the target entity character string from the entity word bag to serve as a first vector; the entity word bag comprises the entity character strings in the knowledge graph and the one-hot codes respectively corresponding to the entity character strings;
and performing dimension reduction on the first vector based on a hidden layer in an entity word vector conversion model to obtain the target entity word vector.
7. The method of claim 1, wherein the searching for the business data associated with the target intent string and using the business data associated with the target intent string as the recommended business data comprises:
extracting a plurality of first entity character strings from a plurality of entity character strings of the knowledge graph when the target intent type belongs to a semantic reasoning type; the first entity string is an entity string of the plurality of entity strings of the knowledge-graph other than the entity string having the intent-service attribute;
determining a target similarity coefficient between the target intention character string and each first entity character string in the knowledge graph, and selecting a target first entity character string matched with the target intention type from the plurality of first entity character strings according to a plurality of target similarity coefficients;
and searching the business data associated with the target first entity character string, and taking the business data associated with the target first entity character string as the recommended business data.
8. The method of claim 7, wherein the knowledge-graph further comprises associative matching coefficients between a plurality of entity strings; the correlation matching coefficient is obtained through data set statistics related to the knowledge graph;
the determining, in the knowledge-graph, a target similarity coefficient between the target intent string and each first entity string includes:
extracting a test first entity string for polling from the plurality of first entity strings;
taking the correlation matching coefficient between the target intention character string and the test first entity character string as a target similarity coefficient between the target intention character string and the test first entity character string;
when each first entity string is determined to be a test first entity string, polling is stopped.
9. The method of claim 7, wherein determining, in the knowledge-graph, a target similarity coefficient between the target intent string and each first entity string comprises:
taking a plurality of entity character strings with the intention service attribute in the knowledge graph as second entity character strings;
acquiring a first graph vector of each first entity character string, and acquiring a second word vector and a second graph vector of each second entity character string;
extracting a second word vector corresponding to the target intention character string from the plurality of second word vectors to serve as a target word vector, and extracting a second graph vector corresponding to the target intention character string from the plurality of second graph vectors to serve as a target graph vector;
and determining a target similarity coefficient between the target intention character string and each first entity character string according to the target entity word vector, the target graph vector, the plurality of first graph vectors and the plurality of second word vectors.
10. The method of claim 9, wherein the knowledge-graph further comprises associative matching coefficients between the plurality of entity strings;
the obtaining a first graph vector of each first entity string includes:
initializing an original graph vector corresponding to each entity character string in the knowledge graph;
sampling a plurality of sample entity character strings from the knowledge graph, acquiring an original graph vector corresponding to each sample entity character string as a sample graph vector, respectively updating each sample graph vector by adopting a gradient descent rule according to the association matching coefficient among the plurality of sample entity character strings to obtain an adjusted graph vector, and determining the adjusted graph vector as the original graph vector;
and when the number of sampling times reaches a count threshold, taking the updated original graph vector corresponding to each first entity character string as the first graph vector.
11. The method of claim 10, wherein determining a target similarity coefficient between the target intent string and each first entity string based on the target entity word vector, the target graph vector, a plurality of first graph vectors, and the plurality of second word vectors comprises:
extracting a test first entity character string for polling from the plurality of first entity character strings, and taking a first graph vector corresponding to the test first entity character string as a test graph vector;
determining a first similarity coefficient between a second word vector of a second entity character string adjacent to the test first entity character string in the knowledge graph and the target word vector according to the association matching coefficient;
determining a second similarity coefficient between the test graph vector and the target graph vector according to the correlation matching coefficient;
generating a target similarity coefficient between the target intention character string and the test first entity character string according to the first similarity coefficient and the second similarity coefficient;
when each first entity string is determined to be a test first entity string, polling is stopped.
12. A text processing apparatus, comprising:
the acquisition module is used for acquiring a target text and acquiring a knowledge graph; the knowledge graph comprises a plurality of entity character strings and a service attribute character string corresponding to each entity character string;
the searching module is used for searching a target entity character string matched with the target text in the entity character strings;
the acquisition module is also used for extracting a target service attribute character string corresponding to the target entity character string;
the identification module is used for identifying a target intention type matched with the target text according to the target entity character string and the target service attribute character string;
the generation module is used for determining a target intention character string associated with the target intention type from the target entity character string and generating recommended service data according to the target intention character string;
wherein the generating module comprises:
the acquisition unit is used for acquiring the service attribute represented by the target intention type as an intention service attribute;
a determining unit, configured to use a target entity string having the intention service attribute as the target intention string;
and the generating unit is used for searching the business data associated with the target intention character string and taking the business data associated with the target intention character string as the recommended business data.
13. An electronic device, comprising: a processor and a memory;
the processor is coupled to the memory, wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the method of any one of claims 1-11.
14. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-11.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910277438.7A CN110069631B (en) 2019-04-08 2019-04-08 Text processing method and device and related equipment

Publications (2)

Publication Number Publication Date
CN110069631A (en) 2019-07-30
CN110069631B (en) 2022-11-29

Family

ID=67367336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910277438.7A Active CN110069631B (en) 2019-04-08 2019-04-08 Text processing method and device and related equipment

Country Status (1)

Country Link
CN (1) CN110069631B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688483B (en) * 2019-09-16 2022-10-18 重庆邮电大学 Dictionary-based noun visibility labeling method, medium and system in context conversion
CN110674316B (en) * 2019-09-27 2022-05-31 腾讯科技(深圳)有限公司 Data conversion method and related device
CN111325037B (en) * 2020-03-05 2022-03-29 苏宁云计算有限公司 Text intention recognition method and device, computer equipment and storage medium
CN111640517B (en) * 2020-05-27 2023-05-26 医渡云(北京)技术有限公司 Medical record coding method and device, storage medium and electronic equipment
CN111785368A (en) * 2020-06-30 2020-10-16 平安科技(深圳)有限公司 Triage method, device, equipment and storage medium based on medical knowledge graph
CN111785367A (en) * 2020-06-30 2020-10-16 平安科技(深圳)有限公司 Triage method and device based on neural network model and computer equipment
CN112015917A (en) * 2020-09-07 2020-12-01 平安科技(深圳)有限公司 Data processing method and device based on knowledge graph and computer equipment
CN112201350A (en) * 2020-11-11 2021-01-08 北京嘉和海森健康科技有限公司 Intelligent triage method and device and electronic equipment
CN112270188B (en) * 2020-11-12 2023-12-12 佰聆数据股份有限公司 Questioning type analysis path recommendation method, system and storage medium
CN112988953B (en) * 2021-04-26 2021-09-03 成都索贝数码科技股份有限公司 Adaptive broadcast television news keyword standardization method
CN113342930B (en) * 2021-05-24 2024-03-08 北京明略软件系统有限公司 Text representing method and device based on string vector, electronic equipment and storage medium
CN113255354B (en) * 2021-06-03 2021-12-07 北京达佳互联信息技术有限公司 Search intention recognition method, device, server and storage medium
CN113342964B (en) * 2021-06-03 2022-04-19 云南大学 Recommendation type determination method and system based on mobile service
CN113392203B (en) * 2021-06-23 2023-08-22 泰康保险集团股份有限公司 Intelligent question-answering method, intelligent question-answering device, electronic equipment and computer readable storage medium
CN113611427A (en) * 2021-08-11 2021-11-05 平安医疗健康管理股份有限公司 User portrait generation method, device, equipment and storage medium
CN113377969B (en) * 2021-08-16 2021-11-09 中航信移动科技有限公司 Intention recognition data processing system
CN114579712B (en) * 2022-05-05 2022-07-15 中科雨辰科技有限公司 Text attribute extraction and matching method based on dynamic model
CN117238527B (en) * 2023-11-14 2024-02-13 四川天府智链健康科技有限公司 Medical information management method and system based on big data

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077415A (en) * 2014-07-16 2014-10-01 百度在线网络技术(北京)有限公司 Searching method and device
CN107748757A (en) * 2017-09-21 2018-03-02 北京航空航天大学 Question answering method based on knowledge graph
CN108153901A (en) * 2018-01-16 2018-06-12 北京百度网讯科技有限公司 Information pushing method and device based on knowledge graph
CN108304466A (en) * 2017-12-27 2018-07-20 中国银联股份有限公司 User intention recognition method and user intention recognition system
CN108565019A (en) * 2018-04-13 2018-09-21 合肥工业大学 Multidisciplinary applicable clinical examination combined recommendation method and device
CN109033305A (en) * 2018-07-16 2018-12-18 深圳前海微众银行股份有限公司 Question answering method, equipment and computer readable storage medium
CN109065129A (en) * 2018-07-04 2018-12-21 平安科技(深圳)有限公司 Department recommendation method, device, computer equipment and storage medium
CN109213844A (en) * 2018-08-13 2019-01-15 腾讯科技(深圳)有限公司 Text processing method, device and related equipment
CN109285030A (en) * 2018-08-29 2019-01-29 深圳壹账通智能科技有限公司 Product recommendation method, apparatus, terminal and computer readable storage medium
WO2019024162A1 (en) * 2017-08-04 2019-02-07 平安科技(深圳)有限公司 Intention obtaining method, electronic device, and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
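Leveraging knowledge graphs for web-scale unsupervised semantic parsing; Larry Heck et al.; Proceedings of Interspeech; 20130831; pp. 1-11 *
Research on a knowledge-graph-based method for mining user search intent; 石刚 (Shi Gang); China Master's Theses Full-text Database, Information Science and Technology; 20170215; I138-4519 *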

Also Published As

Publication number Publication date
CN110069631A (en) 2019-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant