CN116992002A - Intelligent care scheme response method and system - Google Patents

Intelligent care scheme response method and system

Info

Publication number
CN116992002A
CN116992002A
Authority
CN
China
Prior art keywords
query
entity
question
query graph
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311021783.7A
Other languages
Chinese (zh)
Inventor
肖文洁
俞静娴
黄婷
钱琨
王治勋
韩士斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Hospital Fudan University
Original Assignee
Zhongshan Hospital Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongshan Hospital Fudan University filed Critical Zhongshan Hospital Fudan University
Priority to CN202311021783.7A priority Critical patent/CN116992002A/en
Publication of CN116992002A publication Critical patent/CN116992002A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent nursing scheme response method and system. The method comprises: S2, obtaining a question and querying the complete set of candidate query graphs for the question in the general care knowledge base constructed in step S1; S3, preprocessing the question; S4, performing nursing-information named entity recognition on the standardized question using a window-based deep neural network model with an optimized loss function; S5, performing entity-link expansion between the recognized entity set and the general care knowledge base; S6, selecting the optimal query graph and converting it into an answer sentence; and S7, matching and reasoning the answer sentence against the patient pathology information knowledge base to obtain the nursing measures under the answer sentence, and outputting the nursing measures. In the candidate query graph generation stage, the method queries candidate query graphs in a staged manner and screens them using the patient's pathological data, so as to obtain a nursing scheme that better fits the patient's condition and actual needs and to improve the response accuracy of the nursing scheme.

Description

Intelligent care scheme response method and system
Technical Field
The invention relates to the technical field of intelligent equipment, in particular to the technical field of rehabilitation intelligent response equipment.
Background
Nursing is of great significance for patient rehabilitation. Nursing is a specialized medical service that provides comprehensive, personalized care by thoroughly evaluating and managing a patient's physical and psychological state, so as to promote recovery. For post-operative patients in particular, post-discharge care is important for rehabilitation. Post-operative care includes monitoring the patient's physical condition, assisting the patient with restorative training, guiding the patient toward a reasonable diet and lifestyle, and the like. Patients need long-term recovery and treatment after discharge, and the quality of discharge nursing directly affects the rehabilitation outcome. Discharge nursing is an important link in post-operative care; reasonable discharge nursing can effectively promote recovery and improve quality of life.
The traditional diversified health-education and science-popularization model can only meet part of patients' needs and lacks pertinence, timeliness and interactivity. On the one hand, patients' acquisition and understanding of health knowledge are limited, and it is difficult for them to find targeted, usable information within a large amount of static material, which greatly hinders self health monitoring and home health management. On the other hand, medical and health information found on the Internet is of mixed quality, contains many traps, and is of doubtful authenticity and authority, so it is difficult for patients to judge its validity and practicability on their own knowledge. At present, post-discharge self-care remains in a "self-management" state, which is unfavorable for patient rehabilitation.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an intelligent nursing scheme response method that can quickly give accurate answers by analyzing multidimensional information such as user questions, dialogue history and medical record information, providing a comprehensive, personalized intelligent nursing scheme that helps patients better manage and control their own conditions and improves quality of life after discharge.
In order to achieve the above object, the present invention provides an intelligent care plan response method, comprising the steps of:
s1, constructing a patient pathology information knowledge base, a general care knowledge base, a synonym knowledge base in the care field and an attribute knowledge base;
the patient pathology information knowledge base at least comprises medical record information, age, medical history and medication information of a patient;
s2, acquiring a question, and inquiring a candidate inquiry graph corpus of the question in the general care knowledge base constructed in the step S1;
the query adopts a staged form, and entity semantic limitation, type limitation, time limitation, sequence limitation, attribute relationship limitation and association relationship construction are sequentially carried out after a main path query graph is generated;
the attribute relationship limitation includes: extracting the attribute set from the sequence-limited candidate query graph set obtained after sequence limitation, querying the patient pathology information knowledge base for personalized information related to the attribute set, querying the general care knowledge base for query graphs related to the personalized information to obtain personalized-information query graphs, screening the sequence-limited candidate query graph set with the personalized-information query graphs, removing the parts of the sequence-limited candidate query graph set that conflict with the personalized-information query graphs, and merging in the personalized-information query graphs to obtain the attribute-relationship-limited query graph set;
the association relationship construction includes: extracting the answer entities in the attribute-relationship-limited query graph set, querying the general care knowledge base for entity-node query graphs having an association relationship with the answer entities, and merging them with the attribute-relationship-limited query graph set to obtain the candidate query graph complete set;
S3, preprocessing the question obtained in step S2, wherein the preprocessing comprises replacing synonyms using the nursing-field synonym knowledge base constructed in step S1 to obtain a standardized question, obtaining a syntactic parse graph G_gram of the question and its semantic components G_i, and encoding the standardized question and the semantic components G_i into feature vectors;
S4, performing nursing-information named entity recognition on the standardized question obtained in step S3 using a window-based deep neural network model, to obtain an entity set;
in the window-based deep neural network model, the care-domain loss function shown in formula (I) is used for calculation:
where n is the number of samples; m is the number of label classes; η is the activation function, and η(x^(i)) is the model's predicted probability distribution for sample x^(i); j, k and l are the node indices of the window layer, the hidden layer and the output layer, respectively; W and V are model parameters, representing the weights from the window (input) layer to the hidden layer and from the hidden layer to the output layer, respectively; h is the number of hidden-layer nodes; c is the window size; d is the word-vector dimension; and C is the regularization term for nursing-information entity recognition;
s5, carrying out entity linking on the entity set obtained by the identification in the step S4 and the candidate query graph corpus constructed in the step S2 to obtain a first query set; extracting answers in the first query set as answer entities, and carrying out entity link on the answer entities and the candidate query graph corpus constructed in the step S2 to obtain a final query set;
S6, calculating the overall association score between the standardized question and each query graph in the final query set obtained in step S5, selecting the highest-scoring query graph as the optimal query graph, and converting the optimal query graph into an answer sentence;
and S7, matching and reasoning the answer sentence obtained in the step S6 with a patient pathology information knowledge base to obtain a nursing measure corresponding to the answer sentence, and outputting an answer.
Preferably, in step S2, a parallel depth-first algorithm is used in the phased form query process to improve the query efficiency.
Preferably, in step S2, the parallel depth-first algorithm includes the following steps:
Dividing the search space into a plurality of subspaces, wherein each subspace can be searched by an independent computing unit; when searching the query graph, simultaneously starting a plurality of computing units to search in parallel; communication and synchronization are carried out among the computing units in the searching process, so that repeated computation is avoided; and merging search results of all the computing units.
Preferably, the staged query in step S2 comprises the following steps:
S21, extracting the entities, predicates, types, times and sequences in the standardized question through dependency parsing, and linking them into the general care knowledge base, all linked results forming the candidate query graph complete set;
S22, screening the candidate query graph complete set for the parts matching the entities and predicates of the standardized question, to obtain a main path;
s23, adding entity semantic restrictions to the main path, querying query graphs related to all entities in the main path in the general care knowledge base constructed in the step S1, and linking the query graphs to the main path to obtain an entity restriction query graph set;
s24, adding type limitation to the entity-limited query graph set, selecting predicates directly connected with answers in the entity-limited query graph set to infer hidden types, filtering inferred hidden type related query graphs according to the type of standardized questions, and merging the filtered hidden type related query graphs with the entity-limited query graph set to obtain the type-limited query graph set;
S25, adding time limitation to the type limitation query graph set, filtering the type limitation query graph set according to the time of the standardized question, and removing query graphs in the time which does not accord with the standardized question to obtain the time limitation query graph set;
S26, adding sequence limitation to the time-limited query graph set, filtering the time-limited query graph set according to the sequence of the standardized question, and removing query graphs that do not conform to the sequence of the standardized question, to obtain the sequence-limited query graph set.
Preferably, in step S4, the regularization term C for nursing-information entity recognition in formula (I) is obtained as follows:
(1) Model training is performed on the NCBI Disease Corpus dataset with disease-name labels, and the difference between the result of the general loss function of formula (II) and the ground-truth result is computed to obtain an initial value C' of C;
(2) Cross-validation is performed on the patient pathology information knowledge base constructed in step S1 using formula (III), which incorporates the initial value C', and C' is corrected according to the validation result to obtain the final value of C;
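Formulas (I)–(III) appear only as figures in the original filing and are not reproduced in this text. As a rough illustration of the kind of loss described above, the following Python sketch combines a standard cross-entropy over n samples and m label classes with a care-domain regularization term C applied to the window-to-hidden and hidden-to-output weights W and V; the exact patented formula may differ.

```python
import numpy as np

def window_ner_loss(probs, labels, W, V, C):
    """Hedged sketch of a care-domain loss for window-based NER.

    probs  : (n, m) predicted probability distribution eta(x^(i)) per window sample
    labels : (n,)   gold label index per window sample
    W, V   : weights of the window->hidden and hidden->output layers
    C      : care-domain regularization coefficient (tuned as in formulas II/III)
    """
    n = probs.shape[0]
    # average cross-entropy over the n window samples
    ce = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    # regularization over both weight matrices, scaled by the care-domain term C
    reg = C * (np.sum(W ** 2) + np.sum(V ** 2))
    return ce + reg
```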
The invention also provides an intelligent nursing scheme response system for discharged patients, which comprises a question receiving module, a question answering module and a nursing scheme output module; the question answering module comprises a computer-readable storage medium storing a computer program for implementing the method of the above technical solution.
Compared with the prior art, the application has the beneficial effects that:
1. The application provides an intelligent response method and system that generate a nursing scheme based on the patient's personalized pathological condition. Candidate query graphs are queried in a staged manner during their generation and are screened using the patient's pathological data, so as to obtain a nursing scheme that better fits the patient's condition and actual needs and to improve the response accuracy of the nursing scheme.
2. In the semantic matching process of the neural network, the application optimizes the loss function for nursing-information named entities by introducing the regularization term C for entity recognition in the nursing field, correcting the difference between the predicted result and the ground-truth result and improving the accuracy of entity recognition in the nursing field.
3. In some preferred schemes of the application, candidate query graphs are generated with a parallel depth-first search, which significantly increases the generation rate of candidate query graphs and improves the response speed of the nursing scheme.
Drawings
Fig. 1 is a flow chart of the intelligent care scheme response method of the application.
Detailed Description
The technical scheme of the present application will be described below with reference to the accompanying drawings. The described examples are only some, not all, embodiments of the application, and the structures shown in the drawings are merely schematic and do not represent physical objects. All other embodiments obtained by those skilled in the art based on these embodiments fall within the scope of the present application.
The invention adopts a lightweight and effective neural network model to solve automatic question-answering tasks with complex semantics, improving the semantic similarity calculation between questions and complex query graphs. The model encodes the question and predicate sequences and represents them as semantic vectors in the same semantic space. Unlike previous approaches, the model combines the vectors encoded from each semantic component into a semantic vector representation of the query graph as a whole. An ensemble method is used to improve existing entity linking tools, enriching the candidate entities obtained from the question and further improving the overall performance of the task. Meanwhile, to compensate for the information mismatch between the question and its semantic components, the invention uses dependency parsing to find local signals in the question related to a specific predicate sequence, as a supplement to the literal information of the question, so that the model can better align the question with different semantic components. The invention therefore provides an effective technical scheme for automatic question answering in the nursing field.
The invention relates to a method for constructing candidate query graphs of questions based on pattern graph ideas, and improves type semantic limitations and time semantic limitations. The method adopts a multi-stage generation mode, and constructs the query graph by using the pattern graph ideas, so that the semantic expression of the question-answering system and the complexity of the query graph are improved. In addition, the invention also provides a lightweight neural network model for calculating the semantic matching degree between the question and the query graph. The method is characterized in that continuous semantic representation of the whole complex query graph is learned for the first time in knowledge base question-answering research, so that accuracy and efficiency of a question-answering system are improved.
In addition, the invention improves the expression learning of the question, and introduces a dependency grammar path as the supplement of the word sequence information of the question. By associating the dependency grammar path with the question, the association of the question and the specific semantic component can be better embodied, and the accuracy of understanding and expressing the intention of the question by the question-answering system is improved.
In addition, the method also expands the results of the existing entity link tool through an integration method, improves the recall rate of the candidate query graph, and simultaneously keeps the entity link accuracy from being greatly influenced. By integrating the results of the existing entity linking tool, the entity information in the knowledge base can be more comprehensively captured, candidate query graphs with rich semantics are generated, and the recall capability of the question-answering system and the quality of the query graphs are improved.
In summary, the main innovation point of the invention includes constructing candidate query graphs of questions by using a multi-stage generation mode, improving type semantic limitations and time semantic limitations, providing a neural network model for learning continuous semantic representations of the whole complex query graph, introducing dependency grammar paths to improve the question representations, and expanding entity link results by an integration method. The innovation points can improve the semantic expression capability of the question-answering system, the complexity of the query graph and the recall rate, improve the accuracy and the efficiency of the system, and bring new breakthrough and progress to the research and the application of the knowledge base question-answering field.
For a question posed by the user, the system first uses dependency parsing (Dependency Parsing) to extract the local signals associated with particular predicate sequences in the question as a supplement to its literal information, and encodes the question into a semantic vector representation. Existing entity linking tools, improved by an ensemble method, are then used to enrich the candidate entities obtained from the question so that answers can be better matched to it. A neural network model then encodes the predicate sequences in the question and the complex query graph, representing them as semantic vectors in the same semantic space; the model combines the vectors encoded from the semantic components into a semantic vector representation of the query graph as a whole. Finally, the best answer is selected from the candidates by computing the semantic similarity between the question vector and the candidates and matching the entities and relations in the query graph. Experiments show that the system is strongly competitive and accurate on several automatic question-answering datasets.
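A minimal sketch of the answer-selection pipeline just described is shown below. All names (the knowledge-base interface, the encoders and the linkers) are hypothetical placeholders, not the actual implementation of the invention.

```python
def answer_question(question, kb, linkers, encode_question, encode_graph, similarity):
    """Hedged outline of the described pipeline; kb, linkers and the encoders
    are assumed interfaces supplied by the caller."""
    q_vec = encode_question(question)           # dependency-parse-aware question encoding
    entities = set()
    for link in linkers:                         # ensemble of entity-linking tools
        entities.update(link(question))
    candidates = kb.candidate_query_graphs(entities)   # staged candidate generation
    best = max(candidates, key=lambda g: similarity(q_vec, encode_graph(g)))
    return kb.execute(best)                      # convert the best query graph into the answer
```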
Specifically, the invention provides an intelligent care scheme response method, which comprises the following steps:
S1, constructing a patient pathology information knowledge base, a general care knowledge base, a synonym knowledge base in the care field and an attribute knowledge base; the patient pathology information knowledge base at least comprises medical record information, age, medical history and medication information of the patient.
S2, acquiring a question, and inquiring a candidate inquiry graph corpus of the question in the general care knowledge base constructed in the step S1;
the query adopts a staged form, and entity semantic limitation, type limitation, time limitation, sequence limitation, attribute relationship limitation and association relationship construction are sequentially carried out after a main path query graph is generated; preferably comprising the steps of:
1) Generating a corpus of candidate query graphs based on a phased approach
In the invention, all candidate query graphs are generated in a staged manner (producing the candidate query graph complete set): the query-graph generation process is divided into several stages, each stage generates query graphs for specific semantic information, and the generated results are screened and adjusted in the next stage until a candidate query graph set satisfying all semantic restrictions is finally obtained. This makes the generated candidate query graphs more accurate and efficient and better adapted to different semantic restrictions.
In the staged generation of candidate query graphs, the invention optimizes the candidate-generation strategy, mainly by exploiting the implicit restriction on the answer type in the query graph and a dedicated design for the time-period facts maintained in the knowledge base. Four different semantic restrictions are considered: entity, type, time and order restrictions. For example, in a question with complex semantics, an entity restriction describes the association between the answer and a known entity, while an order restriction describes the answer's rank under some ordering.
(1) Related node linking
This step finds the words or phrases in the question that represent related entities, types, times and sequences, and links them to the knowledge base.
In the present invention, the term "entity" refers to a word or phrase related to a specific entity in a knowledge base. In identifying entities, we can judge based on the semantics of the words, the context, and domain knowledge of the problem. For example, person names, place names, organization names, etc. in question sentences generally represent entities.
In the present invention, the "type" refers to a vocabulary or phrase related to a category or classification of an entity. In identifying the type, we can rely on the classification hierarchy or type hierarchy in the knowledge base to make the determination. For example, profession, species, brands, etc. in a question typically represent the type of entity.
In the present invention, the term "time" refers to a term or phrase related to time, and may refer to a specific date, time period or time sequence. When the time is identified, the judgment can be carried out according to the semantic, the context and the time expression mode of the words. For example, the year, date, time word, etc. in a question typically represents time.
In the present invention, the term "order" refers to a word or phrase that is related to a sequence number, order, or sequential relationship between entities or events. In identifying order, we can rely on the semantics of the words, the context, and the vocabulary that indicates the order to make the determination. For example, "first", "last", "before", "after", etc. in a question typically represent a sequential relationship.
In the present invention, a "candidate query graph" is a data structure generated during question understanding and query generation to represent query semantics and semantic constraints associated with a question. The method is constructed according to semantic requirements in questions and a query graph generation strategy, aims at capturing semantic information of questions and provides support for subsequent answer generation and entity link.
The relevant nodes serve as the leaf nodes of candidate query graphs and are the starting points of the different categories of semantic restriction. All possible <phrase, leaf node> pairs are listed, and the same phrase may correspond to multiple candidate leaf nodes. Leaf nodes of the different semantic restriction categories (entity, type, time, order) each have their own linking means. The existing tool S-MART (Scalable Matching and Reasoning Tool) is used for entity linking: it matches and links entities in the question to entities in the knowledge base using rule-based and similarity-based approaches, supports multiple kinds of entity and attribute matching, and can handle questions in various forms, including natural-language text and structured queries. All possible <phrase, entity> pairs are scored with S-MART, and at most the top ten results are retained. For type linking, since the number of distinct types in the knowledge base is limited, all phrases of length at most 5 in the question are enumerated, the cosine similarity between each phrase and each type is computed from pre-trained word vectors, and at most the top twenty results are retained. For time linking, all time-related words appearing in the sentence are identified by regular expressions. For order linking, a predefined list of superlative adjectives (e.g. "the earliest" and similar superlatives describing objective facts) is used, and superlative words or ordinal phrases such as "second" are matched in the question. The corresponding leaf node represents an order value: if an ordinal word is matched, the value is the number of that ordinal; otherwise it is 1. Thus <"earliest", 1> is the only order link generated.
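The type-linking step described above (enumerating phrases of up to five words and scoring them against knowledge-base types by cosine similarity of averaged word vectors, keeping the top twenty pairs) can be sketched as follows; word_vectors and type_vectors are assumed to be dictionaries of pre-trained embeddings, and the function names are illustrative only.

```python
import numpy as np

def link_types(question_tokens, type_vectors, word_vectors, max_len=5, top_k=20):
    """Hedged sketch of type linking: enumerate phrases up to max_len tokens,
    score each phrase against every knowledge-base type by cosine similarity
    of averaged pre-trained word vectors, and keep the top_k pairs."""
    def embed(tokens):
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        return np.mean(vecs, axis=0) if vecs else None

    scored = []
    for i in range(len(question_tokens)):
        for j in range(i + 1, min(i + max_len, len(question_tokens)) + 1):
            phrase_vec = embed(question_tokens[i:j])
            if phrase_vec is None:
                continue
            for type_name, type_vec in type_vectors.items():
                cos = float(np.dot(phrase_vec, type_vec) /
                            (np.linalg.norm(phrase_vec) * np.linalg.norm(type_vec) + 1e-12))
                scored.append((cos, " ".join(question_tokens[i:j]), type_name))
    return sorted(scored, reverse=True)[:top_k]
```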
(2) Generating a primary path
The main path is generated by connecting the answer node, as the starting point, to some entity node (entity nodes come from the knowledge base and represent specific things, objects or concepts in the real world, such as persons, places and events). The main path is the basis of the query graph and represents the most dominant semantics of the question. Generating the main path means enumerating all linked entities and the legal predicate sequences linking them in the knowledge base (a "legal predicate sequence" is a valid predicate connection path that exists in the knowledge base, leading from the answer node through a series of predicates to some entity node). A predicate sequence has length 1 or 2; its essence is to describe the association between two entities in a multivariate relationship.
In the query-graph generation method, the main path is defined as the basis of the query graph and represents the most dominant semantics of the question. Since almost every factual question is related to at least one entity, the main path is defined as the path from the answer, through a predicate sequence, to some entity node, which is equivalent to the query graph of a simple question. A series of candidate main paths is generated by enumerating all linked entities and their legal predicate sequences in the knowledge base; each predicate sequence has length 1 or 2 and in essence describes the association between two entities in a multivariate relationship. Take the question "How many days after discharge can I bathe, at the earliest?" as an example. When generating the main path for this question, the answer node and the intermediate node must first be determined. In this example, answer node A represents the bathing time and intermediate node v1 represents the discharge time; both are variable nodes. The following is the analysis process for generating the main path:
(1) Determining the answer node: from the keyword "bathe" in the question, we determine that answer node A represents the bathing time.
(2) Determining the intermediate node: from the keyword "after discharge" in the question, we determine that intermediate node v1 represents the discharge time.
(3) Enumerating legal predicate sequences: we find the legal predicate sequences in the knowledge base that connect answer node A and intermediate node v1. According to the semantics of the question, the following legal predicate sequences can be considered:
Time predicates between A and v1, such as "after" or "post". These can represent a time relationship following discharge.
Time predicates between A and a certain time node, such as "at" or "time". These can indicate a specific bathing time.
(4) Generating the main path: by combining legal predicate sequences, a main path can be generated. For example, one possible main path is: answer node A is connected to a time node through an "in" predicate, which in turn is connected to intermediate node v1 through an "after" predicate.
Through the above analysis, the main path representing the time relationship between the bathing time and the discharge time is obtained. This main path is the basis of the query graph and represents the most dominant semantics of the question. In practice, multiple candidate main paths can be obtained by generating different legal predicate sequences, so as to express the semantics and possible answers of the question more comprehensively.
In a question-and-answer system, the variable nodes are generally used for representing information which needs to be answered in a question, such as values of answers or other attributes which need to be queried.
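For illustration, a candidate query graph with its main path and constraint edges could be represented by a small data structure such as the following sketch; the field names and the triples for the bathing example are illustrative assumptions, not the patented representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (node, predicate, node)

@dataclass
class CandidateQueryGraph:
    """Illustrative sketch only: a main path plus constraint edges."""
    main_path: List[Triple] = field(default_factory=list)
    constraints: List[Triple] = field(default_factory=list)

# main path for "How many days after discharge can I bathe, at the earliest?":
# answer node ?A --"in"--> time node ?t --"after"--> intermediate node ?v1 (discharge time)
g = CandidateQueryGraph(main_path=[("?A", "in", "?t"), ("?t", "after", "?v1")])
g.constraints.append(("?A", "order", "1"))   # the <"earliest", 1> order link
```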
(3) Adding additional entity semantic restrictions
This step aims at generating a query graph with a complex structure by extending the semantic restrictions on the entity on the main path. First, a simple pattern diagram composed of only a skeleton is found by a Subject-Object Pair (Subject-Object Pair) which is a concept in linguistics for describing a relationship between a Subject and an Object in one sentence.
The following is the step of completing the search through the subject-object pair:
i. first, subjects and objects in the question are identified. A subject is typically the entity that performs the action or the subject that issued the action, while an object is the recipient of the action or the influencing object.
Based on the recognition result of the subject and object, the entity nodes related to the subject and object are found in the query graph and used as candidate starting points and end points.
For each subject-guest pair, a simple pattern graph is generated, which consists of a single predicate of the subject node to the object node. This simple pattern diagram contains only skeleton structures without other semantic restrictions.
Gradually adding the limit matched with the relation triplet on the basis of the simple mode diagram. The corresponding restrictions are connected to the existing candidate main paths by identifying the restrictions conditions in the problem, such as time, type, order, etc.
Using a recursive approach, new constraints are continually connected to existing candidate primary paths. In this way, a pattern diagram with a complex structure can be generated, which contains semantic restrictions related to the subject-object pair.
Through the steps, the semantic representation of the query graph can be expanded by utilizing the subject-object pairs, a more complex pattern graph is generated, and the structure and semantic information of the query graph are enriched so as to more accurately express the semantics of the question.
Generating the restricted pattern graph means starting from the simple pattern graph and gradually adding restrictions matched with relation triples. A relation triple is a basic data structure in the knowledge base used to represent relationships between entities; it consists of three elements, a head entity (Head Entity), a relation (Relation) and a tail entity (Tail Entity), and describes the semantic association between the head entity and the tail entity. Relation triples can be collected and constructed from existing knowledge bases, corpora and related data sources. New restrictions are connected recursively to the existing candidates (i.e. the pattern graphs already produced during generation, which contain some nodes and edges and carry a certain semantic representation), thereby generating pattern graphs with complex structure. The specific operation is as follows:
i. starting with a simple pattern diagram, the simple pattern diagram may contain only a small number of nodes and edges.
Traversing the set of relationship triples in a recursive or cyclic manner, each relationship triplet being examined one by one.
For each relationship triplet, the constraints therein, such as specific entity, type, time or order requirements, are checked.
if there are already nodes and edges in the current pattern graph that meet the constraint, the constraint is added to the pattern graph, expanding the semantic representation and complexity of the pattern graph.
Gradually adding the limits, and combining new limits with existing limits in a continuous iterative mode to generate a more complex mode diagram.
To avoid generating a large number of meaningless paths, the predicate path length is limited to at most 3. In the candidate-generation process, a depth-first search is adopted, generating query graphs from simple to complex. For each query graph in the search space, a single predicate is tried to connect different variable nodes and entity nodes, constructing query graphs of different complexity. Compared with template-based candidate generation, depth-first search has higher coverage, and query graphs that cannot yield answers can be eliminated by pruning strategies, improving the candidate-generation speed.
(4) Adding type restrictions
The purpose of this step is to incorporate semantic information about the answer node through type restrictions. The IsA predicate is used to connect a specific relevant type node to the answer node. IsA predicates are a fundamental semantic relationship in the knowledge base, representing the relationship between an entity and the type it belongs to. For example, an IsA predicate may relate an entity to its category (e.g. "physiological", "psychological", "functional"). Through IsA relationships, the entities in the knowledge base can be organized into a hierarchy, making the relationships between entities clearer and more orderly. Unlike previous methods, the predicates directly connected to the answer node are used to infer its implicit types, and filtering is performed according to type inclusion relations, which prevents semantic deviation and speeds up candidate-graph generation. Specifically, associations between types are defined using the Freebase type hierarchy built by relaxed type inclusion: a specific relevant type node is connected to the answer node by an IsA predicate, from which the implicit types of the answer node are inferred; a type that neither contains nor is contained by any implicit type is considered irrelevant and is not used for candidate generation. This ensures that the generated candidate graphs have a certain semantic relevance to the answer node, improving both the quality and the speed of candidate generation. For example, suppose the answer node is a specific entity such as "entecavir tablet", which may not have a direct type label. Through IsA predicates, type nodes related to "entecavir tablet" can be found, such as "oral", "antiviral" and "inhibitory drug". These type nodes can be regarded as implicit types of the answer node: they are associated with it, but are not necessarily marked directly on it.
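A minimal sketch of the implicit-type inference and relaxed-inclusion filtering described above is given below; predicate_range and type_hierarchy.includes are assumed interfaces standing in for the knowledge base and the Freebase type hierarchy.

```python
def infer_implicit_types(answer_predicates, predicate_range):
    """Infer implicit types of the answer node from the predicates directly
    connected to it; predicate_range (predicate -> expected type) is an
    assumed knowledge-base interface."""
    return {predicate_range[p] for p in answer_predicates if p in predicate_range}

def filter_relevant_types(candidate_types, implicit, type_hierarchy):
    """Keep a candidate type only if it contains, or is contained by, some
    implicit type under relaxed type inclusion (type_hierarchy.includes(a, b)
    is an assumed helper meaning 'a includes b')."""
    return [t for t in candidate_types
            if any(type_hierarchy.includes(t, it) or type_hierarchy.includes(it, t)
                   for it in implicit)]
```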
(5) Adding generation time and order restrictions
After the type restrictions of step (4) have been added, the types of all variable nodes on the main path, including explicit type restrictions and implicit types, can be determined. At this point, the particular predicates subordinate to these types are enumerated, completing the addition of the time and order constraints.
A "particular predicate" here refers to a particular predicate sequence used to represent time and order constraints. These predicate sequences have length 2 and are used to add the time and order constraints.
(1) Time-constraint predicate: a time constraint can be represented by a predicate sequence of length 2. The first predicate points to a time-related entity node, such as a specific date or time period. The second predicate is a virtual predicate that specifies the direction of comparison with a particular time; this virtual predicate is determined by the preposition preceding the time expression in the question, such as "before".
(2) Order constraint predicate: the order constraint may also be represented by a predicate sequence of length 2.
The first predicate points to an integer, floating point, or time-dependent entity node, representing the basis of the ordering.
The second predicate represents a descending order of ordering, indicating the order of ordering.
The source of a particular predicate is predicate information in the knowledge base. In the query graph generation process, predicates belonging to a specific type are enumerated according to the requirement to meet the requirements of time and sequence limitation. For time constraints, time-dependent predicates, such as "start time" and "end time," may be used. For order limitation, predicates related to ordering, such as "rank" and "rank number" may be used.
(3) By using specific predicates, in conjunction with interfacing with patient data, more accurate time constraints and order constraints can be described in the query statement. This ensures that the relevant time in the question is limited to a time period, not just to the starting or ending point in time. The definition and the source of the specific predicates are defined in advance in the knowledge base, and can be set and expanded according to specific fields and application requirements.
For example, a time constraint can be represented by a predicate sequence of length 2, where the first predicate points to a time and the second is a virtual predicate indicating the direction of comparison with that time, determined by the preposition preceding the time expression in the question. Similarly, an order constraint can be represented by a predicate sequence of length 2, where the first predicate points to an integer, floating-point number or time and the second represents descending order. For time constraints, paired time predicates are used to describe more accurate restrictions over time periods: through simple name matching, predicate pairs are formed in the knowledge base, ensuring that the relevant time in the question can be limited to a time period rather than only to a start or end time point. For example, "after discharge" appeared in the earlier question, where only the start-time predicate was used for connection; when the query statement is generated, however, both the start and end predicates are used (by interfacing with the patient data, the period from the patient's discharge time onward is queried), so that the relevant time in the question is limited to a time period rather than a single point. Compared with existing systems, the candidate-graph generation of this step uses fewer manual rules, improves the type and time restrictions, accelerates generation, and describes more accurate semantic restrictions.
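The paired start/end time predicates can be illustrated with a small sketch that joins the question with the patient pathology knowledge base to turn "after discharge" into a concrete time interval; the field name discharge_date and the helper functions are assumptions for illustration.

```python
from datetime import date, timedelta

def discharge_period(patient_record, days):
    """Hedged sketch: turn "N days after discharge" into a concrete time interval
    by joining the question with the patient pathology knowledge base
    (patient_record["discharge_date"] is an assumed field)."""
    start = patient_record["discharge_date"]
    end = start + timedelta(days=days)
    return start, end

def within_period(fact_time, period):
    """Keep only facts whose time falls inside the period (start and end predicates)."""
    start, end = period
    return start <= fact_time <= end

record = {"discharge_date": date(2023, 8, 1)}
period = discharge_period(record, days=7)
print(within_period(date(2023, 8, 3), period))  # True
```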
In the staged generation of candidate query graphs, the invention optimizes the candidate-generation strategy: besides exploiting the implicit restriction on the answer type in the query graph, it introduces other semantic relations in the knowledge base, such as attribute relations and association relations, to further optimize the generation of candidate query graphs.
In the care knowledge base, there are other semantic relationships, such as attribute relationships and association relationships, in addition to the IsA relationships. These relationships may further guide the generation process of candidate query graphs, enhancing the semantic expressive power of the query graphs. The following is a detailed description of the processing procedure:
(4) utilization of attribute relationships: the attribute relationship describes a particular attribute or feature of an entity. In the generation of candidate query graphs, the candidate nodes may be considered restricted or filtered using attribute relationships. For example, for a certain entity node, an attribute node related to the entity can be found through an attribute relationship, and used as a leaf node in a candidate query graph. Therefore, the semantics of the query graph can be further constrained, and the quality of the candidate graph is improved.
Assume that the task of generating a candidate query graph is to answer the question: "What is the body temperature of patient A?" Patient-related attribute relationships, such as the "body temperature" attribute, are found in the knowledge base. Using the attribute relationship, the attribute node related to patient A's body temperature is selected as a leaf node in the candidate query graph. This attribute node, which may be denoted "body temperature", is found through the attribute relationship and connected to the entity node of patient A to form a candidate query graph. In this query graph the attribute node acts as a leaf node, representing the body-temperature information of interest in the question.
By limiting the candidate nodes by utilizing the attribute relationship, the generated query graph is ensured to be consistent with the semantic requirement of the problem as much as possible, so that the quality of the candidate graph is improved. In the field of care, attribute relationships may relate to specific attributes in terms of physiological parameters, medical history, diagnostic results, etc. of a patient, and by utilizing these attribute relationships, more accurate and targeted candidate query graphs may be generated, providing answer candidates related to the specific attributes of the patient.
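A minimal sketch of attribute-relationship filtering for the body-temperature example might look as follows, assuming attribute relations are stored as (head entity, attribute, value) triples; the data shown are illustrative.

```python
def attribute_leaf_nodes(entity, attribute, triples):
    """Hedged sketch: use attribute relationships (triples of the form
    (head, attribute, value)) to select leaf nodes for the candidate query graph."""
    return [(head, rel, value) for head, rel, value in triples
            if head == entity and rel == attribute]

triples = [("patient_A", "body_temperature", "37.2C"),
           ("patient_A", "heart_rate", "78bpm")]
print(attribute_leaf_nodes("patient_A", "body_temperature", triples))
# [('patient_A', 'body_temperature', '37.2C')]
```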
(5) Utilization of association relation: the association relationship describes an association or connection between entities. In the generation process of the candidate query graph, the introduction of more relevant entity nodes by using the association relationship can be considered. By traversing the association, other entity nodes related to the answer node can be found and added to the candidate query graph. Thus, semantic content of the query graph can be enriched, and more comprehensive answer candidates can be provided.
In the care field, associations may be used to describe associations or connections between entities, such as associations between patients and diseases. The following is an example to illustrate how the association relationship is used to introduce more relevant entity nodes:
Assume that the task of generating a candidate query graph is to answer the question: "Which diseases does patient A suffer from?" In the knowledge base, the entity node of patient A is located, and the disease nodes related to patient A are found through the association relationships. By traversing these associations, the disease nodes associated with patient A can be discovered and added to the candidate query graph, which then includes the relevant diseases patient A has, thereby providing more comprehensive answer candidates.
By introducing more relevant entity nodes by utilizing the association relationship, the semantic content of the query graph is enriched, so that the query graph covers more information related to the problem. In the nursing field, the association relationship can relate to the association of the medical history, the medication condition, the operation record and the like of the patient, and through traversing and utilizing the association relationship, a candidate query graph with more relevance and comprehensiveness can be generated to provide answer candidates related to the patient.
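Similarly, the association-relationship expansion for the disease example can be sketched as below, assuming associations are stored as (head entity, relation, tail entity) triples; the names and data are illustrative only.

```python
def expand_by_association(answer_entity, association_triples, query_graph_edges):
    """Hedged sketch: traverse association relations (e.g. patient -suffers_from-> disease)
    and add the connected entity nodes to the candidate query graph."""
    for head, relation, tail in association_triples:
        if head == answer_entity:
            query_graph_edges.append((head, relation, tail))
    return query_graph_edges

edges = []
associations = [("patient_A", "suffers_from", "hypertension"),
                ("patient_A", "suffers_from", "hepatitis_B"),
                ("patient_B", "suffers_from", "diabetes")]
print(expand_by_association("patient_A", associations, edges))
```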
In the candidate generation process, a depth-first search mode is adopted, and a query graph is generated from simple to complex. Meanwhile, the invention adopts a parallel depth-first search algorithm. Parallel depth-first search is used to explore different query paths simultaneously. Thus, a plurality of computing units can be utilized to search candidate query graphs in parallel, and the searching speed is increased.
The following is an explanation of the processing procedure of the parallelized search:
(1) Dividing a search space: the entire search space is divided into a plurality of subspaces, each subspace containing a portion of the possible query paths. The manner of partitioning may be based on different strategies, such as uniform partitioning, adaptive partitioning, etc. Each subspace may be searched by an independent computing unit.
(2) Parallel search: the divided subspaces are distributed to different computing units, and a plurality of computing units are started to search in parallel. Each computing unit independently executes a search algorithm, explores its assigned subspace, and generates a candidate query graph.
(3) Communication and synchronization: during parallel searching, different computing units may explore query paths with overlapping portions. To avoid duplicate computation and redundant results, communication and synchronization operations are required. Information can be exchanged between the computing units through message transmission or shared memory and the like, and accessed paths or results are shared so as to avoid repeated work.
(4) Result merging: after all computing units complete their search tasks, the candidate query graphs they generate need to be merged. The merging can be designed for the specific situation, e.g. union, de-duplication, or merge-and-sort. Finally, the merged candidate query graphs can be used for further semantic matching and answer generation.
The pseudocode is described as follows:
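(The pseudocode figure of the original filing is not reproduced in this text. The following Python sketch illustrates the idea under the assumptions stated in the comments: the first-level neighbours of the start node are explored as independent parallel subtasks, each subtask works on its own copy of the path, and the candidate paths from all workers are merged and de-duplicated.)

```python
from concurrent.futures import ProcessPoolExecutor

def dfs_subtask(args):
    """One independent subtask: expand a single branch of the search space.
    Each recursion works on its own copy of the path, so no backtracking or
    shared state is needed (a simplified, illustrative sketch)."""
    node, path, graph, max_depth = args
    if len(path) >= max_depth or not graph.get(node):
        return [path]                                   # leaf: record the current path
    results = []
    for predicate, neighbor in graph[node]:
        results.extend(dfs_subtask((neighbor, path + [(node, predicate, neighbor)],
                                    graph, max_depth)))
    return results

def parallel_dfs(start, graph, max_depth=3, workers=4):
    """Hedged sketch of the parallelized depth-first search: the first-level
    neighbours of the start node are searched in parallel, and the candidate
    paths returned by all workers are merged and de-duplicated.
    (Run under `if __name__ == "__main__":` when using process pools.)"""
    tasks = [(nbr, [(start, pred, nbr)], graph, max_depth)
             for pred, nbr in graph.get(start, [])]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(dfs_subtask, tasks)
    merged = {tuple(p) for paths in partial for p in paths}   # merge + de-duplicate
    return [list(p) for p in merged]
```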
in the above-described pseudo code of parallelized depth-first search, parallel loops are used to explore each neighbor node simultaneously, thereby enabling parallel computation. Each parallel subtask is independent, and conflicts between different search paths are avoided by creating new path copies. When the search reaches the leaf node, the current path is added to the candidate query graph list, and no backtracking operation is required, because each search path is independent and no shared state exists. Through parallelization calculation, a plurality of search paths can be explored simultaneously, so that the generation speed of candidate query graphs is greatly increased.
Through the parallelization depth-first search algorithm, a plurality of computing units can be utilized to search different query paths at the same time, computing resources are fully utilized, and the generation speed of candidate query graphs is increased. Parallelized searches may explore more paths in a shorter time, providing more candidate results, thereby improving the efficiency and performance of the system.
S3, preprocessing the question obtained in the step S2, wherein the preprocessing comprises the steps of replacing synonyms by utilizing the synonym knowledge base in the nursing field constructed in the step S1 to obtain a standardized question and obtaining a grammar analysis graph G of the question gram And semantic component G i And standardized question and semantic component G of question i Encoding into a feature vector; wherein the semantic components (including constraint semantic components and intent semantic components) are paths in which each leaf node is located;
The preprocessed question and its semantic components are each encoded as feature vectors in order to convert natural language text into a numerical form that a computer can process and feed to subsequent algorithms, in particular the semantic similarity calculation of the later steps. (Semantic similarity refers to the degree to which two sentences, words or phrases are similar in meaning.)
i. Semantic component encoding method
For the predicate name (word) sequence of a semantic component, a word vector matrix E_w ∈ R^(|V_w|×d) first maps the original sequence into word vectors, where |V_w| is the number of natural language words and d is the word vector dimension. The semantic vector of the whole name sequence is then computed by word averaging, i.e. p^(w) is the average of the word vectors in the sequence. For the predicate ID sequence, the whole sequence is treated as a single unit and converted directly into a semantic vector representation through a sequence-level vector matrix E_p ∈ R^(|V_p|×d), where |V_p| is the number of distinct ID sequences in the training data. The ID sequence is treated as a whole, rather than averaging ID vectors or using a recurrent layer over the IDs, for the following reasons: 1) by the candidate graph generation scheme, the predicate ID sequence of each semantic component has length at most 3; 2) in general, reordering a predicate sequence produces an illegal sequence that cannot appear in any other query graph; 3) the number of distinct predicate sequences is approximately equal to the number of distinct predicates in the knowledge base, so no multiplicative blow-up occurs. The vectors of the name sequence and the ID sequence are added position-wise, giving the vector representation of a single predicate sequence: p = p^(w) + p^(id).
Meanwhile, in the semantic component encoding, when the semantic vector of the whole name sequence is computed by word averaging, the words are weighted using TF-IDF. The weighting process is as follows:
i. Word frequency statistics: based on a corpus built from the constructed care knowledge base, the term frequency (TF) of each word in each document of the corpus is calculated, i.e. the number of times the word appears in the document. Counting the word frequency of each word indicates how important the word is within the document.
ii. Calculating the inverse document frequency: the inverse document frequency (IDF) of each term is calculated. The inverse document frequency expresses how rare the term is in the overall corpus, i.e. it measures the distinctiveness and importance of the term. It is computed as the logarithm of the total number of documents in the corpus divided by the number of documents containing the term. The inverse document frequency helps identify terms that are rare but important throughout the corpus.
iii. TF-IDF calculation: the word frequency and the inverse document frequency are combined to compute the TF-IDF value of each word; the TF-IDF value equals the term frequency multiplied by the inverse document frequency. This value measures the importance and distinctiveness of a word within a document; words with high TF-IDF values carry comparatively more specific meaning and importance.
iv. Weighting: each word is assigned a weight according to its TF-IDF value. The weights can be used as part of the feature vector to express the importance and semantic information of the terms in the name sequence, or as coefficients multiplied into the word vectors to obtain a weighted word-vector representation.
By using the TF-IDF for weighting, words with higher importance and degree of distinction can be highlighted, thereby better capturing semantic information of name sequences in the care knowledge base. The weighting processing method can improve the expression capability of the feature vector, so that the computer can more accurately understand and process the nursing knowledge in the subsequent tasks of semantic similarity calculation and the like.
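As an illustration, a minimal Python sketch of TF-IDF-weighted word averaging for a name sequence follows. The toy corpus, vocabulary and random word vectors are assumptions standing in for the care knowledge base and the word vector matrix E_w.

import math
import numpy as np

corpus = [
    ["liver", "cancer", "post", "operative", "care"],
    ["medication", "after", "liver", "surgery"],
    ["wound", "care", "after", "surgery"],
]
vocab = sorted({w for doc in corpus for w in doc})
rng = np.random.default_rng(0)
word_vec = {w: rng.normal(size=8) for w in vocab}     # stand-in for rows of E_w

def tfidf_weights(doc, corpus):
    n_docs = len(corpus)
    weights = {}
    for w in set(doc):
        tf = doc.count(w) / len(doc)                   # term frequency
        df = sum(1 for d in corpus if w in d)          # document frequency
        idf = math.log(n_docs / df)                    # inverse document frequency
        weights[w] = tf * idf
    return weights

def encode_name_sequence(doc, corpus):
    w = tfidf_weights(doc, corpus)
    # weighted average instead of the plain average p^(w)
    vecs = np.stack([w[t] * word_vec[t] for t in doc])
    return vecs.mean(axis=0)

print(encode_name_sequence(corpus[0], corpus).shape)   # (8,)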
ii. Encoding of questions
The encoding of a question must consider both the global and the local level; the aim is to capture the semantic information of the question that is related to a particular semantic component p. For the global semantics of the question, the input is the question word sequence. The word sequence is vectorized with the same word vector matrix E_w. In a bidirectional GRU layer the input sequence is fed into GRU units in both directions: the forward GRU unit computes its outputs step by step in the order of the input sequence, while the backward GRU unit processes the reversed sequence. The bidirectional GRU layer can therefore take the context on both sides of the current position into account and capture more comprehensive sequence information. The last hidden states of the forward and backward passes are concatenated as the semantic vector of the whole word sequence.
To represent the local semantics of the question, the core is to extract the information corresponding to a specific semantic component. The model uses dependency grammar analysis to find the dependency path between the answer and the entity in the semantic component, and encodes this dependency path with a second bidirectional GRU layer with separate parameters, producing a vector representation that contains grammar-level features directly related to the semantic component p. Finally, the question vectors at the two granularities are added position-wise, giving a representation of the whole question conditioned on the specific semantic component.
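A minimal PyTorch sketch of this question encoding is given below, with illustrative dimensions and tokenization: one bidirectional GRU encodes the question word sequence by concatenating the last forward and backward hidden states, a second bi-GRU with its own parameters encodes the dependency path, and the two vectors are added position-wise. The use of a separate embedding table per encoder is an assumption of the sketch.

import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    def __init__(self, vocab=1000, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)                 # word vector matrix
        self.gru = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)

    def forward(self, ids):                                   # (batch, seq_len)
        _, h_n = self.gru(self.embed(ids))                    # h_n: (2, batch, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)            # (batch, 2*hidden)

global_enc = BiGRUEncoder()          # encodes the whole question word sequence
local_enc = BiGRUEncoder()           # separate parameters for the dependency path

question = torch.randint(0, 1000, (1, 12))     # 12 question tokens
dep_path = torch.randint(0, 1000, (1, 4))      # 4 tokens on the dependency path
q = global_enc(question) + local_enc(dep_path) # position-wise addition
print(q.shape)                                 # torch.Size([1, 256])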
iii. Semantic merging
Given a query graph G = {p^(1), …, p^(N)} with N semantic components, each semantic component has been projected to a different vector in the same continuous semantic space, embodying hidden features of different aspects. The application of convolutional neural networks to two-dimensional image processing suggests that the feature representation of an image as a whole depends on whether certain local regions exist whose patterns match corresponding hidden features, while the relative positions of these local regions can be ignored. Specifically, in a query graph each node is represented as a vector carrying the node's semantic information; the max pooling operation selects, for each dimension (feature), the maximum value over the vectors of all nodes to form a new vector, yielding the merged semantic representation of the entire query graph. Correspondingly, the same max pooling is applied to the question representations corresponding to the individual semantic components, merging several semantic vectors into an overall representation of the question. Finally, the semantic similarity between the question and the whole query graph is computed with the cosine similarity: S_rm(q, G) = cos(q, g) = (q · g) / (‖q‖ ‖g‖).
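A short sketch of the max-pooling merge and the cosine similarity follows; the random component vectors and their dimensionality are illustrative assumptions.

import torch
import torch.nn.functional as F

def merge(vectors):                       # list of (dim,) semantic vectors
    return torch.stack(vectors).max(dim=0).values   # element-wise max pooling

# three semantic-component vectors of a query graph and the three
# component-conditioned question vectors (random stand-ins)
g = merge([torch.randn(256) for _ in range(3)])
q = merge([torch.randn(256) for _ in range(3)])

s_rm = F.cosine_similarity(q, g, dim=0)   # S_rm(q, G)
print(float(s_rm))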
Based on the above framework, the semantic similarity model makes the question and individual semantic components as comparable as possible, while capturing complementary semantic features among the different parts of the query graph.
S4, performing care-information named entity recognition on the standardized question obtained in step S3 with a window-based deep neural network model, to obtain an entity set.
In the present invention, the care information includes medical orders, nursing orders, medical records and similar information. Entity samples of the five defined care-information domains are labeled so that named entity recognition within the domain can be performed; the entity recognition result is an entity set E = (e1, e2, e3, …, en), where n is the number of entities and ei is an entity. Named entity recognition uses a window-based deep neural network model. (A loss function measures the difference between the model's predicted result and the true result; the training objective is to minimize the loss function so that the predicted result is as close as possible to the true result.)
In the window-based deep neural network model, the invention uses the care-domain loss function shown in formula (I):
wherein n is the sample size; m is the number of label classes; η is the activation function, giving the model's predicted probability distribution for sample x^(i); j, k and l are the node indices of the window layer, the hidden layer and the output layer, respectively; W and V are model parameters representing the weights from the input layer to the hidden layer and from the hidden layer to the output layer, and penalizing the sum of their squares prevents the model from relying too heavily on particular features, improving generalization; h is the number of hidden-layer nodes; c is the window size; d is the word vector dimension; W is the parameter of the first-layer network; V is the parameter of the hidden layer; C is the regularization term for care-information entity recognition;
The regularization term C for care-information entity recognition acts as a constant that adjusts the convergence speed of the loss function. In the entity recognition step, the constant C is a hyper-parameter of the loss function, commonly referred to as a regularization term. It balances two objectives: minimizing the negative log-likelihood loss and reducing the complexity of the model. To obtain an empirical value, the publicly available NCBI Disease Corpus (a disease entity recognition data set used in the biomedical field, containing annotations of disease names) is first used to train the model and obtain an initial value; a better-performing value of C is then obtained after cross-validation tuning on the care-related test data sets collected in this project.
In addition to minimizing the negative log-likelihood, L2 regularization terms for W and V are added to the loss function. W is the parameter of the first-layer network; the first layer is typically the input layer, whose parameters are determined mainly by the dimensionality of the input data and the input weights, and it may also include bias terms to adjust the model's offset and further improve performance. V is the parameter of the hidden layer; in a neural network, the hidden-layer parameters typically comprise a weight matrix and bias terms, which map the input data to the hidden layer for feature extraction and nonlinear transformation. One reason for regularization is that the parameters of the softmax function are redundant, i.e. the minimizer is not unique, and the regularization term makes the solution unique. From a probabilistic point of view, L2 regularization is equivalent to placing a Gaussian prior on the parameters: it controls the variance of the parameters and penalizes excessively large ones, which helps improve the generalization ability of the model. The penalty factor λ adjusts the weight of the regularization terms; the larger its value, the stronger the penalty on large parameters. In the following, λ is simply written as c. Note that the bias parameters b1 and b2 are not included in the regularization term. The formula provided by the invention improves entity recognition accuracy in the care field.
In the invention, a regularization term C for nursing information entity identification in the formula (I) is obtained according to the following method:
(1) Model training is carried out with the NCBI Disease Corpus data set, which carries disease-name labels, and the difference from the true results is computed with the general loss function of formula (II) to obtain an initial value C' of C;
(2) Cross-validation is performed on the patient pathology information knowledge base constructed in step S1 using formula (III), into which the initial value C' has been inserted, and C' is corrected according to the validation results to obtain the final value of C.
When performing semantic merging, the semantic representations of the question and of the query graph are each vector-normalized, so that the question and the query graph have comparable scales in their semantic representations, which facilitates the subsequent similarity calculation and other semantics-related operations. The semantic representations of the question and the query graph are each normalized with the min-max formula x' = (x − MinValue) / (MaxValue − MinValue), where MinValue and MaxValue are the minimum and maximum values, respectively.
For the semantic representation of the question, it can be expressed as a feature vector and then normalized. Normalization makes the question's semantic representation consistent in the value range of each dimension and prevents features with large values in certain dimensions from dominating the overall similarity calculation.
For the semantic representation of the query graph, the feature vector of each node can be normalized. The semantic representations of different nodes then share a consistent value range in every dimension, which facilitates similarity calculation and other semantics-related operations between nodes.
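A tiny sketch of one reading of this min-max normalization (scaling each vector's dimensions into [0, 1]); the example vector is an assumption.

import numpy as np

def min_max(v):
    lo, hi = v.min(), v.max()              # MinValue, MaxValue
    return (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)

q_vec = np.array([0.2, 1.5, -0.3, 0.9])
print(min_max(q_vec))                      # each dimension scaled into [0, 1]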
S5, performing entity linking between the entity set identified in step S4 and the candidate query graph corpus constructed in step S2 to obtain the final query set;
step S5 is to augment the existing entity linking results based on an integrated manner, the goal of the entity linking is to match specific words or phrases in the text with the entities in the knowledge base in order to obtain more information about these entities. The link probabilities may be calculated based on a variety of factors, including the context of the words in the text, the frequency of occurrence of the entities in the knowledge base, the semantic similarity between the entities and the words, and so forth.
The entity linking results are extended in an integrated manner to address the fact that the generated results tend to favor high precision at the cost of some recall, and to find a better balance between precision and recall in the entity linking step. (Integration here means fusing the results of multiple entity linking tools to obtain a more comprehensive and accurate entity linking result.) Combining the outputs of several tools compensates for the limitations of each individual tool and strikes a balance between precision and recall. The extended entity links further improve the overall performance of the question answering system and are a useful complement to the semantic matching model. First, a large correspondence table of <phrase, entity> pairs is created from the existing care knowledge base. Each <phrase, entity> pair is then associated with a set of statistical features, including the entity link probability (the probability of associating a particular word or phrase with an entity in the knowledge base in a natural language processing or information retrieval task). Finally, a two-layer fully connected linear regression model is trained, using all phrase-entity pairs that appear in the S-MART linking results as training data, to fit the S-MART link score of each pair. After training, a virtual link score is computed for every entry in the phrase-entity correspondence table. For each question, entries that are not in the existing S-MART results and whose scores rank in the top K are selected as extensions of the entity linking results; the threshold K is a model hyper-parameter, i.e. a manually set value that controls which entries are accepted.
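A hedged sketch of this link-score regressor follows: a two-layer fully connected network fit to statistical features of <phrase, entity> pairs so that it reproduces S-MART link scores, after which the top-K unseen pairs extend the link results. The feature count, random data and training schedule are assumptions; no actual S-MART output is used here.

import torch
import torch.nn as nn

features = torch.randn(500, 6)        # stand-in statistics per <phrase, entity> pair
smart_scores = torch.rand(500)        # stand-in S-MART link scores to fit

model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(200):                  # fit the regressor
    opt.zero_grad()
    loss = loss_fn(model(features).squeeze(-1), smart_scores)
    loss.backward()
    opt.step()

K = 10                                # threshold K, a model hyper-parameter
virtual = model(torch.randn(200, 6)).squeeze(-1)   # virtual scores for unseen pairs
extension = virtual.topk(K).indices                # candidate link extensions
print(extension.tolist())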
Taking context information into account when extending entity links helps improve both the precision and the recall of entity linking. Other semantic information in the question, the context of the question, and the association between the question and candidate entities can be used to assist the entity-linking decision and obtain more accurate linking results.
The invention performs context modeling by constructing a long short-term memory (LSTM) network. The model is constructed as follows:
(1) Data preprocessing: first, care-related problem data needs to be preprocessed. This includes the steps of text cleaning, word segmentation, and vocabulary building. The question text is converted into a numerical form that the model can handle.
(2) Constructing a word embedding layer: to represent text as a vector form, word embedding techniques are used to map words into a continuous low-dimensional vector space. A word embedding layer is constructed to represent each word in a vector form.
(3) Constructing an LSTM model: LSTM is a recurrent neural network model suitable for sequence data that is capable of capturing long-term dependencies. When constructing the LSTM model, the method can be carried out according to the following steps:
i. Defining the input layer: the preprocessed text data is taken as the input sequence. Each word is represented as a vector, and the vectors form the input sequence in order.
Definition of LSTM layer: the LSTM layer is composed of a series of LSTM cells, each of which includes an input gate, a forget gate, an output gate and a memory cell. The appropriate number of LSTM layers and number of hidden units may be selected to accommodate the complexity and task requirements of the model.
Defining an output layer: after the LSTM layer, a fully connected layer or output layer may be added for predictive or categorizing tasks. The arrangement of the output layer depends on the particular problem.
Compiling the model: the LSTM model is compiled with a cross-entropy loss function and an SGD optimizer.
(4) Model training: and inputting the preprocessed problem data into an LSTM model to train the model. The model weights are updated by a back propagation algorithm and optimizer to enable the model to learn gradually patterns and correlations in the problem data.
(5) Model evaluation: the model is evaluated by using the reserved test set, and performance indexes of the model on nursing related problems, such as accuracy, recall, F1 score and the like, are calculated.
By constructing the LSTM model, long-term dependencies can be modeled using its memory cells and gating mechanisms, and contextual information in the problem sequence is captured. This helps to improve understanding and questioning ability of the care questions.
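A minimal PyTorch sketch of the context model outlined in steps (1)-(5): an embedding layer, an LSTM layer and a linear output layer, trained with cross-entropy and SGD. The vocabulary size, dimensions, class count and the toy batch are illustrative assumptions.

import torch
import torch.nn as nn

class ContextLSTM(nn.Module):
    def __init__(self, vocab=2000, emb=64, hidden=128, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)             # word embedding layer
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)           # output layer

    def forward(self, ids):                               # (batch, seq_len)
        x = self.embed(ids)
        _, (h_n, _) = self.lstm(x)                        # final hidden state
        return self.out(h_n[-1])                          # (batch, n_classes)

model = ContextLSTM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

ids = torch.randint(0, 2000, (8, 20))                     # 8 preprocessed questions
labels = torch.randint(0, 5, (8,))
for _ in range(3):                                        # a few training steps
    opt.zero_grad()
    loss = loss_fn(model(ids), labels)
    loss.backward()
    opt.step()
print(float(loss))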
S6, computing the overall association score between the standardized question and the final query set obtained in step S5, selecting the highest-scoring candidate as the optimal query graph, and converting the optimal query graph into an answer sentence;
To predict the best query graph among a set of candidates, the overall relevance score between question q and query graph G is denoted S(q, G). The semantic matching model of the previous subsection focuses on similarity at the predicate-path level, while the overall relevance score also involves features of more dimensions, such as the confidence of the entity links and the structural features of the query graph itself. S(q, G) is a weighted sum of entity-link, semantic-matching and query-structure features. The entity-link features are the sum of the link scores and the source of each link (S-MART or the link extension); the semantic-matching feature is the semantic similarity S_rm(q, G) between the question of S3 and the whole query graph; the query-graph structural features are the numbers of the different categories of restrictions, the main path length, and the number of final answers output. The model is trained with a maximum-margin loss function so that the score gap between a positive query graph G+ and a negative query graph G− is as large as possible:
loss = max{0, λ − S(q, G+) + S(q, G−)}
Because the question-answer data set only contains correct answers and does not label query graphs, positive and negative samples are distinguished by the F1 score of the answers generated by each query graph. Every query graph whose F1 score exceeds a threshold (set to 0.1) is treated as a positive sample G+, and up to 20 query graphs with lower F1 are randomly selected from the candidate set as G−, forming the sample pairs.
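A small sketch of the maximum-margin objective with the hinge loss above; the scores are random stand-ins for the weighted feature sums S(q, G+) and S(q, G−), and the margin value is an assumption.

import torch

lam = 0.5                                   # margin λ
s_pos = torch.rand(20, requires_grad=True)  # S(q, G+) for 20 sample pairs
s_neg = torch.rand(20, requires_grad=True)  # S(q, G−) for the paired negatives

loss = torch.clamp(lam - s_pos + s_neg, min=0).mean()
loss.backward()                             # gradients flow into the scoring model
print(float(loss))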
Converting the best query graph into an answer sentence extracts the specified knowledge from the constructed care knowledge base and converts it into logic formulas. After the relevant information has been extracted from the knowledge graph, it is converted into a logic formula; in this step the triples in the knowledge graph are encoded.
First, the entities in the information are all converted into logical variables. For example, primary liver cancer, post-operation, two weeks and taking medicine are encoded as b1, o1, t1 and d1. Predicates are converted into logical connectives, encoded with →, ∨ and ∧.
Through these two steps, the subgraph of the knowledge graph is converted into a formula set that contains the entire content of the subgraph; after this encoding, the logical reasoning module can use it. The formula set S is:
S = {b1 ∧ o1 ∧ t1 → d1}
By extracting the information, diseases are saved to variables b1, b2, … and symptoms are saved to variables a1, a2, …. In this way the subgraph of the knowledge graph is finally converted into logical variables, among which conjunction, disjunction, negation and other correlations can be expressed.
And S7, matching and reasoning the answer sentence obtained in the step S6 with a patient pathology information knowledge base to obtain a nursing measure corresponding to the answer sentence, and outputting an answer.
The user's questions and answers are matched and reasoned against the existing knowledge graph to provide more accurate answers. This specifically includes, but is not limited to, ontology-based knowledge representation, rule-based reasoning, statistics-based natural language processing, and filtering of the answer set.
The formula set derived by combining the question given by the patient with the historical dialogue and the patient's multi-source data is a set covering all possible results; the reasoning module deduces from this set the care knowledge whose conditions are not met and removes it. The formula set for the care question posed by the patient is as follows:
f1 = (b1→a2∨a3∨a4∨a7) ∧ (b2→a5∧a6∨a7∨a8) ∧ (b3→a7∨a8∨a9) ∧ (b1∧t1→a1) ∧ (b6∧u1→a6∨a8∨a9) ∧ (b1→a7) ∧ (b2→a7) ∧ (b2→a8) ∧ (b3→a7) ∧ (b3→a8) ∧ (b4→b3) ∧ (b5→b3) ∧ (b6∧u1→a8) ∧ (b1→c1) ∧ (b1∧t1→c2) ∧ (b2→c3) ∧ (b4→c4) ∧ (b5→c5)
The user's nursing-assessment information is likewise expressed as a formula (denoted f2 below), from which it is obtained that the user has no pressure-sore risk and no fall risk.
f2 is chosen for the extraction because, when the platform answers, the "contribution" of each care risk factor is computed from the knowledge base, i.e. the probability that the risk may occur. Furthermore, from the reasoning logic, if the conjunct is selected, the satisfiability judgment of (b1→a2∨a3∨a4∨a7) ∧ f2 is necessarily false.
The representation of the logic formula f in the implementation still follows the combination of the logic variables: the different variables are combined with conjunction and disjunction and finally assigned to the variable f to form a logical equation. In the reasoning module, the SMT solver z3 checks whether the intersection of the formula f and a care measure b from the preset database is satisfiable, i.e. whether the condition is met. The z3 solver is created with the Solver method and can be used to generate the answer; the constraint, namely the logical equation f, is added with the add method; the check method determines whether a solution exists after the constraints are added, returning sat if one exists. In that case the program puts the care measure b into the candidate set of care measures, completing the reasoning module's screening and generating the care measures.
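A minimal sketch with the z3 Python bindings follows, using a tiny stand-in formula in place of the full equation f; the facts and the candidate care measure are assumptions, while the variable names follow the encoding above.

from z3 import Bool, Solver, Implies, And, sat

b1, o1, t1, d1 = Bool('b1'), Bool('o1'), Bool('t1'), Bool('d1')

f = Implies(And(b1, o1, t1), d1)       # S = {b1 ∧ o1 ∧ t1 → d1}
facts = And(b1, o1, t1)                # the patient's situation
care_measure = d1                      # candidate care measure to verify

solver = Solver()                      # create the solver
solver.add(f, facts, care_measure)     # add the constraints
if solver.check() == sat:              # a model exists: the measure is consistent
    print("care measure kept:", solver.model())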
To verify the performance of the intelligent care scheme response method in simple semantic scenarios, an open-domain question answering experiment was carried out with the ComplexQuestions data set (https://github.com/syxu828/QuestionAnsweringOverFB). The main purpose of this data set is to run additional experiments verifying how models capable of answering complex questions perform in simple semantic scenarios. Using this data set, the performance of the method of the invention on questions of various difficulty levels was evaluated, further improving the capability of the question answering system.
The average F1 score is a commonly used evaluation metric for measuring the accuracy and completeness of a question answering system. It combines precision and recall to evaluate the system's performance comprehensively.
In a question answering system, precision is the proportion of the answers given by the system that are correct, and recall is the proportion of all correct answers that the system finds. The F1 score is the harmonic mean of precision and recall and emphasizes keeping both at a high level.
The step of calculating the average F1 score is to first calculate the F1 score between the answer given by the system and the standard answer for each test question, and then average the F1 scores of all questions. The higher the average F1 score, the better the accuracy and integrity of the system in answering the question.
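A short sketch of this average-F1 evaluation: per-question precision, recall and F1 between the system answer set and the gold answer set, then the mean over all questions. The answer sets are toy examples.

def f1(pred, gold):
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    p = len(pred & gold) / len(pred)        # precision
    r = len(pred & gold) / len(gold)        # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

predictions = [["entecavir"], ["shower", "scrub"], ["walk"]]
gold_answers = [["entecavir"], ["scrub"], ["walk", "rest"]]

avg_f1 = sum(f1(p, g) for p, g in zip(predictions, gold_answers)) / len(gold_answers)
print(round(avg_f1, 3))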
For the experimental comparison, the end-to-end test was run on the CompQ data set with the average F1 score (computed ignoring case) as the main evaluation metric, quantifying the overall performance of the system on this data set. The relative performance of the method of the invention was verified by comparing the average F1 scores of different methods or models. The results are shown in Table 1: on the CompQ data set, the method of the invention improves the average F1 score by 1.8 points over the other existing methods.
TABLE 1
Experimental results show that the method of the invention performs excellently on complex-question data sets and remains competitive on simple-question data sets. Further comparative experiments show that learning a holistic continuous feature representation of the query graph helps improve the performance of the question answering system.
The invention also provides an intelligent care scheme response system for discharged patients, comprising a question receiving module, a question answering module and a care scheme output module; the question answering module comprises a computer-readable storage medium storing a computer program that implements the method of the above technical solution.
In some preferred schemes of the invention, the system further comprises a user feedback module; the calculation parameters, thresholds and the like of the intelligent care scheme response method are optimized according to the user's satisfaction with, and the accuracy of, the care scheme, improving the accuracy of the output answers.
In some embodiments, the system of the present invention may implement the following functions:
(1) Data acquisition and processing: care-related data, such as the patient's health status, medical records and medication records, is collected from multiple data sources and processed and cleaned to improve the quality and accuracy of the data.
(2) Conversation history: a history of the user's dialog with the platform is recorded, including the user's questions and platform's answers, for subsequent dialog reasoning and analysis.
(3) And (3) multi-information source fusion: and fusing the data of the information sources by using a correlation algorithm to obtain a comprehensive analysis result. For example, the patient's health status, medical records, and medication records are fused together to analyze the patient's disease status and treatment regimen.
(4) Dialogue reasoning and analysis: dialog reasoning and analysis is performed through analysis of user questions and dialog history to understand the user's intent and to provide corresponding answers and suggestions.
(5) Intelligent response: based on the results of the conversational reasoning and analysis, intelligent responses are provided, such as answering patient health questions, planning care, providing medication recommendations, etc.
(6) Human-computer interaction interface: and a user-friendly interaction interface is designed, so that a user can conveniently interact with the platform in a dialogue manner, and relevant nursing knowledge and resources are provided.
The platform adopts a C/B+S deployment mode, and the server side consists of an intelligent voice server and a specialized knowledge server; the B end is usually integrated with an Internet hospital or a hospital Internet portal; the C end usually refers to an intelligent terminal, an independent device connected to the platform server through the Internet.
The system provided by the invention is used as follows:
(1) User login: before using the platform, a user logs in by entering a user name and password, by scanning a QR code, or by voiceprint.
(2) Asking questions: the user can ask the platform questions in text or speech form, for example "I have just had breakfast; which rehabilitation exercises should I do now?"
(3) And (3) multi-information source fusion: the platform obtains relevant information from a plurality of information sources according to the questions posed by the user, including disease diagnosis, care schemes and the like.
(4) Dialog history analysis: the platform will analyze the history of previous conversations of the user and provide more personalized care advice based on the condition and health of the user.
(5) Logical reasoning and knowledge graph: the platform can perform logic reasoning, match and reason the questions and answers of the user with the existing knowledge graph, and provide more accurate answers.
(6) Answer: the platform gives corresponding answers according to the above steps, for example: "Entecavir tablets need to be taken on an empty stomach; have you already taken your medicine?" Patient: "Not yet." Platform: "If you have not taken it yet, take it at least two hours after the meal; you will be reminded to take it in two hours."
(7) User feedback: the user may feed back answers given by the platform, such as "this suggestion is very practical, i would try to do so. "
(8) Through the steps, the user can conveniently obtain personalized nursing suggestions, and can ask questions to the platform at any time so as to be helped in time.
Application scenario 1: when the query comes from a discharged patient with an indwelling catheter, the system determines whether a shower is allowed based on the patient's background information and visit records. The system may answer that the patient's catheter has not yet been removed, that showering and bathing are not allowed, and suggest scrubbing to maintain personal hygiene.
Application scenario 2: when the query is a post-discharge question from an open-surgery patient, the system determines when showering is allowed based on the patient's background information and visit records. The system may answer that the patient can shower three days after suture removal and prompt the patient to confirm whether the sutures have been removed, because the system has not found a suture-removal record at the hospital.
By providing personalized answers according to the specific conditions and the treatment records of the patients, the technical scheme can more accurately respond to the nursing demands of different patients, provide targeted nursing suggestions and guidance for the patients, and improve the efficiency and user satisfaction of the nursing question-answering system.
The above application scenario is only an example, and the method and system of the present invention can also be applied to intelligent care solution response of other care problems.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (6)

1. An intelligent care scheme response method is characterized by comprising the following steps of:
S1, constructing a patient pathology information knowledge base, a general care knowledge base, a synonym knowledge base in the care field and an attribute knowledge base;
the patient pathology information knowledge base at least comprises medical record information, age, medical history and medication information of a patient;
s2, acquiring a question, and inquiring a candidate inquiry graph corpus of the question in the general care knowledge base constructed in the step S1;
the query adopts a staged form, and entity semantic limitation, type limitation, time limitation, sequence limitation, attribute relationship limitation and association relationship construction are sequentially carried out after a main path query graph is generated;
the attribute relationship limitation includes: extracting attribute sets in sequence-limited candidate query graph sets obtained after sequence limitation, querying personalized information related to the attribute sets in a patient pathology information knowledge base, querying query graphs related to the personalized information in a general care knowledge base to obtain personalized information query graphs, screening the sequence-limited candidate query graph sets by utilizing the personalized information query graphs, removing parts, conflicting with the personalized information query graphs, of the sequence-limited candidate query graph sets, and merging the personalized information query graphs to obtain attribute relation-limited query graph sets;
The association relation construction comprises the following steps: extracting answer entities in the attribute relation restriction query graph set, querying entity node query graphs with association relation with the answer entities in a general care knowledge base, merging with the attribute relation restriction query graph set, and obtaining a candidate query graph complete set;
s3, preprocessing the question obtained in step S2, wherein the preprocessing comprises replacing synonyms using the care-field synonym knowledge base constructed in step S1 to obtain a standardized question, obtaining the grammar analysis graph G_gram of the question and its semantic components G_i, and encoding the standardized question and the semantic components G_i into feature vectors;
s4, performing care-information named entity recognition on the standardized question obtained in step S3 with a window-based deep neural network model, to obtain an entity set;
in the window-based deep neural network model, the care-domain loss function shown in formula (I) is used for the calculation:
wherein n is the sample size; m is the number of label classes; η is the activation function, giving the model's predicted probability distribution for sample x^(i); j, k and l are the node indices of the window layer, the hidden layer and the output layer, respectively; W and V are parameters of the model, representing the weights from the input layer to the hidden layer and from the hidden layer to the output layer, respectively; h is the number of hidden-layer nodes; c is the window size; d is the word vector dimension; W is the parameter of the first-layer network; V is the parameter of the hidden layer; C is the regularization term for care-information entity recognition;
S5, carrying out entity linking on the entity set obtained by the identification in the step S4 and the candidate query graph corpus constructed in the step S2 to obtain a first query set; extracting answers in the first query set as answer entities, and carrying out entity link on the answer entities and the candidate query graph corpus constructed in the step S2 to obtain a final query set;
s6, calculating the overall association score between the standardized question sentence and the final query set obtained in the step S5, selecting the answer with the highest score as an optimal query graph, and converting the optimal query graph into an answer sentence;
and S7, matching and reasoning the answer sentence obtained in the step S6 with a patient pathology information knowledge base to obtain a nursing measure corresponding to the answer sentence, and outputting an answer.
2. The intelligent care regimen response method of claim 1, wherein in step S2, a parallel depth-first algorithm is employed in the phased form of the query process to increase the query efficiency.
3. The intelligent care regimen response method as recited in claim 2, wherein in step S2, the parallel depth-first algorithm comprises the steps of:
dividing the search space into a plurality of subspaces, wherein each subspace can be searched by an independent computing unit; when searching the query graph, simultaneously starting a plurality of computing units to search in parallel; communication and synchronization are carried out among the computing units in the searching process, so that repeated computation is avoided; and merging search results of all the computing units.
4. A method of answering a smart care regimen according to any one of claims 1-3, wherein the staged form of query of step S2 includes the steps of:
s21, extracting an entity, a predicate, a type, a time and a sequence in a standardized statement through a dependency grammar analysis technology, and linking the entity, the predicate, the type, the time and the sequence into a general care knowledge base, wherein all the linked results are candidate query graph complete sets;
s22, screening the parts of the entity and predicate of the standardized question sentence in the candidate query graph total set to obtain a main path;
s23, adding entity semantic restrictions to the main path, querying query graphs related to all entities in the main path in the general care knowledge base constructed in the step S1, and linking the query graphs to the main path to obtain an entity restriction query graph set;
s24, adding type limitation to the entity-limited query graph set, selecting predicates directly connected with answers in the entity-limited query graph set to infer hidden types, filtering inferred hidden type related query graphs according to the type of standardized questions, and merging the filtered hidden type related query graphs with the entity-limited query graph set to obtain the type-limited query graph set;
s25, adding time limitation to the type limitation query graph set, filtering the type limitation query graph set according to the time of the standardized question, and removing query graphs in the time which does not accord with the standardized question to obtain the time limitation query graph set;
S26, adding sequence limitation to the time-limited query graph set, filtering the time-limited query graph set according to the sequence of the standardized question, and removing query graphs that do not accord with the sequence of the standardized question to obtain the sequence-limited query graph set.
5. The intelligent care regimen response method as recited in claim 1, wherein in step S4, the regularization term C identified by the care information entity in formula (i) is obtained as follows:
(1) Model training is carried out with the NCBI Disease Corpus data set, which carries disease-name labels, and the difference from the true results is computed with the general loss function of formula (II) to obtain an initial value C' of C;
(2) Performing cross verification on the patient pathology information knowledge base constructed in the step S1 by adopting a formula (III) added with an initial value C 'of C, and correcting the C' according to a verification result to obtain a C value;
6. the intelligent nursing scheme response system is characterized by comprising a question receiving module, a question response module and a nursing scheme output module;
a question answering module comprising a computer readable storage medium embodying a computer program for implementing the method of any one of claims 1 to 5.