CN111666477B - Data processing method, device, intelligent equipment and medium

Publication number
CN111666477B
Authority
CN
China
Prior art keywords
information
patient
disease
path
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010570923.6A
Other languages
Chinese (zh)
Other versions
CN111666477A (en)
Inventor
陈曦
于苗苗
管冲
文瑞
高文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010570923.6A
Publication of CN111666477A
Application granted
Publication of CN111666477B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments of the application disclose a data processing method, a data processing device, intelligent equipment and a computer readable storage medium. The method comprises the following steps: obtaining description information and static attribute information of a target user; taking the description information and the static attribute information as input of a data analysis model constructed by combining artificial intelligence and machine learning; obtaining an analysis result output by the data analysis model after it analyzes the description information and the static attribute information; and outputting disease prompt information of the target user according to the analysis result. The embodiments combine artificial intelligence and a machine learning model to mine hidden, potentially valuable information from existing medical record data, so that the disease information of the target user can be predicted from this information together with the description information of the target user, providing a diagnosis reference for the target user.

Description

Data processing method, device, intelligent equipment and medium
Technical Field
The present application relates to the field of computer applications, and in particular, to a data processing method, apparatus, intelligent device, and computer readable storage medium.
Background
In daily life, people inevitably fall ill and need to see a doctor. However, medical resources are currently relatively scarce: patients often queue for several hours, while the actual consultation takes only a few minutes, so for minor ailments users may prefer to avoid visiting the hospital.
With the development of computer, network and electronic technology, users can post symptom information on internet medical platforms through devices such as personal computers and mobile phones and obtain diagnosis guidance from doctors active online. However, this approach has poor timeliness: the guidance often does not arrive soon enough.
Disclosure of Invention
The embodiment of the application discloses a data processing method, a data processing device, intelligent equipment and a computer readable storage medium, which can predict disease information of a target user in time and provide a diagnosis reference for the user.
In one aspect, an embodiment of the present application provides a data processing method, where the method includes:
acquiring description information and static attribute information of a target user, wherein the description information is used for describing symptoms of the target user;
taking the description information and the static attribute information as input of a data analysis model, and acquiring an analysis result which is output by the data analysis model after analysis processing is carried out on the description information and the static attribute information;
outputting disease prompt information of the target user according to the analysis result;
the data analysis model is constructed based on a heterogeneous graph neural network, and the graph structure of the heterogeneous graph neural network comprises disease data characteristic nodes, patient data characteristic nodes and symptom data characteristic nodes; the data analysis model determines patient expression characteristics and disease expression characteristics based on the disease data characteristic nodes, patient data characteristic nodes, and symptom data characteristic nodes so as to obtain the analysis result.
In another aspect, the present application provides a data processing apparatus comprising:
an acquisition unit, configured to acquire description information and static attribute information of a target user, wherein the description information is used for describing symptoms of the target user;
a processing unit, configured to take the description information and the static attribute information as input of a data analysis model, acquire an analysis result output by the data analysis model after analysis processing is carried out on the description information and the static attribute information, and output the disease prompt information of the target user according to the analysis result;
wherein the data analysis model is constructed based on a heterogeneous graph neural network, and the graph structure of the heterogeneous graph neural network comprises disease data characteristic nodes, patient data characteristic nodes and symptom data characteristic nodes; the data analysis model determines patient expression characteristics and disease expression characteristics based on the disease data characteristic nodes, patient data characteristic nodes, and symptom data characteristic nodes so as to obtain the analysis result.
In another aspect, the present application provides an intelligent device, including a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, where the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the data processing method described above.
In another aspect, the present application provides a computer readable storage medium having stored thereon one or more instructions adapted to be loaded by a processor to perform the data processing method described above.
In the embodiment of the application, the description information and the static attribute information of the target user are acquired and used as the input of a data analysis model; the analysis result output by the data analysis model after it analyzes the description information and the static attribute information is acquired; and the disease prompt information of the target user is output according to the analysis result. In this way, the disease information of the target user can be predicted in time through the data analysis model, providing a diagnosis reference for the target user.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic diagram of a data processing scenario according to an embodiment of the present application;
FIG. 1b is a schematic diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 3a is a schematic diagram of an information input window according to an embodiment of the present application;
FIG. 3b is a schematic diagram of a heterogeneous graph neural network according to an embodiment of the present application;
FIG. 3c is a schematic diagram of an interface for displaying disease prompt information according to an embodiment of the present application;
FIG. 4 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 5a is a schematic diagram of extracting atomic information according to an embodiment of the present application;
FIG. 5b is a schematic diagram of a standard symptom combination according to an embodiment of the present application;
FIG. 5c is a schematic diagram of another standard symptom combination provided by an embodiment of the present application;
FIG. 5d is a schematic diagram of a meta-path set according to an embodiment of the present application;
FIG. 5e is a schematic diagram of a heterogeneous information network diagram according to an embodiment of the present application;
FIG. 6 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 7a is a schematic diagram of extracting patient expression features of a training patient according to an embodiment of the present application;
FIG. 7b is a schematic diagram of extracting disease expression features of a training patient according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an intelligent device according to an embodiment of the present application.
Detailed Description
The technical scheme in the embodiment of the application will be described below with reference to the accompanying drawings.
The embodiment of the application mainly relates to artificial intelligence (Artificial Intelligence, AI), natural language processing (Natural Language Processing, NLP), machine learning (Machine Learning, ML) and electronic medical records (Electronic Health Records, EHR). By combining AI, NLP and ML, hidden information with potential value in the EHR can be mined, so that a server can more accurately predict the diseases a target user may suffer from according to the description information of the target user. AI is the theory, methods, technologies and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
AI technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
NLP is an important direction in the fields of computer science and AI. It studies theories and methods that enable effective communication between people and computers in natural language. NLP is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.
ML is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. ML is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the various fields of artificial intelligence. ML and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
EHR refers to digital medical health records that are stored, managed, and transmitted with electronic devices.
In terms of data processing, embodiments of the present application also relate to graph neural networks (Graph Neural Network, GNN), heterogeneous graph neural networks (Heterogeneous Graph Neural Network, HGNN), meta paths (Meta Path) and heterogeneous information networks (Heterogeneous Information Network, HIN). Here, GNN is a deep-learning-based method for processing graph-structured information. A graph structure models a group of objects (nodes) and their relationships (edges). Because of its strong expressive power, graph analysis with machine learning methods is receiving growing attention in many fields, and GNNs are more interpretable than other neural networks.
HGNN is a graph neural network whose graph contains nodes and edges of multiple types. Many real-world graphs can be naturally modeled as heterogeneous graphs, so heterogeneous graphs are widely used in industry and research on them has considerable industrial significance.
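A meta path is a path defined on a heterogeneous graph that connects two objects through a sequence of relationships. For a network schema $T_G = (\mathcal{A}, \mathcal{R})$, where $\mathcal{A}$ represents the node set and $\mathcal{R}$ represents the set of relationships between nodes, a meta path from node $A_1$ to node $A_{l+1}$ may be written as $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$.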
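As an illustration of the meta path definition, the following minimal Python sketch enumerates the instances of a meta path on a small typed graph. The node names, the type letters (P, S, D) and the helper `meta_path_instances` are hypothetical and chosen only for illustration; they are not taken from the embodiments.

```python
from collections import defaultdict

# Hypothetical toy heterogeneous graph: node name -> node type letter.
NODE_TYPES = {
    "p1": "P", "p2": "P",        # patients
    "fever": "S", "cough": "S",  # symptoms
    "flu": "D",                  # disease
}

# Undirected association edges (illustrative only).
EDGES = [("p1", "fever"), ("p2", "fever"), ("p2", "cough"),
         ("fever", "flu"), ("cough", "flu")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def meta_path_instances(start, meta_path, adj, node_types):
    """Return every node sequence from `start` whose node types follow `meta_path`."""
    if node_types[start] != meta_path[0]:
        return []
    paths = [[start]]
    for wanted_type in meta_path[1:]:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(adj[path[-1]])
            if node_types[nxt] == wanted_type and nxt not in path
        ]
    return paths


adj = build_adjacency(EDGES)
psd = meta_path_instances("p1", ["P", "S", "D"], adj, NODE_TYPES)
psp = meta_path_instances("p1", ["P", "S", "P"], adj, NODE_TYPES)
print(psd)
print(psp)
```

In this sketch a meta path is simply a list of node types; relation types are left implicit because each pair of node types is assumed to have only one kind of association.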
HIN refers to an information network that contains multiple types of nodes or edges.
Based on the trained and optimized data analysis model, the embodiment of the application can obtain disease prompt information for a user simply from the user's description of their symptoms and their attribute information. The disease prompt information can serve as a reference for decisions such as whether to see a doctor or which medicine to purchase. Referring to fig. 1a, fig. 1a is a schematic diagram of a data processing scenario according to an embodiment of the present application. As shown in fig. 1a, the scenario architecture includes a terminal device 101 and a server 102. The target user or a doctor inputs the description information of the target user through the terminal device 101. There may be one or more terminal devices 101, and the form shown is only an example; the terminal device 101 may include, but is not limited to, smart phones (such as Android phones and iOS phones), tablet computers, portable personal computers, and mobile internet devices (MID for short), which is not limited in the embodiments of the present application. A data analysis model is deployed in the server 102. The server 102 may be a single server, a server cluster comprising a plurality of servers, or a cloud computing service center.
The data processing flow mainly comprises the following: the server 102 obtains description information (such as fever, arthralgia, etc.) and static attribute information (such as gender, age, etc.) of the target user, which may be input by the target user or a doctor through the terminal device 101; the description information and the static attribute information are used as input of a data analysis model (such as a heterogeneous graph neural network model), and the analysis result output by the data analysis model after it analyzes this information is obtained (such as a 90% probability of a cold, a 70% probability of pneumonia, and the like); and the server outputs disease prompt information of the target user according to the analysis result, where the disease prompt information is used for prompting the diseases the target user may currently have.
Fig. 1b is a schematic diagram of a data processing method according to an embodiment of the present application. In one embodiment, the main flow of data processing includes: the target user inputs the description information "a bit of fever these two days" and static attribute information (sex: male, age: 18) into the terminal device 101 through the data interaction window 103. After the user clicks the "submit" button, the terminal device 101 sends the static attribute information and the description information to the server 102. Alternatively, the static attribute information may be information the user entered when registering the corresponding user account; in that case, only the description information input this time needs to be submitted, and the server looks up the static attribute information of the user based on the account of the target user.
After receiving the description information, the server 102 may extract atomic information ("a bit", "two days", "fever") from the description information using a Named Entity Recognition (NER) model, normalize the atomic information using a standard mapping dictionary to obtain atomic standard information ("slight", "two days", "fever"), and combine the atomic standard information to obtain a standard symptom combination. The standard symptom combination is used as input of the data analysis model, and the analysis result output by the data analysis model is obtained, for example, a 90% probability of a cold. Disease prompt information can be generated according to the analysis result and sent to the terminal device 101. After receiving the disease prompt information, the terminal device 101 displays it to the user; for example, if the disease prompt information is "You may have a cold", the terminal device displays "You may have a cold" as a reference for the user.
In order to provide auxiliary diagnosis for medical staff and target users, the application provides a data processing method which is used for processing the description information and static attribute information of the target users to obtain disease prompt information of the target users, so that the medical staff and the target users can know possible diseases of the target users according to the disease prompt information, and the data processing method is described in detail below.
Referring to fig. 2, fig. 2 is a flowchart of a data processing method according to an embodiment of the application. The method may be performed by an intelligent device, which may specifically be the server 102 shown in fig. 1a; basic information is collected by the terminal device, and data analysis and processing are performed in the server. Of course, in some embodiments, the method may also be performed by a terminal device on which an application designed based on the corresponding data analysis model is installed. The method of the embodiment of the present application includes the following steps.
S201: Acquire the description information and static attribute information of the target user. The description information is used for describing symptoms of the target user, for example, "fever for half a day" or "headache accompanied by cough". The static attribute information refers to personal information of the target user, such as gender, age, height and weight.
In one embodiment, the target user or doctor inputs the description information of the target user (i.e., symptoms of the target user) and the static attribute information through an information input window. Fig. 3a is a schematic diagram of an information input window according to an embodiment of the present application. As shown in fig. 3a, the target user or medical staff may input the description information of the target user through an input box 301, the sex of the target user through an input box 302, and the age of the target user through an input box 303; and after the information input is completed, the description information and the static attribute information of the target user are sent to the server by clicking a submit button. Of course, the static attribute information may also be information entered by the user when registering the corresponding user account, the server searches the static attribute information of the user based on the account of the target user, and displays the searched static attribute information in the input box 302 and the input box 303 correspondingly, if the static attribute information is correct, the user only needs to input description information.
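The account-based lookup of static attribute information described above might be sketched as follows. This is a minimal illustration under stated assumptions: the function `resolve_static_attributes`, the in-memory `REGISTERED_PROFILES` store and the account identifier are hypothetical, and the rule that submitted values override registered ones is an assumption rather than something the embodiment specifies.

```python
def resolve_static_attributes(account_id, submitted, registered_profiles):
    """Merge attributes stored at registration with any values submitted now.

    Assumption: a non-empty submitted value overrides the registered one, which
    models a user correcting a pre-filled input box before submitting.
    """
    stored = registered_profiles.get(account_id, {})
    provided = {key: value for key, value in submitted.items()
                if value not in (None, "")}
    return {**stored, **provided}


# Hypothetical registered profile store keyed by account id.
REGISTERED_PROFILES = {"user_42": {"gender": "male", "age": 18}}

# The user submits only description information; attributes come from the account.
only_description = resolve_static_attributes("user_42", {}, REGISTERED_PROFILES)

# The user corrects the age shown in the input box and leaves gender untouched.
corrected = resolve_static_attributes(
    "user_42", {"age": 19, "gender": ""}, REGISTERED_PROFILES)
```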
S202: Take the description information and the static attribute information as input of a data analysis model, and acquire the analysis result output by the data analysis model after it analyzes the description information and the static attribute information. The data analysis model is constructed based on a heterogeneous graph neural network, which comprises three types of data feature nodes: disease data feature nodes, patient data feature nodes and symptom data feature nodes. Fig. 3b is a schematic diagram of a heterogeneous graph neural network according to an embodiment of the present application. As shown in fig. 3b, the heterogeneous graph neural network includes D nodes (disease data feature nodes), S nodes (symptom data feature nodes) and P nodes (patient data feature nodes), and the nodes are connected through association relationships.
Each disease data feature node corresponds to one disease and records the features of that disease. The features of a disease may be determined according to the department (type) to which the disease belongs, whether it is infectious, whether it is hereditary, and the like. For example, the disease corresponding to disease data feature node D is viral influenza, and the features carried in node D include the department of viral influenza: respiratory department; infection grade: grade 3; and so on.
Similarly, each symptom data feature node corresponds to one symptom and carries the features of that symptom, which may be determined according to the grade of the symptom, the body part to which the symptom belongs, and the like. For example, the symptom corresponding to symptom data feature node S is lumbar pain, and the features carried in node S include the body part to which the symptom belongs: lumbar vertebrae; pain grade: grade 7. Each patient data feature node corresponds to one patient and carries the features of that patient, which may be determined according to gender, age, height, weight and the like. For example, the features carried in patient data feature node P for its corresponding patient include gender: male; height: 170 cm.
In the heterogeneous graph neural network, these three types of nodes can be connected through association relationships. For example, viral influenza corresponds to disease data feature node D and fever corresponds to symptom data feature node S; since fever may occur as a symptom of viral influenza, a connection is established between disease data feature node D and symptom data feature node S. Further, patient expression features may be determined from a patient data feature node together with the disease data feature nodes and symptom data feature nodes associated with it; similarly, disease expression features may be determined from a disease data feature node together with the patient data feature nodes and symptom data feature nodes associated with it.
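The three node types and their associations can be pictured with a small data structure. The sketch below is illustrative only: the classes `FeatureNode` and `HeteroGraph`, the node identifiers, the fever grade value and the direct patient-to-disease edge are assumptions made for the example, and the sketch only gathers the associated nodes rather than computing any learned expression feature.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureNode:
    node_id: str
    node_type: str  # "D" (disease), "S" (symptom) or "P" (patient)
    features: dict = field(default_factory=dict)


class HeteroGraph:
    """Toy heterogeneous graph holding typed nodes and undirected associations."""

    def __init__(self):
        self.nodes = {}
        self.neighbors = {}

    def add_node(self, node):
        self.nodes[node.node_id] = node
        self.neighbors.setdefault(node.node_id, set())

    def add_edge(self, a, b):
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def neighbors_of_type(self, node_id, node_type):
        """Return the sorted ids of neighbors of `node_id` that have `node_type`."""
        return sorted(n for n in self.neighbors[node_id]
                      if self.nodes[n].node_type == node_type)


graph = HeteroGraph()
graph.add_node(FeatureNode("viral_influenza", "D",
                           {"department": "respiratory", "infection_grade": 3}))
graph.add_node(FeatureNode("fever", "S", {"grade": 1}))  # grade is hypothetical
graph.add_node(FeatureNode("patient_1", "P", {"gender": "male", "height_cm": 170}))

graph.add_edge("viral_influenza", "fever")      # the disease may cause the symptom
graph.add_edge("patient_1", "fever")            # the patient reports the symptom
graph.add_edge("patient_1", "viral_influenza")  # assumed diagnosis record

# Nodes that would feed a patient expression feature for patient_1.
patient_symptoms = graph.neighbors_of_type("patient_1", "S")
patient_diseases = graph.neighbors_of_type("patient_1", "D")
```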
S203: Output disease prompt information of the target user according to the analysis result. The disease prompt information is used to indicate the diseases the target user may suffer from. In one embodiment, the analysis result includes the probabilities that the target user suffers from various diseases, and the server ranks these diseases in descending order of probability. Information on the diseases ranked before a preset position (for example, the disease with the highest probability, or the three diseases with the highest probabilities) is taken as the disease prompt information and sent to the terminal device.
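The ranking in step S203 can be sketched in a few lines of Python. The function name `select_disease_prompts` and the `top_k` parameter are hypothetical; the probabilities for "cold" and "pneumonia" follow the example values given earlier, while the other two entries are invented for illustration.

```python
def select_disease_prompts(probabilities, top_k=3):
    """Rank diseases by probability (high to low) and keep the first `top_k`."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Analysis result: disease -> probability (last two entries are illustrative).
analysis_result = {"pneumonia": 0.70, "cold": 0.90, "bronchitis": 0.20, "asthma": 0.05}

top_one = select_disease_prompts(analysis_result, top_k=1)
top_three = select_disease_prompts(analysis_result, top_k=3)
```

Here the "preset position" corresponds to `top_k`: a value of 1 keeps only the most probable disease, and a value of 3 keeps the three most probable ones.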
Optionally, the disease prompt information may further include drug information, precautions, symptom relief methods and the like corresponding to the disease. Specifically, according to the diseases indicated by the disease prompt information, the corresponding drug information, precautions, symptom relief methods and the like can be found by searching a database or the like. The terminal device can display the disease prompt information and may even attach a network link to the drug information, so that after clicking a drug name the user is taken to the corresponding sales page. Fig. 3c shows an interface displaying disease prompt information, together with the online store sales interface to which the user is linked after clicking a drug name, according to an embodiment of the present application.
In the embodiment of the application, the description information and the static attribute information of the target user are acquired and used as the input of a data analysis model; the analysis result output by the data analysis model after it analyzes the description information and the static attribute information is acquired; and the disease prompt information of the target user is output according to the analysis result. In this way, the disease information of the target user can be predicted in time through the data analysis model, providing a diagnosis reference for the target user.
Referring to fig. 4, fig. 4 is a flowchart of another data processing method according to an embodiment of the present application, where the method may be performed by an intelligent device, and the intelligent device may specifically be the server 102 shown in fig. 1a, where basic information is collected by a terminal device, and data analysis and processing are performed in the server. Of course, in some embodiments, it may also be performed by a terminal device on which an application designed based on a corresponding data analysis model is installed, and the method of the embodiment of the present application includes the following steps.
S401: acquiring text information of a target user, extracting atomic information from the text information, and carrying out standardized processing on the atomic information to obtain a plurality of atomic standard information of the text information. The text information is used for describing symptoms of the target user; for example, fever for half a day, headache with accompanying cough, etc. The text information may be obtained by referring to the method for obtaining the information described in step S201 of fig. 2, which is not described herein. The atomic information refers to information which is not split in the text information; for example, the atomic information of "joint pain" is "joint" and "pain". The atomic information includes symptom type information for describing symptoms (such as cough, fever, etc.) of the target user, and symptom attribute information for describing characteristics of symptoms of the target user, the symptom attribute information including at least one of the following information: location information, degree information, time information. The location information is used for describing a body part to which the symptom of the target user belongs (for example, the painful part is a wrist), the degree information is used for describing the grade of the symptom of the target user (for example, fever 37.5-39 degrees is grade 1, fever 39-40 degrees is grade 2, and fever more than 40 degrees is grade 3), and the time information is used for describing the duration of the symptom of the target user (for example, fever 6 hours, pain 1 week).
In one embodiment, the server uses an information recognition model to extract the atomic information from the text information. The information recognition model may specifically be a BERT-BiLSTM-CRF-NER model used to recognize symptom entities, where BERT stands for Bidirectional Encoder Representations from Transformers, BiLSTM for Bi-directional Long Short-Term Memory (a bidirectional long short-term memory network), CRF for Conditional Random Field, and NER for Named Entity Recognition. Fig. 5a is a schematic diagram of extracting atomic information according to an embodiment of the present application. As shown in fig. 5a, the text information is "throat pain", and the server uses the information recognition model to extract the atomic information "throat" and "pain" from it. The server then acquires a standard mapping dictionary for standardizing the atomic information. The standard mapping dictionary includes the mapping relationship between atomic information and atomic standard information (for example, the atomic standard information corresponding to the atomic information "a bit" is "slight"). The server replaces each piece of atomic information according to this mapping relationship to obtain the atomic standard information corresponding to the text information. For example, for the text information "throat uncomfortable, pain", the information recognition model yields the atomic information "throat" and "pain"; normalizing these with the standard mapping dictionary gives the atomic standard information "throat" (converted from the colloquial word for throat) and "pain" (converted from the colloquial word for pain). "Uncomfortable" is not related to a medical entity such as a symptom, disease, or body part, and is therefore not recognized.
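The dictionary-based standardization step can be illustrated with a plain lookup table. The sketch below is not the patented implementation: the dictionary entries are invented examples, the name `normalize_atoms` is hypothetical, and keeping unmapped atoms unchanged is an assumption.

```python
from typing import Dict, List

# Hypothetical standard mapping dictionary:
# colloquial atomic information -> atomic standard information.
STANDARD_MAPPING: Dict[str, str] = {
    "gullet": "throat",
    "hurts": "pain",
    "a little": "slight",
}


def normalize_atoms(atoms: List[str]) -> List[str]:
    """Replace each atom with its standard form; unmapped atoms are kept as-is."""
    return [STANDARD_MAPPING.get(atom, atom) for atom in atoms]


print(normalize_atoms(["gullet", "hurts"]))  # ['throat', 'pain']
```

Because every synonym collapses onto one standard term, the downstream model sees a smaller vocabulary.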
In yet another embodiment, the information recognition model may be optimized as follows to improve its recognition of atomic information. The server acquires electronic medical record sample data and uses an initial recognition model to train on and recognize the text in the electronic medical record sample data to obtain entity information, where the entity information includes character entities or word entities (for example, the word entity "headache"). The server constructs an atomic mapping dictionary according to the entity information, where the atomic mapping dictionary includes the mapping relationship between entity information and atomic information (for example, the word entity "headache" can be further mapped to the atomic information "head" and "pain"). The server then trains the initial recognition model again according to the atomic mapping dictionary and the electronic medical record sample data to obtain the information recognition model.
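The atomic mapping dictionary can be pictured as a map from a recognized entity to the atoms it decomposes into. The following sketch uses the two decompositions mentioned in the text; the function name `split_entities` and the rule that unknown entities are treated as already atomic are assumptions made for illustration.

```python
from typing import Dict, List

# Atomic mapping dictionary: entity information -> atomic information.
ATOMIC_MAPPING: Dict[str, List[str]] = {
    "headache": ["head", "pain"],
    "joint pain": ["joint", "pain"],
}


def split_entities(entities: List[str]) -> List[str]:
    """Expand each entity into its atoms; unknown entities are assumed atomic."""
    atoms: List[str] = []
    for entity in entities:
        atoms.extend(ATOMIC_MAPPING.get(entity, [entity]))
    return atoms
```

Such a dictionary could then be used to relabel the sample data at the atom level before the model is trained again.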
S402: Combining the atomic standard information to obtain a standard symptom combination corresponding to the text information, and taking the standard symptom combination as the description information of the target user. In one embodiment, the server combines the pieces of atomic standard information according to their semantic logic to obtain the standard symptom combination corresponding to the text information. Fig. 5b is a schematic diagram of a standard symptom combination according to an embodiment of the present application. As shown in fig. 5b, the atomic standard information is "joint", "ache", and "10 years", and the server combines these pieces according to the semantic logic of each to obtain the standard symptom combination "joint ache for 10 years". Fig. 5c is a schematic diagram of another standard symptom combination provided in an embodiment of the present application. As shown in fig. 5c, the atomic standard information is "throat" and "pain", and the server combines these pieces according to the semantic logic of each to obtain the standard symptom combination "sore throat".
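One simple way to picture the combination step is slot filling over typed atoms. The sketch below is an assumption-laden illustration: the slot names, the fixed word order, and the function `combine_atoms` are all hypothetical, and the text does not specify how the semantic logic is actually evaluated.

```python
from typing import List, Tuple


def combine_atoms(atoms: List[Tuple[str, str]]) -> str:
    """Combine (kind, value) atoms into one phrase using a fixed template.

    Assumed template: [degree] [location] [symptom] [for <time>].
    """
    slots = dict(atoms)  # a later atom of the same kind overwrites an earlier one
    parts = [slots[kind] for kind in ("degree", "location", "symptom") if kind in slots]
    phrase = " ".join(parts)
    if "time" in slots:
        phrase += " for " + slots["time"]
    return phrase


print(combine_atoms([("location", "joint"), ("symptom", "ache"), ("time", "10 years")]))
# joint ache for 10 years
```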
S403: Taking the description information and the static attribute information as input of a data analysis model, and acquiring the analysis result that the data analysis model outputs after analyzing the description information and the static attribute information. In one embodiment, the heterogeneous graph neural network includes a first meta-path set for generating patient expression features and a second meta-path set for generating disease expression features. In one embodiment, the server uses an aggregation function to aggregate the data feature nodes on at least one path in the first meta-path set to obtain the patient expression features. The aggregation function may include, but is not limited to, a mean aggregation function, a pooling aggregation function, and a long short-term memory (LSTM) aggregation function. Similarly, the server uses an aggregation function to aggregate the data feature nodes on at least one path in the second meta-path set to obtain the disease expression features. The paths in the first meta-path set and the second meta-path set are directed paths: the first meta-path set is the set of paths whose end point is a patient data feature node, and the second meta-path set is the set of paths whose end point is a disease data feature node.
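Two of the aggregation functions named above can be sketched over plain feature vectors. This is a simplified illustration: the pooling variant here is a bare element-wise maximum with no learned transform, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_aggregate(neighbors: Sequence[Vector]) -> Vector:
    """Element-wise mean of the neighbor feature vectors."""
    return [sum(column) / len(neighbors) for column in zip(*neighbors)]


def max_pool_aggregate(neighbors: Sequence[Vector]) -> Vector:
    """Element-wise maximum of the neighbor feature vectors (simplified pooling)."""
    return [max(column) for column in zip(*neighbors)]


features = [[1.0, 2.0], [3.0, 4.0]]
print(mean_aggregate(features))      # [2.0, 3.0]
print(max_pool_aggregate(features))  # [3.0, 4.0]
```

Both functions are insensitive to the order of the neighbors, which is the usual reason for preferring them when a node's neighbors have no natural ordering.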
Fig. 5d is a schematic diagram of a meta-path set according to an embodiment of the present application. As shown in fig. 5d, the D node is a disease data feature node, the D1 node and the D2 node are a first disease data feature node and a second disease data feature node, respectively, the S node is a symptom data feature node, and the P node is a patient data feature node. The first meta-path set comprises a first path and a second path. The start point of the first path is a disease data feature node, its intermediate node is a symptom data feature node, and its end point is a patient data feature node; the start point of the second path is a symptom data feature node, its intermediate node is a disease data feature node, and its end point is a patient data feature node. The second meta-path set comprises a third path and a fourth path. The start point of the third path is the first disease data feature node, its intermediate node is a symptom data feature node, and its end point is the second disease data feature node. It can be understood that the symptoms of the two diseases corresponding to the D1 node and the D2 node both include the symptom corresponding to the S node, so the third path represents the relationship between two diseases that share a symptom. The start point of the fourth path is a symptom data feature node, its intermediate node is a patient data feature node, and its end point is a disease data feature node.
It will be appreciated that the first meta-path set may also include other paths whose end point is a patient data feature node, such as a fifth path (symptom data feature node → patient data feature node) and a sixth path (first symptom data feature node → disease data feature node → second symptom data feature node → patient data feature node). The patient expression feature is proportional to the number of data feature nodes contained in the path. Similarly, the second meta-path set may also include other paths whose end point is a disease data feature node, and the disease expression feature is proportional to the number of data feature nodes contained in the path.
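To make the notion of a directed meta-path concrete, the sketch below enumerates the path instances of a given type sequence in a tiny typed graph. The graph contents, the type letters, and the function names are hypothetical; links are stored without direction, and the meta-path itself supplies the direction of traversal.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: D = disease, S = symptom, P = patient.
NODE_TYPE: Dict[str, str] = {
    "cold": "D", "pneumonia": "D", "cough": "S", "fever": "S", "patient1": "P",
}
LINKS: List[Tuple[str, str]] = [
    ("cold", "cough"), ("pneumonia", "cough"), ("pneumonia", "fever"),
    ("patient1", "cough"), ("patient1", "cold"),
]


def build_adjacency(links: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Store every link in both directions."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def path_instances(adjacency: Dict[str, Set[str]], meta_path: Tuple[str, ...]) -> List[List[str]]:
    """List all simple paths whose node types follow meta_path in order."""
    paths = [[n] for n in sorted(adjacency) if NODE_TYPE[n] == meta_path[0]]
    for wanted in meta_path[1:]:
        paths = [p + [nxt] for p in paths for nxt in sorted(adjacency[p[-1]])
                 if NODE_TYPE[nxt] == wanted and nxt not in p]
    return paths


ADJ = build_adjacency(LINKS)
print(path_instances(ADJ, ("D", "S", "P")))  # disease -> symptom -> patient
print(path_instances(ADJ, ("D", "S", "D")))  # disease -> symptom -> disease
```

In this toy graph the disease → symptom → disease instances connect cold and pneumonia through their shared symptom, cough.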
S404: Outputting disease prompt information of the target user according to the analysis result. For the specific implementation of step S404, reference may be made to the implementation of step S203 in fig. 2, which is not repeated here. In one embodiment, the server derives a heterogeneous information network graph (Heterogeneous Information Network), that is, generates a knowledge graph, from the heterogeneous graph neural network; a heterogeneous information network is an information network that contains multiple types of nodes or edges. Fig. 5e is a schematic diagram of a heterogeneous information network graph according to an embodiment of the present application. As shown in fig. 5e, the heterogeneous information network graph clearly shows the subjects (symptoms, patients, diseases) involved in disease-assisted diagnosis and the interactions between them, which makes the disease prompt information more persuasive. The dash-dot lines represent the "suffers from" relationship, such as patient 1 suffering from a cold and patient 2 suffering from pneumonia. The dashed lines represent the "exhibits" relationship, such as patient 1 having a cough and patient 2 having a cough. The solid lines represent the "manifests as" relationship, such as pneumonia manifesting as asthma and cough, and a cold manifesting as cough and sore throat.
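The three relationship types described for fig. 5e can be held as typed edge triples. The sketch below encodes exactly the examples given in the text; the relation labels and helper names are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple

# (head, relation, tail) triples taken from the fig. 5e description.
EDGES: List[Tuple[str, str, str]] = [
    ("patient 1", "suffers_from", "cold"),
    ("patient 2", "suffers_from", "pneumonia"),
    ("patient 1", "has_symptom", "cough"),
    ("patient 2", "has_symptom", "cough"),
    ("pneumonia", "manifests_as", "asthma"),
    ("pneumonia", "manifests_as", "cough"),
    ("cold", "manifests_as", "cough"),
    ("cold", "manifests_as", "sore throat"),
]


def tails(head: str, relation: str) -> List[str]:
    """All nodes reached from head over edges of the given relation."""
    return sorted(t for h, r, t in EDGES if h == head and r == relation)


def diseases_sharing_symptoms(patient: str) -> List[str]:
    """Diseases that manifest as at least one symptom the patient exhibits."""
    symptoms = set(tails(patient, "has_symptom"))
    return sorted({h for h, r, t in EDGES if r == "manifests_as" and t in symptoms})


print(diseases_sharing_symptoms("patient 1"))  # ['cold', 'pneumonia']
```

Tracing such edges is one way a prompt could show which symptom links a patient to a suggested disease.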
Therefore, by implementing the method described in fig. 4 and standardizing the description information, the description words are unified, so that the misdiagnosis rate caused by inaccurate description words is reduced, and the prediction accuracy is improved. In addition, through unifying the descriptive words, the vocabulary which needs to be processed by the model is reduced, and the complexity of the model is reduced.
Referring to fig. 6, fig. 6 is a flowchart of another data processing method according to an embodiment of the application. The method may be performed by a smart device, which may specifically be the server 102 shown in fig. 1a, where basic information is collected by the terminal device, and where data analysis and processing is performed. Of course, in some embodiments, it may also be performed by a terminal device on which an application designed based on a corresponding data analysis model is installed, and the method of the embodiment of the present application includes the following steps.
S601: medical record information of a training patient is obtained. The medical record information of the training patient comprises static attribute training information of the training patient, description training information and label information, wherein the label information is used for recording actual disease information of the training patient.
S602: Taking the static attribute training information and the description training information as input of an initial model, and obtaining the output result that the initial model produces after analyzing the static attribute training information and the description training information. In one embodiment, the server extracts the patient expression features of the training patient through the first to-be-trained meta-path set and the description training information.
The patient expression features of the training patient are obtained by aggregating the first patient expression feature of the first to-be-trained path and the second patient expression feature of the second to-be-trained path in the first to-be-trained meta-path set. The first patient expression feature of the first to-be-trained path is aggregated from at least one associated symptom expression feature on that path, and each symptom expression feature is aggregated from at least one associated disease expression feature on that path. The second patient expression feature of the second to-be-trained path is aggregated from at least one associated disease expression feature on that path, and each disease expression feature is aggregated from at least one associated symptom expression feature on that path. Here, an association between a patient and a symptom means that the patient exhibits the symptom (for example, the patient has a cough and a headache); an association between a symptom and a disease means that the symptom may occur when a patient has the disease (for example, a cold may cause weakness and sneezing); and an association between a patient and a disease means that the patient may have the disease given a certain symptom (for example, a patient with a fever may have a cold or pneumonia).
Fig. 7a is a schematic diagram of extracting patient expression features of a training patient according to an embodiment of the present application. As shown in fig. 7a, the patient expression feature $P_i$ of the training patient is obtained by aggregating the first patient expression feature $P_i^{DSP}$ of the first to-be-trained path and the second patient expression feature $P_i^{SDP}$ of the second to-be-trained path, i.e. $P_i = g(P_i^{DSP}, P_i^{SDP})$, where $g(x)$ is an aggregation function, which may include, but is not limited to, a mean aggregation function, a pooling aggregation function, and a long short-term memory (LSTM) aggregation function.
The first patient expression feature $P_i^{DSP}$ of the first to-be-trained path is obtained by aggregating its first-order neighbors, the first symptom expression feature $S_1$ and the second symptom expression feature $S_2$, i.e. $P_i^{DSP} = g(S_1, S_2)$, which indicates that the patient has two symptoms at the same time; for example, if $S_1$ is fever and $S_2$ is cough, $P_i^{DSP} = g(S_1, S_2)$ indicates that the patient has both fever and cough. The first symptom expression feature $S_1$ is obtained by aggregating the first disease expression feature $D_1$, i.e. $S_1 = g(D_1)$; for example, if $S_1$ is fever and $D_1$ is the common cold, $S_1 = g(D_1)$ indicates that the disease exhibiting the fever symptom is the common cold. The second symptom expression feature $S_2$ is obtained by aggregating the second disease expression feature $D_2$, i.e. $S_2 = g(D_2)$.
The second patient expression feature $P_i^{SDP}$ of the second to-be-trained path is obtained by aggregating its first-order neighbors, the first disease expression feature $D_1$ and the second disease expression feature $D_2$, i.e. $P_i^{SDP} = g(D_1, D_2)$, which indicates that the patient has two diseases at the same time; for example, if $D_1$ is diabetes and $D_2$ is gout, $P_i^{SDP} = g(D_1, D_2)$ indicates that the patient has both diabetes and gout. The first disease expression feature $D_1$ is obtained by aggregating the first symptom expression feature $S_1$, i.e. $D_1 = g(S_1)$. The second disease expression feature $D_2$ is obtained by aggregating the second symptom expression feature $S_2$ and the third symptom expression feature $S_3$, i.e. $D_2 = g(S_2, S_3)$, which indicates that a disease has two symptoms; for example, if $D_2$ is the common cold, $S_2$ is fever, and $S_3$ is cough, $D_2 = g(S_2, S_3)$ indicates that the common cold has both fever and cough as symptoms.
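The two-level aggregation for the patient expression feature can be traced with concrete numbers. The sketch below assumes mean aggregation for $g$ and uses made-up two-dimensional initial embeddings; none of the numeric values come from the text.

```python
from typing import List, Sequence

Vector = List[float]


def g(features: Sequence[Vector]) -> Vector:
    """Mean aggregation: element-wise average of the given feature vectors."""
    return [sum(column) / len(features) for column in zip(*features)]


# Made-up initial node embeddings (illustrative values only).
d1, d2 = [1.0, 0.0], [0.0, 1.0]                  # disease nodes
s1, s2, s3 = [2.0, 0.0], [0.0, 2.0], [0.0, 4.0]  # symptom nodes

# First path (disease -> symptom -> patient): symptoms aggregate diseases.
S1, S2 = g([d1]), g([d2])
p_dsp = g([S1, S2])

# Second path (symptom -> disease -> patient): diseases aggregate symptoms.
D1, D2 = g([s1]), g([s2, s3])
p_sdp = g([D1, D2])

# Final patient expression feature combines both paths.
p_i = g([p_dsp, p_sdp])
print(p_dsp, p_sdp, p_i)  # [0.5, 0.5] [1.0, 1.5] [0.75, 1.0]
```

Note that the same symbol plays different roles on the two paths: on the first path a symptom feature is built from diseases, while on the second a disease feature is built from symptoms.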
Similarly, the server extracts the disease expression features of the training patient through the second to-be-trained meta-path set and the description training information. The disease expression features of the training patient are obtained by aggregating the first disease expression feature of the third to-be-trained path and the second disease expression feature of the fourth to-be-trained path in the second to-be-trained meta-path set. The first disease expression feature of the third to-be-trained path is aggregated from at least one associated symptom expression feature on that path, and each symptom expression feature is aggregated from at least one associated disease expression feature on that path. The second disease expression feature of the fourth to-be-trained path is aggregated from at least one associated patient expression feature on that path, and each patient expression feature is aggregated from at least one associated symptom expression feature on that path. Fig. 7b is a schematic diagram of extracting disease expression features of a training patient according to an embodiment of the present application.
As shown in fig. 7b, the disease expression feature $D_i$ of the training patient is obtained by aggregating the first disease expression feature $D_i^{DSD}$ of the third to-be-trained path and the second disease expression feature $D_i^{SPD}$ of the fourth to-be-trained path, i.e. $D_i = g(D_i^{DSD}, D_i^{SPD})$. The first disease expression feature $D_i^{DSD}$ of the third to-be-trained path is obtained by aggregating its first-order neighbors, the first symptom expression feature $S_1$ and the second symptom expression feature $S_2$, i.e. $D_i^{DSD} = g(S_1, S_2)$, which indicates that a disease has two symptoms; for example, if $D_i^{DSD}$ is the common cold, $S_1$ is fever, and $S_2$ is cough, $D_i^{DSD} = g(S_1, S_2)$ indicates that the cold has symptoms of fever and cough. The first symptom expression feature $S_1$ is obtained by aggregating the first disease expression feature $D_1$, i.e. $S_1 = g(D_1)$; for example, if $S_1$ is fever and $D_1$ is pneumonia, $S_1 = g(D_1)$ indicates that the disease exhibiting the fever symptom is pneumonia. The second symptom expression feature $S_2$ is obtained by aggregating the second disease expression feature $D_2$, i.e. $S_2 = g(D_2)$.
The second disease expression feature $D_i^{SPD}$ of the fourth to-be-trained path is obtained by aggregating its first-order neighbors, the first patient expression feature $P_1$ and the second patient expression feature $P_2$, i.e. $D_i^{SPD} = g(P_1, P_2)$, which indicates that two patients have the same disease; for example, if $D_i^{SPD}$ is the common cold, $P_1$ is Xiaoming, and $P_2$ is Xiaohong, $D_i^{SPD} = g(P_1, P_2)$ indicates that both Xiaoming and Xiaohong have a cold. The first patient expression feature $P_1$ is obtained by aggregating the first symptom expression feature $S_1$, i.e. $P_1 = g(S_1)$; for example, if $P_1$ is Xiaoming and $S_1$ is fever, $P_1 = g(S_1)$ indicates that Xiaoming has the symptom of fever. The second patient expression feature $P_2$ is obtained by aggregating the second symptom expression feature $S_2$, i.e. $P_2 = g(S_2)$. Optionally, when the second disease expression feature $D_i^{SPD}$ is aggregated from its first-order neighbors $P_1$ and $P_2$, only the first symptom expression feature $S_1$ and the second symptom expression feature $S_2$ from which $P_1$ and $P_2$ are aggregated are used, and the patient features carried by the first patient data feature node $P_1$ and the second patient data feature node $P_2$ themselves are not used.
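The disease-side aggregation over the third and fourth paths follows the same pattern and can be traced numerically. As before, this sketch assumes mean aggregation for the function g and invented two-dimensional embeddings; the variable names are hypothetical. It follows the optional variant in which patient features on the fourth path are built from symptom features only.

```python
from typing import List, Sequence

Vector = List[float]


def g(features: Sequence[Vector]) -> Vector:
    """Mean aggregation: element-wise average of the given feature vectors."""
    return [sum(column) / len(features) for column in zip(*features)]


# Made-up initial node embeddings (illustrative values only).
d1, d2 = [1.0, 0.0], [0.0, 1.0]  # disease nodes
s1, s2 = [2.0, 0.0], [0.0, 2.0]  # symptom nodes

# Third path (disease -> symptom -> disease): symptoms aggregate diseases.
S1, S2 = g([d1]), g([d2])
d_dsd = g([S1, S2])

# Fourth path (symptom -> patient -> disease): patients aggregate symptoms only;
# the patients' own node features are deliberately not used.
P1, P2 = g([s1]), g([s2])
d_spd = g([P1, P2])

# Final disease expression feature combines both paths.
d_i = g([d_dsd, d_spd])
print(d_dsd, d_spd, d_i)  # [0.5, 0.5] [1.0, 1.0] [0.75, 0.75]
```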
Based on the static attribute training information of the training patient, the server establishes a connection (that is, performs feature fusion) between the patient expression features and the disease expression features of the training patient, and performs probability analysis on this connection relationship through a to-be-trained probability mapping function to obtain the connection probability between the patient expression features and the disease expression features of the training patient. The to-be-trained probability mapping function may include, but is not limited to, a sigmoid function, a softplus function, and a softmax function. Taking the sigmoid function as an example, the connection probability between the patient expression features and the disease expression features of the training patient is calculated as:

$$\hat{y}_{ij} = \mathrm{sigmoid}\left(f(P_i, D_j, S_{ij})\right)$$

where $\hat{y}_{ij}$ is the connection probability between the patient expression features and the disease expression features of the training patient, $P_i$ is the patient expression feature, $D_j$ is the disease expression feature, $S_{ij}$ is the symptom expression feature, and $f(x)$ is a feature fusion function used to fuse the features $P_i$, $D_j$, and $S_{ij}$.
For example, the patient expression features of training patient P1 are fever, cough, and runny nose, the disease expression feature D1 is a cold, and the disease expression feature D2 is pneumonia. Assuming that the static attribute training information of training patient P1 gives an age of 12 and that people under 18 are not prone to pneumonia, the association between disease expression feature D1 and the patient expression features is stronger (that is, the connection is tighter) than the association between disease expression feature D2 and the patient expression features. Performing probability analysis on the connection relationship between the patient expression features and the disease expression features through the probability mapping function gives a connection probability of 80% between training patient P1 and disease expression feature D1 (that is, the probability that training patient P1 has a cold is 80%) and a connection probability of 5% between training patient P1 and disease expression feature D2 (that is, the probability that training patient P1 has pneumonia is 5%).
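The step from fused features to a connection probability can be sketched with a sigmoid over a fused score. The text does not specify the form of the feature fusion function, so the linear fusion over concatenated features below, together with the names `fuse` and `connection_probability`, is purely an assumption for illustration.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(p: Vector, d: Vector, s: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Assumed fusion: weighted sum over the concatenation of the three features."""
    features = p + d + s
    return sum(w * x for w, x in zip(weights, features)) + bias


def connection_probability(p: Vector, d: Vector, s: Vector,
                           weights: Vector, bias: float = 0.0) -> float:
    """Connection probability between a patient feature and a disease feature."""
    return sigmoid(fuse(p, d, s, weights, bias))
```

With all weights at zero the fused score is 0 and the probability is exactly 0.5; training would move the weights so that true patient-disease pairs score higher.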
S603: Performing loss calculation on the connection probability and the label information using a loss function, and adjusting the parameters of the initial model according to the loss calculation result to obtain a data analysis model. The loss function includes, but is not limited to, a hinge loss function, a cross-entropy loss function, and an exponential loss function. Taking the cross-entropy loss function as an example, the loss between the connection probability and the label information is calculated as:

$$L = -\sum_{i,j}\left[y_{ij}\log\hat{y}_{ij} + (1 - y_{ij})\log\left(1 - \hat{y}_{ij}\right)\right]$$

where $L$ is the error between the prediction result (i.e. the predicted connection probability $\hat{y}_{ij}$) and the actual result (i.e. $y_{ij}$), and $y_{ij}$ is obtained from the label information. In one embodiment, the server obtains a plurality of pieces of training data and, according to the value of $L$ for each piece of training data, adjusts the parameters of the initial model (such as the to-be-trained path sets and the to-be-trained probability mapping function) until the corresponding $L$ meets the condition for ending training. The server takes the initial model obtained by adjustment as the data analysis model.
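As a numeric illustration of the cross-entropy option, the sketch below evaluates the loss for the two connection probabilities of the earlier example (80% for a cold, 5% for pneumonia). It assumes, hypothetically, that the label information records a cold and not pneumonia, and it averages over the pairs, which is a normalization choice not stated in the text.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1.0, 0.0]          # assumed labels: has a cold, does not have pneumonia
predictions = [0.80, 0.05]   # connection probabilities from the example
loss = binary_cross_entropy(labels, predictions)
print(round(loss, 4))  # 0.1372
```

A lower value means the predicted connection probabilities agree more closely with the labels, which is what parameter adjustment aims for.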
Steps S601 to S603 may be executed on a separate model training server. The initial model may be trained through steps S601 to S603 to obtain the data analysis model at any time before step S403 in fig. 4 is performed, or the data analysis model may be optimized through steps S601 to S603 at any time while steps S401 to S404 in fig. 4 are being performed.
Therefore, by implementing the method described in fig. 6, the initial model is trained by adopting the medical record information of the trained patient, and the parameters in the initial model can be optimized to obtain the data analysis model, so that the prediction accuracy of the data analysis model is improved.
The methods of the embodiments of the present application have been described in detail above. To facilitate better implementation of the above solutions, apparatuses of the embodiments of the present application are provided below accordingly.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a data processing apparatus according to an exemplary embodiment of the present application, where the apparatus may be mounted on an intelligent device in the foregoing method embodiment, and the intelligent device may specifically be the server 102 shown in fig. 1 a. Of course, in some embodiments, it may also be installed on a terminal device, on which an application designed based on the corresponding data analysis model is installed. The data processing device shown in fig. 8 may be used to perform some or all of the functions of the method embodiments described above with respect to fig. 2 and 4 and 6. Wherein, the detailed description of each unit is as follows:
an obtaining unit 801, configured to obtain description information and static attribute information of a target user, where the description information is used to describe symptoms of the target user;
the processing unit 802 is configured to take the description information and the static attribute information as input of a data analysis model, and obtain an analysis result output by the data analysis model after analysis processing is performed on the description information and the static attribute information; outputting disease prompt information of the target user according to the analysis result;
The data analysis model is constructed based on a heterogeneous graph neural network, and the graph structure of the heterogeneous graph neural network comprises disease data characteristic nodes, patient data characteristic nodes and symptom data characteristic nodes; the data analysis model determines patient expression characteristics and disease expression characteristics based on the disease data characteristic nodes, patient data characteristic nodes, and symptom data characteristic nodes so as to obtain the analysis result.
In one embodiment, when a data analysis model is built based on the graph structure of the heterogeneous graph neural network, a first meta-path set and a second meta-path set are built;
wherein the first and second sets of meta-paths are constructed based on the disease data feature node, the patient data feature node, and the symptom data feature node; the patient expression features are aggregated based on each data feature node in the first set of meta-paths, and the disease expression features are aggregated based on each data feature node in the second set of meta-paths.
In one embodiment, the paths in the first and second sets of meta-paths are directed paths; the first meta-path set is a set of paths with path end points being patient data characteristic nodes; the second meta-path set is a set of paths with path end points being disease data characteristic nodes.
In one embodiment, the first set of meta-paths includes a first path that, when constructed, takes the disease data feature node as a starting point, the symptom data feature node as an intermediate node, the patient data feature node as an ending point, and a second path that, when constructed, takes the symptom data feature node as a starting point, the disease data feature node as an intermediate node, and the patient data feature node as an ending point;
the disease data feature nodes comprise a first disease data feature node and a second disease data feature node, and the second meta-path set comprises a third path and a fourth path; when constructed, the third path takes the first disease data feature node as a starting point, the symptom data feature node as an intermediate node, and the second disease data feature node as an end point, and the fourth path takes the symptom data feature node as a starting point, the patient data feature node as an intermediate node, and the disease data feature node as an end point.
In one embodiment, to acquire the description information of the target user, the processing unit 802 is specifically configured to perform:
acquiring text information of the target user, extracting atomic information from the text information, and carrying out standardized processing on the atomic information to obtain a plurality of atomic standard information of the text information, wherein the atomic information comprises symptom type information and symptom attribute information, the symptom type information is used for describing symptoms of the target user, and the symptom attribute information is used for describing characteristics of the symptoms of the target user;
Combining the atomic standard information to obtain standard symptom combinations corresponding to the text information;
and taking the standard symptom combination as the description information of the target user.
In one embodiment, the symptom attribute information includes: at least one of location information, degree information, and time information; wherein the location information is used for describing a body location to which the symptom of the target user belongs, the degree information is used for describing a grade of the symptom of the target user, and the time information is used for describing duration of the symptom of the target user.
In one embodiment, to extract atomic information from the text information and standardize the atomic information to obtain a plurality of pieces of atomic standard information of the text information, the processing unit 802 is specifically configured to perform:
extracting atomic information from the text information by using an information recognition model;
obtaining a standard mapping dictionary, wherein the standard mapping dictionary comprises a mapping relationship between atomic information and atomic standard information;
and obtaining target atomic standard information corresponding to the atomic information according to the atomic information and the mapping relationship between the atomic information and the atomic standard information.
In one embodiment, the information recognition model is trained from electronic medical record sample data; to train the information recognition model, the processing unit 802 is specifically configured to perform:
training on and recognizing the text in the electronic medical record sample data by using an initial recognition model to obtain entity information, wherein the entity information comprises character entities or word entities;
constructing an atomic mapping dictionary according to the entity information, wherein the atomic mapping dictionary comprises a mapping relation between the entity information and the atomic information;
and training and identifying the initial identification model again according to the atomic mapping dictionary and the electronic medical record sample data so as to obtain an information identification model.
In one embodiment, the processing unit 802 is further configured to:
acquiring medical record information of a training patient, wherein the medical record information comprises static attribute training information of the training patient, description training information and label information;
taking the static attribute training information and the description training information as input of an initial model, and obtaining an output result of the initial model, wherein the output result comprises connection probability between patient expression characteristics and disease expression characteristics of the training patient;
Carrying out loss calculation on the connection probability and the label information by adopting a loss function, and adjusting parameters of the initial model according to a loss calculation result so as to obtain a data analysis model according to the adjusted parameters;
the initial model extracts patient expression characteristics of the training patient through a first to-be-trained element path set and the description training information; extracting disease expression characteristics of the training patient through a second meta-path set to be trained and the description training information; the probability mapping function to be optimized is used for carrying out probability analysis on the static attribute training information, the patient expression characteristics and the disease expression characteristics of the training patient, so as to obtain the connection probability between the patient expression characteristics and the disease expression characteristics of the training patient.
In one embodiment, the patient expression features of the training patient are obtained by aggregation according to a first patient expression feature of a first to-be-trained path and a second patient expression feature of a second to-be-trained path in the first to-be-trained meta-path set;
the first patient expression features of the first path to be trained are obtained by aggregation according to at least one associated symptom expression feature on the first path to be trained, and each symptom expression feature is obtained by aggregation according to at least one associated disease expression feature on the first path to be trained;
The second patient expression features of the second path to be trained are aggregated from at least one associated disease expression feature on the second path to be trained, and each disease expression feature is aggregated from at least one associated symptom expression feature on the second path to be trained.
In one embodiment, the disease expression features of the training patient are obtained by aggregating the first disease expression features of the third to-be-trained path and the second disease expression features of the fourth to-be-trained path in the second to-be-trained meta-path set;
the first disease expression features of the third path to be trained are obtained by aggregation according to at least one associated symptom expression feature on the third path to be trained, and each symptom expression feature is obtained by aggregation according to at least one associated disease expression feature on the third path to be trained;
the second disease expression features of the fourth path to be trained are aggregated according to at least one associated patient expression feature on the fourth path to be trained, and each patient expression feature is aggregated according to at least one associated symptom expression feature on the third path to be trained.
In one embodiment, when performing probability analysis on the static attribute training information, the patient expression features and the disease expression features of the training patient through the probability mapping function to be optimized to obtain the connection probability between the patient expression features and the disease expression features of the training patient, the processing unit 802 is specifically configured to perform:
performing feature fusion on the static attribute training information, the patient expression features and the disease expression features of the training patient;
and carrying out probability analysis on the fused features through the probability mapping function to be optimized to obtain the connection probability between the patient expression features and the disease expression features of the training patient.
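As a non-limiting illustration of the feature fusion and probability analysis described above, the following minimal sketch assumes that fusion is performed by concatenation and that the probability mapping function is a linear layer followed by a sigmoid. Both choices, and the names `fuse` and `connection_probability`, are assumptions for illustration only.

```python
import math


def fuse(static_attrs, patient_feat, disease_feat):
    """Fuse the three feature groups by concatenation (assumed fusion operator)."""
    return list(static_attrs) + list(patient_feat) + list(disease_feat)


def connection_probability(fused, weights, bias=0.0):
    """Map fused features to a probability in (0, 1): linear layer plus sigmoid."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With all weights at zero the sketch returns 0.5 for any input, which corresponds to an untrained mapping that expresses no preference for or against a connection between the patient and the disease.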
According to one embodiment of the present application, part of the steps involved in the data processing methods shown in fig. 2, 4 and 6 may be performed by respective units in the data processing apparatus shown in fig. 8. For example, step S201 shown in fig. 2 may be performed by the acquisition unit 801 shown in fig. 8, and steps S202 and S203 may be performed by the processing unit 802 shown in fig. 8. Step S401 shown in fig. 4 may be performed by the acquisition unit 801 shown in fig. 8, and steps S402 to S404 may be performed by the processing unit 802 shown in fig. 8. Step S601 shown in fig. 6 may be performed by the acquisition unit 801 shown in fig. 8, and steps S602 and S603 may be performed by the processing unit 802 shown in fig. 8. The respective units in the data processing apparatus shown in fig. 8 may be individually or collectively constituted as one or several additional units, or some unit(s) thereof may be further divided into a plurality of units smaller in function, which can achieve the same operation without affecting the achievement of the technical effects of the embodiments of the present application. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the application, the data processing apparatus may also comprise other units, and in practical applications, these functions may also be realized with the assistance of other units, and may be realized by cooperation of a plurality of units.
According to another embodiment of the present application, the data processing apparatus shown in fig. 8 may be constructed, and the data processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the respective methods shown in fig. 2, 4 and 6 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and run in the above-described computing device through the computer-readable recording medium.
Based on the same inventive concept, the principle by which the data processing apparatus provided in the embodiments of the present application solves the problem, and its beneficial effects, are similar to those of the data processing method in the method embodiments of the present application; reference may be made to the principle and beneficial effects of the implementation of the method, which are not repeated here for brevity.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a smart device according to an exemplary embodiment of the present application. The smart device includes at least a processor 901, a communication interface 902, and a memory 903, which may be connected by a bus or by other means. The processor 901 (also called a central processing unit, CPU) is the computing core and control core of the terminal; it can parse various instructions in the terminal and process various data of the terminal. For example, the CPU can parse a power-on/power-off instruction sent by a user to the terminal and control the terminal to perform the power-on/power-off operation; as another example, the CPU can transmit various kinds of interactive data between the internal structures of the terminal, and so on. The communication interface 902 may optionally include a standard wired interface and a wireless interface (e.g., Wi-Fi, a mobile communication interface, etc.), and may be controlled by the processor 901 to receive and transmit data; the communication interface 902 may also be used for transmission and interaction of data inside the terminal. The memory 903 is a storage device in the terminal for storing programs and data. It will be appreciated that the memory 903 here may include both the internal memory of the terminal and the expansion memory supported by the terminal. The memory 903 provides storage space that stores the operating system of the terminal, which may include, but is not limited to, an Android system, an iOS system, a Windows Phone system, etc.; the application is not limited in this regard.
In an embodiment of the present application, the processor 901 performs the following operations by executing executable program code in the memory 903:
acquiring description information and static attribute information of a target user through a communication interface 902, wherein the description information is used for describing symptoms of the target user;
taking the description information and the static attribute information as input of a data analysis model, and acquiring an analysis result which is output by the data analysis model after analysis processing is carried out on the description information and the static attribute information;
outputting disease prompt information of the target user according to the analysis result;
the data analysis model is constructed based on a heterogeneous graph neural network, and the graph structure of the heterogeneous graph neural network comprises disease data characteristic nodes, patient data characteristic nodes and symptom data characteristic nodes; the data analysis model determines patient expression characteristics and disease expression characteristics based on the disease data characteristic nodes, patient data characteristic nodes, and symptom data characteristic nodes so as to obtain the analysis result.
As an optional embodiment, when constructing a data analysis model based on the graph structure of the heterogeneous graph neural network, a first meta-path set and a second meta-path set are constructed;
Wherein the first and second sets of meta-paths are constructed based on the disease data feature node, the patient data feature node, and the symptom data feature node; the patient expression features are aggregated based on each data feature node in the first set of meta-paths, and the disease expression features are aggregated based on each data feature node in the second set of meta-paths.
As an alternative embodiment, the paths in the first meta-path set and the second meta-path set are directed paths; the first meta-path set is a set of paths whose end points are patient data feature nodes; the second meta-path set is a set of paths whose end points are disease data feature nodes.
As an alternative embodiment, the first set of meta-paths includes a first path and a second path, the first path having the disease data feature node as a starting point, the symptom data feature node as an intermediate node, the patient data feature node as an end point, the second path having the symptom data feature node as a starting point, the disease data feature node as an intermediate node, and the patient data feature node as an end point at the time of construction;
the disease data feature nodes comprise a first disease data feature node and a second disease data feature node, and the second meta-path set comprises a third path and a fourth path; when constructed, the third path takes the first disease data feature node as a starting point, the symptom data feature node as an intermediate node, and the second disease data feature node as an end point; when constructed, the fourth path takes the symptom data feature node as a starting point, the patient data feature node as an intermediate node, and the disease data feature node as an end point.
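As a non-limiting illustration of the four directed paths described above, the following minimal sketch enumerates path instances in a small heterogeneous graph. The node naming scheme (a one-letter type prefix: "D" for disease, "S" for symptom, "P" for patient), the undirected edge list, and the helper names are assumptions for illustration only.

```python
from collections import defaultdict

# Node types: "D" disease, "S" symptom, "P" patient.
# Paths in the first set end at a patient node; paths in the second set
# end at a disease node.
META_PATHS = {
    "first": ("D", "S", "P"),
    "second": ("S", "D", "P"),
    "third": ("D", "S", "D"),
    "fourth": ("S", "P", "D"),
}


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def node_type(node):
    """Hypothetical convention: the first character encodes the node type."""
    return node[0]


def instances(adj, meta_path):
    """Enumerate directed 3-node path instances whose node types match meta_path."""
    start_t, mid_t, end_t = meta_path
    found = []
    for start in sorted(adj):
        if node_type(start) != start_t:
            continue
        for mid in sorted(adj[start]):
            if node_type(mid) != mid_t:
                continue
            for end in sorted(adj[mid]):
                # Exclude paths that return to their own starting node.
                if node_type(end) == end_t and end != start:
                    found.append((start, mid, end))
    return found
```

For example, with edges linking a "flu" disease node and a "cold" disease node to a shared "fever" symptom node, the third path yields one instance in each direction between the two disease nodes.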
As an alternative embodiment, the specific embodiment of the processor 901 obtaining the description information of the target user through the communication interface 902 is:
acquiring text information of the target user through a communication interface 902, extracting atomic information from the text information, and performing standardized processing on the atomic information to obtain a plurality of atomic standard information of the text information, wherein the atomic information comprises symptom type information and symptom attribute information, the symptom type information is used for describing symptoms of the target user, and the symptom attribute information is used for describing characteristics of the symptoms of the target user;
combining the atomic standard information to obtain a standard symptom combination corresponding to the text information;
and taking the standard symptom combination as the description information of the target user.
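As a non-limiting illustration of combining atomic standard information into a standard symptom combination, the following minimal sketch represents each combination as a record holding one symptom type and its optional attributes. The record layout and the names `StandardSymptom`, `combine_atoms` and `description_info` are assumptions for illustration only.

```python
from typing import NamedTuple, Optional


class StandardSymptom(NamedTuple):
    symptom: str                    # symptom type information (required)
    location: Optional[str] = None  # symptom attribute: body location
    degree: Optional[str] = None    # symptom attribute: grade of the symptom
    time: Optional[str] = None      # symptom attribute: duration


def combine_atoms(atoms):
    """Combine one group of atomic standard information into a single record."""
    if not atoms.get("symptom"):
        raise ValueError("symptom type information is required")
    return StandardSymptom(
        symptom=atoms["symptom"],
        location=atoms.get("location"),
        degree=atoms.get("degree"),
        time=atoms.get("time"),
    )


def description_info(atom_groups):
    """Description information of a user: one standard combination per group."""
    return [combine_atoms(group) for group in atom_groups]
```

In this sketch, missing attributes are kept as `None` rather than guessed, so a combination only carries the attributes that were actually extracted from the text information.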
As an alternative embodiment, the symptom attribute information includes: at least one of location information, degree information, and time information;
wherein the location information is used for describing a body location to which the symptom of the target user belongs, the degree information is used for describing a grade of the symptom of the target user, and the time information is used for describing duration of the symptom of the target user.
As an alternative embodiment, the processor 901 extracts atomic information from the text information and performs standardized processing on the atomic information to obtain a plurality of pieces of atomic standard information of the text information as follows:
extracting atomic information from the text information by adopting an information identification model;
obtaining a standard mapping dictionary, wherein the standard mapping dictionary comprises a mapping relation between atomic information and atomic standard information;
and obtaining target atomic standard information corresponding to the atomic information according to the atomic information and the mapping relation between the atomic information and the atomic standard information.
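As a non-limiting illustration of the standardized processing described above, the following minimal sketch looks up each piece of extracted atomic information in a standard mapping dictionary. The dictionary entries, the lowercase matching, and the fallback of returning the cleaned input when no mapping exists are all assumptions for illustration only.

```python
# Hypothetical standard mapping dictionary: atomic information -> atomic standard information.
STANDARD_MAPPING = {
    "tummy": "abdomen",
    "belly": "abdomen",
    "ache": "pain",
    "hurts": "pain",
}


def normalize(atom, mapping=STANDARD_MAPPING):
    """Return the atomic standard information for one piece of atomic information."""
    key = atom.strip().lower()
    # Assumed fallback: unmapped atoms are passed through after cleaning.
    return mapping.get(key, key)


def normalize_all(atoms, mapping=STANDARD_MAPPING):
    """Normalize every extracted atom, preserving input order."""
    return [normalize(atom, mapping) for atom in atoms]
```

In this sketch, different colloquial expressions of the same body location or symptom collapse to one standard term, which is what allows the later combination step to produce comparable standard symptom combinations.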
As an optional embodiment, the information identification model is trained according to electronic medical record sample data; the processor 901 trains to obtain the information identification model as follows:
training and identifying characters in the electronic medical record sample data by adopting an initial identification model to obtain entity information, wherein the entity information comprises character entities or word entities;
constructing an atomic mapping dictionary according to the entity information, wherein the atomic mapping dictionary comprises a mapping relation between the entity information and the atomic information;
and training and identifying the initial identification model again according to the atomic mapping dictionary and the electronic medical record sample data so as to obtain an information identification model.
As an alternative embodiment, the processor 901 further performs the following operations:
acquiring medical record information of a training patient, wherein the medical record information comprises static attribute training information of the training patient, description training information and label information;
taking the static attribute training information and the description training information as input of an initial model, and obtaining an output result of the initial model, wherein the output result comprises connection probability between patient expression characteristics and disease expression characteristics of the training patient;
calculating a loss between the connection probability and the label information by adopting a loss function, and adjusting parameters of the initial model according to the loss calculation result so as to obtain the data analysis model according to the adjusted parameters;
wherein the initial model extracts the patient expression features of the training patient through a first to-be-trained meta-path set and the description training information, extracts the disease expression features of the training patient through a second to-be-trained meta-path set and the description training information, and performs probability analysis on the static attribute training information, the patient expression features and the disease expression features of the training patient through a probability mapping function to be optimized, so as to obtain the connection probability between the patient expression features and the disease expression features of the training patient.
As an optional embodiment, the patient expression features of the training patient are obtained by aggregation according to the first patient expression features of the first to-be-trained path and the second patient expression features of the second to-be-trained path in the first to-be-trained meta-path set;
the first patient expression features of the first path to be trained are obtained by aggregation according to at least one associated symptom expression feature on the first path to be trained, and each symptom expression feature is obtained by aggregation according to at least one associated disease expression feature on the first path to be trained;
The second patient expression features of the second path to be trained are aggregated from at least one associated disease expression feature on the second path to be trained, and each disease expression feature is aggregated from at least one associated symptom expression feature on the second path to be trained.
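As a non-limiting illustration of the two-level aggregation described above, the following minimal sketch first aggregates start-node features into each intermediate node, then aggregates the intermediate-node features into the end node, and finally aggregates the per-path results. Element-wise averaging is an assumed aggregation operator, and the names `mean` and `aggregate_path` are hypothetical.

```python
def mean(vectors):
    """Element-wise average of equal-length feature vectors (assumed aggregator)."""
    vectors = list(vectors)
    if not vectors:
        raise ValueError("at least one associated feature is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def aggregate_path(start_feats, mid_neighbors, end_neighbors):
    """Two-hop aggregation along one path: start nodes -> intermediate nodes -> end node.

    start_feats:   {start node: feature vector}
    mid_neighbors: {intermediate node: [associated start nodes]}
    end_neighbors: [intermediate nodes associated with the end node]
    """
    mid_feats = {
        mid: mean(start_feats[s] for s in starts)
        for mid, starts in mid_neighbors.items()
    }
    return mean(mid_feats[m] for m in end_neighbors)
```

For the first path to be trained, the start nodes would be diseases and the intermediate nodes symptoms; for the second path to be trained, the roles are swapped. Applying `mean` to the two per-path results then gives one possible patient expression feature.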
As an optional embodiment, the disease expression feature of the training patient is obtained by aggregating the first disease expression feature of the third to-be-trained path and the second disease expression feature of the fourth to-be-trained path in the second to-be-trained meta-path set;
the first disease expression features of the third path to be trained are obtained by aggregation according to at least one associated symptom expression feature on the third path to be trained, and each symptom expression feature is obtained by aggregation according to at least one associated disease expression feature on the third path to be trained;
the second disease expression features of the fourth path to be trained are aggregated according to at least one associated patient expression feature on the fourth path to be trained, and each patient expression feature is aggregated according to at least one associated symptom expression feature on the third path to be trained.
As an optional embodiment, the processor 901 performs probability analysis on the static attribute training information, the patient expression features and the disease expression features of the training patient through the probability mapping function to be optimized, to obtain the connection probability between the patient expression features and the disease expression features of the training patient, as follows:
performing feature fusion on the static attribute training information, the patient expression features and the disease expression features of the training patient;
and carrying out probability analysis on the fused features through a probability mapping function to be optimized to obtain the connection probability between the patient expression features and the disease expression features of the training patient.
Based on the same inventive concept, the principle and beneficial effects of the intelligent device provided in the embodiment of the present application are similar to those of the data processing method in the embodiment of the present application, and may refer to the principle and beneficial effects of the implementation of the method, which are not described herein for brevity.
Embodiments of the present application also provide a computer readable storage medium having one or more instructions stored therein, the one or more instructions being adapted to be loaded by a processor and to perform the data processing method described in the method embodiments above.
The embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data processing method described in the method embodiments above.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of action described, as some steps may be performed in other order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer readable storage medium, and the readable storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The above disclosure is only a preferred embodiment of the present application and does not limit the scope of the claims of the application. Those skilled in the art will appreciate that implementations of all or part of the procedures of the above embodiments, and equivalent changes made according to the claims of the present application, still fall within the scope of the present application.

Claims (14)

1. A method of data processing, the method comprising:
acquiring description information and static attribute information of a target user, wherein the description information is used for describing symptoms of the target user;
taking the description information and the static attribute information as input of a data analysis model, and acquiring an analysis result which is output by the data analysis model after analysis processing is carried out on the description information and the static attribute information;
outputting disease prompt information of the target user according to the analysis result;
the data analysis model is constructed based on a heterogeneous graph neural network, and the graph structure of the heterogeneous graph neural network comprises disease data characteristic nodes, patient data characteristic nodes and symptom data characteristic nodes; the data analysis model is used for determining patient expression characteristics and disease expression characteristics based on the disease data characteristic nodes, the patient data characteristic nodes and the symptom data characteristic nodes so as to obtain the analysis result;
when a data analysis model is built based on the graph structure of the heterogeneous graph neural network, a first meta-path set and a second meta-path set are built; the first and second sets of meta-paths are constructed based on the disease data feature node, the patient data feature node, and the symptom data feature node; the paths in the first meta-path set and the second meta-path set are directed paths; the first meta-path set is a set of paths with path end points being patient data characteristic nodes; the second meta-path set is a set of paths with path end points being disease data characteristic nodes.
2. The method of claim 1, wherein the patient expression features are aggregated based on individual data feature nodes in the first set of meta-paths and the disease expression features are aggregated based on individual data feature nodes in the second set of meta-paths.
3. The method of claim 2, wherein the first set of meta-paths includes a first path and a second path, the first path having the disease data feature node as a starting point, the symptom data feature node as an intermediate node, the patient data feature node as an ending point at construction time, the second path having the symptom data feature node as a starting point, the disease data feature node as an intermediate node, and the patient data feature node as an ending point at construction time;
the disease data feature nodes comprise a first disease data feature node and a second disease data feature node, and the second meta-path set comprises a third path and a fourth path; when constructed, the third path takes the first disease data feature node as a starting point, the symptom data feature node as an intermediate node, and the second disease data feature node as an end point; when constructed, the fourth path takes the symptom data feature node as a starting point, the patient data feature node as an intermediate node, and the disease data feature node as an end point.
4. The method of claim 1, wherein the obtaining the description information of the target user comprises:
acquiring text information of the target user, extracting atomic information from the text information, and carrying out standardized processing on the atomic information to obtain a plurality of atomic standard information of the text information, wherein the atomic information comprises symptom type information and symptom attribute information, the symptom type information is used for describing symptoms of the target user, and the symptom attribute information is used for describing characteristics of the symptoms of the target user;
combining the atomic standard information to obtain standard symptom combinations corresponding to the text information;
and taking the standard symptom combination as the description information of the target user.
5. The method of claim 4, wherein the symptom attribute information comprises: at least one of location information, degree information, and time information;
wherein the location information is used for describing a body location to which the symptom of the target user belongs, the degree information is used for describing a grade of the symptom of the target user, and the time information is used for describing duration of the symptom of the target user.
6. The method of claim 4, wherein the extracting atomic information from the text information and carrying out standardized processing on the atomic information to obtain a plurality of atomic standard information of the text information comprises:
extracting atomic information from the text information by adopting an information identification model;
obtaining a standard mapping dictionary, wherein the standard mapping dictionary comprises a mapping relation between atomic information and atomic standard information;
and obtaining target atomic standard information corresponding to the atomic information according to the atomic information and the mapping relation between the atomic information and the atomic standard information.
7. The method of claim 6, wherein the information recognition model is trained from electronic medical record sample data, the training to obtain the information recognition model comprising:
training and identifying characters in the electronic medical record sample data by adopting an initial identification model to obtain entity information, wherein the entity information comprises character entities or word entities;
constructing an atomic mapping dictionary according to the entity information, wherein the atomic mapping dictionary comprises a mapping relation between the entity information and the atomic information;
and training and identifying the initial identification model again according to the atomic mapping dictionary and the electronic medical record sample data so as to obtain an information identification model.
8. The method of claim 1, wherein the method further comprises:
acquiring medical record information of a training patient, wherein the medical record information comprises static attribute training information of the training patient, description training information and label information;
taking the static attribute training information and the description training information as input of an initial model, and obtaining an output result of the initial model, wherein the output result comprises connection probability between patient expression characteristics and disease expression characteristics of the training patient;
carrying out loss calculation on the connection probability and the label information by adopting a loss function, and adjusting parameters of the initial model according to a loss calculation result so as to obtain a data analysis model according to the adjusted parameters;
wherein the initial model extracts the patient expression characteristics of the training patient through a first to-be-trained meta-path set and the description training information, extracts the disease expression characteristics of the training patient through a second to-be-trained meta-path set and the description training information, and carries out probability analysis on the static attribute training information, the patient expression characteristics and the disease expression characteristics of the training patient through a probability mapping function to be optimized, so as to obtain the connection probability between the patient expression characteristics and the disease expression characteristics of the training patient.
9. The method of claim 8, wherein,
the patient expression characteristics of the training patient are obtained by aggregation according to the first patient expression characteristics of a first to-be-trained path and the second patient expression characteristics of a second to-be-trained path in the first to-be-trained meta-path set;
the first patient expression features of the first path to be trained are obtained by aggregation according to at least one associated symptom expression feature on the first path to be trained, and each symptom expression feature is obtained by aggregation according to at least one associated disease expression feature on the first path to be trained;
the second patient expression features of the second path to be trained are aggregated from at least one associated disease expression feature on the second path to be trained, and each disease expression feature is aggregated from at least one associated symptom expression feature on the second path to be trained.
10. The method of claim 8, wherein,
the disease expression characteristics of the training patient are obtained by aggregation according to the first disease expression characteristics of a third to-be-trained path and the second disease expression characteristics of a fourth to-be-trained path in the second to-be-trained meta-path set;
the first disease expression features of the third path to be trained are obtained by aggregation according to at least one associated symptom expression feature on the third path to be trained, and each symptom expression feature is obtained by aggregation according to at least one associated disease expression feature on the third path to be trained;
the second disease expression features of the fourth path to be trained are aggregated according to at least one associated patient expression feature on the fourth path to be trained, and each patient expression feature is aggregated according to at least one associated symptom expression feature on the third path to be trained.
11. The method of claim 8, wherein the probability analysis of the static attribute training information, the patient expression features, and the disease expression features of the training patient by the probability mapping function to be optimized to obtain the connection probability between the patient expression features and the disease expression features of the training patient, comprises:
Performing feature fusion on the static attribute training information, the patient expression features and the disease expression features of the training patient;
and carrying out probability analysis on the fused features through a probability mapping function to be optimized to obtain the connection probability between the patient expression features and the disease expression features of the training patient.
12. A data processing apparatus, comprising:
the receiving unit is used for acquiring description information and static attribute information of a patient, wherein the description information is used for describing symptoms of the patient;
the processing unit is used for taking the description information and the static attribute information as input of a data analysis model, acquiring an analysis result output by the data analysis model after analysis processing is carried out on the description information and the static attribute information, and determining the disease information of the patient according to the analysis result;
the data analysis model is constructed based on a heterogeneous graph neural network, and when the data analysis model is constructed, the graph structure of the corresponding heterogeneous graph neural network comprises: disease data characteristic nodes, patient data characteristic nodes, symptom data characteristic nodes, a first meta-path set and a second meta-path set; patient expression characteristics are obtained based on the first meta-path set in the heterogeneous graph neural network, and disease expression characteristics are obtained based on the second meta-path set in the heterogeneous graph neural network;
when a data analysis model is built based on the graph structure of the heterogeneous graph neural network, a first meta-path set and a second meta-path set are built; the first and second meta-path sets are constructed based on the disease data characteristic nodes, the patient data characteristic nodes, and the symptom data characteristic nodes; the paths in the first meta-path set and the second meta-path set are directed paths; the first meta-path set is a set of paths whose path end points are patient data characteristic nodes; the second meta-path set is a set of paths whose path end points are disease data characteristic nodes.
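The split of directed meta-paths into two sets by end-point type, as described in claim 12, can be sketched as below. Which edge types exist between node types and the maximum path length are not stated in this claim, so `EDGE_TYPES`, `max_len`, and the helper names are hypothetical assumptions for illustration only.

```python
from itertools import product

NODE_TYPES = ("disease", "patient", "symptom")

# Hypothetical directed edge types allowed between node types.
EDGE_TYPES = {
    ("patient", "symptom"), ("symptom", "patient"),
    ("symptom", "disease"), ("disease", "symptom"),
    ("patient", "disease"), ("disease", "patient"),
}


def enumerate_meta_paths(max_len):
    """List directed node-type sequences of 2..max_len nodes using allowed edges."""
    paths = []
    for length in range(2, max_len + 1):
        for candidate in product(NODE_TYPES, repeat=length):
            if all(edge in EDGE_TYPES for edge in zip(candidate, candidate[1:])):
                paths.append(candidate)
    return paths


def split_by_end_point(paths):
    """First set: paths ending at a patient node; second set: ending at a disease node."""
    first_set = [p for p in paths if p[-1] == "patient"]
    second_set = [p for p in paths if p[-1] == "disease"]
    return first_set, second_set


first_meta_paths, second_meta_paths = split_by_end_point(enumerate_meta_paths(3))
```

Paths ending at a symptom node fall into neither set, which matches the claim's definition of the two sets purely by end-point type.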
13. A smart device comprising a processor, a memory and a communication interface, the processor, the memory and the communication interface being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the data processing method of any of claims 1 to 11.
14. A computer readable storage medium storing one or more instructions adapted to be loaded by a processor and to perform a data processing method according to any one of claims 1 to 11.
CN202010570923.6A 2020-06-19 2020-06-19 Data processing method, device, intelligent equipment and medium Active CN111666477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010570923.6A CN111666477B (en) 2020-06-19 2020-06-19 Data processing method, device, intelligent equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010570923.6A CN111666477B (en) 2020-06-19 2020-06-19 Data processing method, device, intelligent equipment and medium

Publications (2)

Publication Number Publication Date
CN111666477A CN111666477A (en) 2020-09-15
CN111666477B CN111666477B (en) 2023-10-20

Family

ID=72389077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010570923.6A Active CN111666477B (en) 2020-06-19 2020-06-19 Data processing method, device, intelligent equipment and medium

Country Status (1)

Country Link
CN (1) CN111666477B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364650A (en) * 2020-09-30 2021-02-12 深圳市罗湖区人民医院 Entity relationship joint extraction method, terminal and storage medium
CN112632972B (en) * 2020-12-25 2024-03-15 浙江国际海运职业技术学院 Method for rapidly extracting fault information in power grid equipment fault report
CN113160964A (en) * 2020-12-31 2021-07-23 上海明品医学数据科技有限公司 Intelligent medical brain model establishing system, method, service system and medium
CN113161016A (en) * 2020-12-31 2021-07-23 上海明品医学数据科技有限公司 Intelligent medical service system, method and storage medium
CN113035368A (en) * 2021-04-13 2021-06-25 桂林电子科技大学 Disease propagation prediction method based on differential migration diagram neural network
CN113223723B (en) * 2021-05-11 2023-08-25 福建省立医院 Method for predicting difficulty and complications of kidney-protecting operation of multi-mode kidney tumor
CN113257412B (en) * 2021-06-16 2022-02-11 腾讯科技(深圳)有限公司 Information processing method, information processing device, computer equipment and storage medium
CN113951169B (en) * 2021-12-16 2022-04-22 山东新希望六和集团有限公司 Training method, measuring method and device for growth performance measuring model
CN113990495B (en) 2021-12-27 2022-04-29 之江实验室 Disease diagnosis prediction system based on graph neural network
CN114334065B (en) * 2022-03-07 2022-06-28 阿里巴巴达摩院(杭州)科技有限公司 Medical record processing method, computer readable storage medium and computer device
CN114783601A (en) * 2022-03-28 2022-07-22 腾讯科技(深圳)有限公司 Physiological data analysis method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182262A (en) * 2018-01-04 2018-06-19 华侨大学 Intelligent Answer System construction method and system based on deep learning and knowledge mapping
US20190046148A1 (en) * 2017-08-11 2019-02-14 Siemens Healthcare Gmbh Method for analyzing image data from a patient after a minimally invasive intervention, analysis apparatus, computer program and electronically readable data storage medium
CN110046698A (en) * 2019-04-28 2019-07-23 北京邮电大学 Heterogeneous figure neural network generation method, device, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190046148A1 (en) * 2017-08-11 2019-02-14 Siemens Healthcare Gmbh Method for analyzing image data from a patient after a minimally invasive intervention, analysis apparatus, computer program and electronically readable data storage medium
CN108182262A (en) * 2018-01-04 2018-06-19 华侨大学 Intelligent Answer System construction method and system based on deep learning and knowledge mapping
CN110046698A (en) * 2019-04-28 2019-07-23 北京邮电大学 Heterogeneous figure neural network generation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111666477A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN111666477B (en) Data processing method, device, intelligent equipment and medium
Dharwadkar et al. A medical chatbot
JP7100087B2 (en) How and equipment to output information
US11810671B2 (en) System and method for providing health information
CN112100406B (en) Data processing method, device, equipment and medium
US20200211709A1 (en) Method and system to provide medical advice to a user in real time based on medical triage conversation
WO2020228636A1 (en) Training method and apparatus, dialogue processing method and system, and medium
CN113707299B (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN114765075A (en) Medicine recommendation method, device and system, electronic equipment and storage medium
US20230316095A1 (en) Systems and methods for automated scribes based on knowledge graphs of clinical information
CN116383413B (en) Knowledge graph updating method and system based on medical data extraction
Khilji et al. Healfavor: Dataset and a prototype system for healthcare chatbot
US11557399B2 (en) Integrative machine learning framework for combining sentiment-based and symptom-based predictive inferences
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN114708976A (en) Method, device, equipment and storage medium for assisting diagnosis technology
CN113571184A (en) Dialogue interaction design method and system for mental health assessment
CN114783601A (en) Physiological data analysis method and device, electronic equipment and storage medium
Jia et al. DKDR: An approach of knowledge graph and deep reinforcement learning for disease diagnosis
CN117473057A (en) Question-answering processing method, system, equipment and storage medium
CN116956934A (en) Task processing method, device, equipment and storage medium
CN114117082B (en) Method, apparatus, and medium for correcting data to be corrected
Varshney et al. Cdialog: A multi-turn covid-19 conversation dataset for entity-aware dialog generation
Kim et al. Automatic diagnosis of medical conditions using deep learning with Symptom2VEC
Kumar et al. Deep learning Based Patient-Friendly Clinical Expert Recommendation Framework
Jayaratna et al. HL7 v3 message extraction using semantic web techniques

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40028460; Country of ref document: HK)

SE01 Entry into force of request for substantive examination
GR01 Patent grant