WO2022206599A1 - Triage method and device, and computer-storable medium - Google Patents

Triage method and device, and computer-storable medium

Info

Publication number
WO2022206599A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
complaint information
processed
entity name
main complaint
Prior art date
Application number
PCT/CN2022/083036
Other languages
English (en)
French (fr)
Inventor
康西龙
黄亮
李鑫
郭旭炀
Original Assignee
北京京东拓先科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东拓先科技有限公司
Publication of WO2022206599A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application is based on, and claims priority to, CN application No. 202110361861.2, filed on April 2, 2021.
  • the disclosure content of the CN application is hereby incorporated into the present application as a whole.
  • the present disclosure relates to the field of computer technology, and in particular, to a triage method and device, and a computer-storable medium.
  • a relevant entity name is identified from the patient's chief complaint information, and an entity matching algorithm is used to determine the target department. Alternatively, the patient's chief complaint information is input into a trained deep learning model, which predicts the target department.
  • a triage method comprising: acquiring main complaint information to be processed; determining at least one entity name in the main complaint information to be processed and an entity type corresponding to each entity name; using an entity matching algorithm to determine the candidate department corresponding to each entity name in the main complaint information to be processed; using a trained deep learning model to predict, according to the main complaint information to be processed, the candidate department corresponding to the main complaint information to be processed; and determining, according to the priority of the entity type and the priority of the trained deep learning model, the target department from the candidate departments, as the triage result corresponding to the main complaint information to be processed.
  • the triage method further includes: determining the priority of the entity type and the priority of the trained deep learning model according to a plurality of pieces of test data, where each piece of test data is a piece of chief complaint information marked with an actual department.
  • determining the priority of the entity type and the priority of the trained deep learning model includes: determining at least one entity name in each piece of test data and an entity type corresponding to each entity name; using the entity matching algorithm to determine the department corresponding to each entity name in each piece of test data; for each entity type corresponding to at least one entity name in the main complaint information to be processed, determining the number of pieces of test data whose determined entity types include that entity type, as a first quantity; determining, among the first quantity of test data, the number of pieces whose marked actual department is the same as the department determined by the entity matching algorithm, as a second quantity; for each such entity type, calculating the ratio of the second quantity to the first quantity; using the trained deep learning model to predict the department corresponding to each piece of test data; determining the number of pieces of test data whose marked actual department is the same as the department predicted by the trained deep learning model, as a third quantity; for the trained deep learning model, calculating the ratio of the third quantity to the total quantity of the multiple pieces of test data; and determining the corresponding priorities according to the ratio corresponding to each entity type and the ratio corresponding to the trained deep learning model, where the priority is positively correlated with the ratio.
  • where the at least one entity name in the main complaint information to be processed includes multiple entity names that correspond to multiple entity types, determining the corresponding priority includes: sorting the multiple ratios corresponding to the multiple entity types together with the ratio corresponding to the trained deep learning model; and determining the corresponding priorities according to the sorting result.
  • the triage method further includes: for each piece of training data among multiple pieces of training data, determining at least one entity name in the training data, an entity type corresponding to each entity name, and the pinyin of the training data, where each piece of training data is a piece of chief complaint information that has been marked with an actual department; and training the deep learning model using the multiple pieces of training data, the multiple entity names in the multiple pieces of training data, the entity type corresponding to each entity name, and the pinyin of the multiple pieces of training data, to obtain the trained deep learning model.
  • the multiple pieces of training data are obtained by performing data enhancement on multiple pieces of main complaint information to be trained.
  • the deep learning model includes a Bidirectional Encoder Representations from Transformers (BERT) model and a feedforward neural network model.
  • using an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed includes: for each entity name in the main complaint information to be processed, selecting from a preset knowledge base multiple entity names whose entity type is the same as the entity type of that entity name, as multiple entity names to be matched, where the knowledge base includes the correspondence between entity names, entity types and candidate departments; determining, from the multiple entity names to be matched, an entity name that matches that entity name; and determining the candidate department that corresponds in the knowledge base to the matching entity name as the candidate department corresponding to that entity name in the main complaint information to be processed.
  • determining an entity name that matches each entity name in the main complaint information to be processed includes: when the character length of the entity name in the main complaint information to be processed is greater than a length threshold, performing a fuzzy matching operation between that entity name and the multiple entity names to be matched to obtain the matching entity name; and when the character length of the entity name is less than or equal to the length threshold, performing a full matching operation between that entity name and the multiple entity names to be matched to obtain the matching entity name.
  • performing a fuzzy matching operation between each entity name in the main complaint information to be processed and the multiple entity names to be matched includes: calculating the similarity between the entity name in the main complaint information to be processed and each entity name to be matched, where the similarity is negatively correlated with an edit ratio, the edit ratio is the ratio of the edit distance to the larger of the character length of the entity name in the main complaint information to be processed and the character length of the entity name to be matched, and the edit distance is the number of edits needed to change the entity name to be matched into the entity name in the main complaint information to be processed; and determining the entity name to be matched whose similarity to the entity name is the largest and is greater than a similarity threshold as the entity name that matches the entity name in the main complaint information to be processed.
  • determining at least one entity name in the main complaint information to be processed and an entity type corresponding to each entity name includes: performing word embedding processing on the main complaint information to be processed to obtain a main complaint vector to be processed; and, according to the main complaint vector to be processed, using a pre-trained lattice long short-term memory (Lattice LSTM) model and a conditional random field (CRF) model to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name.
  • determining the target department includes: determining, as the target department, the candidate department corresponding to the highest priority among the priorities of the entity types corresponding to the at least one entity name in the main complaint information to be processed and the priority of the trained deep learning model.
  • a triage device comprising: an acquisition module configured to acquire main complaint information to be processed; a first determination module configured to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name; a second determination module configured to use the entity matching algorithm to determine the candidate department corresponding to each entity name in the main complaint information to be processed; a prediction module configured to use the trained deep learning model to predict, according to the main complaint information to be processed, the candidate department corresponding to the main complaint information to be processed; and a third determination module configured to determine, according to the priority of the entity type and the priority of the trained deep learning model, the target department from the candidate departments, as the triage result corresponding to the main complaint information to be processed.
  • a triage device comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the triage method described in any of the foregoing embodiments.
  • a computer-storable medium on which computer program instructions are stored, the instructions, when executed by a processor, implementing the triage method described in any of the foregoing embodiments.
  • FIG. 1 is a flowchart illustrating a triage method according to some embodiments of the present disclosure.
  • FIG. 2 is a flowchart illustrating a triage method according to further embodiments of the present disclosure.
  • FIG. 3 is a flowchart illustrating prioritization of entity types and prioritization of trained deep learning models in accordance with some embodiments of the present disclosure.
  • FIG. 4 is a flowchart illustrating a triage method according to further embodiments of the present disclosure.
  • FIG. 5 is a block diagram illustrating a triage device according to some embodiments of the present disclosure.
  • FIG. 6 is a block diagram illustrating a triage apparatus according to further embodiments of the present disclosure.
  • FIG. 7 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
  • multiple entity names may be identified in the patient's chief complaint information, and different entity names may correspond to different departments, so the entity matching algorithm alone cannot accurately determine the target department. Predicting the target department with the trained deep learning model may also be inaccurate. The target departments determined by the entity matching algorithm and by the trained deep learning model may even differ.
  • the present disclosure proposes a triage method, which can improve the accuracy of triage results.
  • FIG. 1 is a flowchart illustrating a triage method according to some embodiments of the present disclosure.
  • the triage method includes: step S110, acquiring main complaint information to be processed; step S120, determining at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name; step S130, using the entity matching algorithm to determine the candidate department corresponding to each entity name in the main complaint information to be processed; step S140, using the trained deep learning model to predict, according to the main complaint information to be processed, the candidate department corresponding to the main complaint information to be processed; and step S150, determining, according to the priority of the entity type and the priority of the trained deep learning model, the target department from the candidate departments, as the triage result corresponding to the main complaint information to be processed.
  • the triage method is performed by a triage device.
  • In step S110, the main complaint information to be processed is acquired.
  • for example, main complaint information from the user is received through a specific interface and used as the main complaint information to be processed.
  • the main complaint information to be processed may include one entity name or multiple entity names. Multiple entity names can correspond to one or more entity types.
  • for example, the main complaint information to be processed includes the entity names "cough", "Ganmaoling granules" and "tenofovir fumarate".
  • the entity names "cough", "Ganmaoling granules" and "tenofovir fumarate" correspond to the entity types "symptom", "drug" and "drug", respectively. That is, the main complaint information to be processed includes 3 entity names, corresponding to 2 entity types.
  • step S120 shown in FIG. 1 is implemented in the following manner.
  • Word embedding is performed on the main complaint information to be processed to obtain the main complaint vector to be processed.
  • Word embedding processing is an encoding operation that converts each word in the main complaint information to be processed into a digital vector that can represent the semantic information of the word, thereby obtaining the main complaint vector to be processed.
  • according to the main complaint vector to be processed, a pre-trained Lattice LSTM (lattice long short-term memory) model and a CRF (conditional random field) model are used to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name.
  • the Lattice LSTM model performs a second encoding pass on the main complaint vector to be processed so that it incorporates context information. The CRF model then decodes the resulting context-aware vector into entity tags, that is, it determines at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name.
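  • As a hedged illustration of the final decoding step only: assuming the CRF emits one BIO-style tag per character (the tag scheme and the helper name `decode_entities` are assumptions for illustration and are not stated in the disclosure), entity names and entity types could be recovered from a tag sequence roughly as follows.

```python
def decode_entities(chars, tags):
    """Collect (entity_name, entity_type) pairs from per-character BIO tags.

    Assumed tag format: "B-<Type>" starts an entity, "I-<Type>" continues it,
    and "O" marks characters outside any entity.
    """
    entities = []
    current, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the previous one, if any.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            # "O" or an inconsistent tag ends the current entity.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


chars = list("cough 3 days")
tags = ["B-Sym", "I-Sym", "I-Sym", "I-Sym", "I-Sym"] + ["O"] * 7
print(decode_entities(chars, tags))  # [('cough', 'Sym')]
```

  • The sketch covers tag decoding only; the Lattice LSTM encoder and the CRF layer that would produce the tags are not shown.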
  • medical-related entity names and entity types can be extracted from public knowledge graphs and then expanded, using a distant supervision method over open-domain text (i.e., string text), to obtain a dictionary.
  • the dictionary includes correspondence between entity names and entity types. Using this dictionary, the Lattice LSTM model and the CRF model can be trained, and the trained Lattice LSTM model and CRF model can be obtained.
  • the entity types that the trained Lattice LSTM model and CRF model can identify include disease (Dis), medicine (Med), traditional Chinese medicine (Tcm), department (Dep), traditional Chinese medicine (Cmed), body-specific (Bod), surgery (Sur), treatment (Tre), symptom (Sym), department-specific (Des), examination (Exa) and other (Other).
  • In step S130, the candidate department corresponding to each entity name in the main complaint information to be processed is determined by using the entity matching algorithm.
  • step S130 shown in FIG. 1 is implemented in the following manner.
  • the knowledge base includes the correspondence between entity names, entity types and candidate departments. For example, Table 1 shows the partial correspondences in the knowledge base.
  • Table 1 shows four correspondences. The entity name "cough" corresponds to the entity type "symptom" and the candidate department "respiratory medicine". The entity name "Ganmaoling granules" corresponds to the entity type "drug" and the candidate department "respiratory medicine". The entity name "tenofovir fumarate" corresponds to the entity type "drug" and the candidate department "gastroenterology". The entity name "abdominal pain" corresponds to the entity type "symptom" and the candidate department "gastroenterology".
  • an entity name matching each entity name in the main complaint information to be processed is determined from the plurality of entity names to be matched.
  • when the character length of an entity name in the main complaint information to be processed is greater than the length threshold, a fuzzy matching operation is performed between that entity name and the multiple entity names to be matched, to obtain the entity name that matches it.
  • the length threshold can be 3.
  • the similarity between each entity name in the main complaint information to be processed and each entity name to be matched is calculated. The similarity is negatively correlated with the edit ratio.
  • the edit ratio is the ratio of the edit distance to the larger of two character lengths: that of the entity name in the main complaint information to be processed and that of the entity name to be matched.
  • the entity name to be matched whose similarity to the entity name in the main complaint information to be processed is the largest and is greater than the similarity threshold is determined as the entity name that matches it.
  • the similarity threshold is 80.
  • when the character length of an entity name in the main complaint information to be processed is less than or equal to the length threshold, a full matching operation is performed between that entity name and the multiple entity names to be matched, to obtain the entity name that matches it.
  • the candidate department corresponding to the entity name in the knowledge base that matches each entity name in the pending main complaint information is determined as the candidate department corresponding to each entity name in the pending main complaint information.
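  • The matching rules above can be sketched as follows. This is a minimal sketch under stated assumptions: the text only says that similarity is negatively correlated with the edit ratio, so the formula `100 * (1 - edit ratio)` is an assumption chosen to fit the example similarity threshold of 80, the edit distance is taken to be the Levenshtein distance, and the function names and the small knowledge base are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(name, candidate):
    """Assumed scoring: 100 * (1 - edit ratio), so it falls as the edit ratio rises."""
    longest = max(len(name), len(candidate))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(candidate, name) / longest)


def match_entity(name, candidates, length_threshold=3, similarity_threshold=80):
    """Return the matching knowledge-base name, or None if nothing qualifies."""
    if len(name) <= length_threshold:
        # Short names: full (exact) match only.
        return name if name in candidates else None
    # Longer names: fuzzy match, keep the best candidate above the threshold.
    best = max(candidates, key=lambda c: similarity(name, c), default=None)
    if best is not None and similarity(name, best) > similarity_threshold:
        return best
    return None


# (entity name, entity type) -> candidate department, loosely following Table 1.
knowledge_base = {
    ("cough", "symptom"): "respiratory medicine",
    ("Ganmaoling granules", "drug"): "respiratory medicine",
    ("tenofovir fumarate", "drug"): "gastroenterology",
}
# Only names with the same entity type as the query are candidates.
drug_names = [n for (n, t) in knowledge_base if t == "drug"]
hit = match_entity("tenofovir fumarat", drug_names)
print(hit, knowledge_base.get((hit, "drug")))
```

  • Restricting candidates to the same entity type before matching keeps a misspelled drug name from being matched against symptom names, which is the purpose of the selection step described above.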
  • In step S140, the trained deep learning model is used to predict the candidate department corresponding to the main complaint information to be processed.
  • the deep learning model includes a BERT (Bidirectional Encoder Representations from Transformers) model and a feedforward neural network model.
  • the main complaint information to be processed is input into the BERT model to obtain the semantic encoding of the main complaint information to be processed.
  • the semantic encoding of the main complaint information to be processed is input into the feedforward neural network model (a fully connected neural network model) to obtain the candidate department corresponding to the main complaint information to be processed.
  • In step S150, according to the priority of the entity type and the priority of the trained deep learning model, the target department is determined from the candidate departments as the triage result corresponding to the main complaint information to be processed.
  • the priority of the entity type is the priority of the entity type corresponding to at least one entity name in the main complaint information to be processed. In the case where at least one entity name corresponds to multiple entity types, the priority is that of each entity type.
  • the candidate department corresponding to the highest priority, among the priorities of the entity types corresponding to the at least one entity name in the main complaint information to be processed and the priority of the trained deep learning model, is determined as the target department.
  • the entity names in the main complaint information to be processed are cough, Ganmaoling granules, and tenofovir fumarate, and the corresponding entity types are symptoms, medicines, and medicines, respectively.
  • according to Table 1, the candidate departments corresponding to cough, Ganmaoling granules and tenofovir fumarate are respiratory medicine, respiratory medicine and gastroenterology, respectively.
  • the candidate department predicted by the trained deep learning model is "cardiovascular medicine". Assuming that the priority of the entity type "symptom" is higher than that of the entity type "drug", and that the priority of the entity type "drug" is higher than that of the trained deep learning model, the candidate department of the highest-priority entity type "symptom" (here, the department matched for "cough") is selected, so the target department is respiratory medicine.
  • the priority of the entity type and the priority of the trained deep learning model are used, and the triage result of the entity matching algorithm and the triage result of the deep learning model are comprehensively considered, so that the accuracy of the triage can be improved.
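  • The selection in step S150 can be sketched with the example values above. The function name, the `"model"` key and the use of a smaller rank for a higher priority are illustrative assumptions. The text does not say how to choose between two entity names of the same type, so this sketch simply keeps the first one listed.

```python
def select_target_department(entity_candidates, model_candidate, priority):
    """Pick the candidate department backed by the highest-priority source.

    entity_candidates: list of (entity_type, department) pairs from entity matching.
    model_candidate:   department predicted by the trained deep learning model.
    priority:          source name -> rank, where a smaller rank means higher priority.
    """
    sources = list(entity_candidates) + [("model", model_candidate)]
    # min() returns the first source with the best rank, so ties keep list order.
    _, best_department = min(sources, key=lambda s: priority[s[0]])
    return best_department


entity_candidates = [
    ("symptom", "respiratory medicine"),  # cough
    ("drug", "respiratory medicine"),     # Ganmaoling granules
    ("drug", "gastroenterology"),         # tenofovir fumarate
]
priority = {"symptom": 1, "drug": 2, "model": 3}
target = select_target_department(entity_candidates, "cardiovascular medicine", priority)
print(target)  # respiratory medicine
```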
  • FIG. 2 is a flowchart illustrating a triage method according to further embodiments of the present disclosure.
  • the triage method includes steps S100 to S150.
  • FIG. 2 shows step S100 further included in the triage method of other embodiments. Only the differences between FIG. 2 and FIG. 1 will be described below, and the similarities will not be repeated.
  • In step S100, the priority of the entity type and the priority of the trained deep learning model are determined according to multiple pieces of test data.
  • Each piece of test data is a piece of chief complaint information marked with the actual department.
  • the multiple pieces of test data include test data 1, 2, and 3.
  • the actual departments marked by test data 1, 2 and 3 are respiratory medicine, respiratory medicine and cardiovascular medicine respectively.
  • the priority of the entity type and the priority of the trained deep learning model are determined by using the test data with label information, so that the determination of the priority is more accurate, thereby further improving the accuracy of triage.
  • step S100 shown in FIG. 2 is implemented through steps S1001 to S1009 shown in FIG. 3 .
  • FIG. 3 is a flowchart illustrating prioritizing entity types and prioritizing trained deep learning models in accordance with some embodiments of the present disclosure.
  • In step S1001, at least one entity name in each piece of test data and an entity type corresponding to each entity name are determined.
  • at least one entity name in each piece of test data includes multiple entity names, and the multiple entity names correspond to multiple entity types.
  • test data 1 includes entity names A and C. Entity names A and C correspond to entity types "symptom" and "drug", respectively.
  • Test data 2 includes entity names A, B, and D. Entity names B and D correspond to entity types "drug" and "examination", respectively.
  • Test data 3 includes entity names A and D.
  • In step S1002, the department corresponding to each entity name in each piece of test data is determined by using the entity matching algorithm.
  • the entity names A, B, C, and D correspond to the departments of respiratory medicine, gastroenterology, respiratory medicine, and skin surgery, respectively.
  • In step S1003, for each entity type corresponding to at least one entity name in the main complaint information to be processed, the number of pieces of test data whose determined entity types include that entity type is determined as the first quantity.
  • take as an example the case where the entity types corresponding to the entity names in the main complaint information to be processed are "symptom", "drug" and "drug". The entity types identified in each of test data 1, 2 and 3 include "symptom". That is, for the entity type "symptom", the first quantity is 3.
  • similarly, the first quantity for the entity type "drug" is 2.
  • In step S1004, among the first quantity of test data, the number of pieces whose marked actual department is the same as the department determined by the entity matching algorithm is determined as the second quantity.
  • continuing the example in which the entity types corresponding to the entity names in the main complaint information to be processed are "symptom", "drug" and "drug": for the entity type "symptom", the department determined by the entity matching algorithm is respiratory medicine, and the test data whose marked actual department is respiratory medicine are test data 1 and 2. That is, for the entity type "symptom", the second quantity is 2.
  • similarly, the second quantity for the entity type "drug" is 1.
  • In step S1005, for each entity type corresponding to at least one entity name in the main complaint information to be processed, the ratio of the second quantity to the first quantity is calculated.
  • for the entity type "symptom", the ratio of the second quantity to the first quantity is 2/3.
  • for the entity type "drug", the ratio of the second quantity to the first quantity is 1/2.
  • In step S1006, the trained deep learning model is used to predict the department corresponding to each piece of test data.
  • the departments predicted by the trained deep learning model are respiratory medicine, gastroenterology and dermatology respectively.
  • In step S1007, the number of pieces of test data whose marked actual department is the same as the department predicted by the trained deep learning model is determined as the third quantity. Taking the case where the departments marked in test data 1, 2, and 3 are respiratory medicine, respiratory medicine and cardiovascular medicine, respectively, only in test data 1 is the marked actual department the same as the predicted department, both being respiratory medicine. That is, the third quantity is 1.
  • In step S1008, for the trained deep learning model, the ratio of the third quantity to the total quantity of the multiple pieces of test data is calculated. Taking test data 1, 2, and 3 as examples, the ratio of the third quantity to the total quantity is 1/3.
  • In step S1009, the corresponding priority is determined according to the ratio corresponding to each entity type corresponding to at least one entity name in the main complaint information to be processed and the ratio corresponding to the trained deep learning model. The priority is positively correlated with the ratio. For example, where there are multiple entity names in the main complaint information to be processed and they correspond to multiple entity types, the priority of each entity type is determined.
  • the corresponding priority may be determined in the following manner.
  • the multiple ratios corresponding to the multiple entity types corresponding to the multiple entity names in the main complaint information to be processed and the ratios corresponding to the trained deep learning model are sorted. Then, according to the sorting result, determine the corresponding priority.
  • the ratio corresponding to the entity type "symptom" is 2/3;
  • the ratio corresponding to the entity type "drug" is 1/2;
  • the ratio corresponding to the trained deep learning model is 1/3. Sorting the ratios 2/3, 1/2, and 1/3 gives 2/3 > 1/2 > 1/3.
  • According to the sorting result, the entity type "symptom" has a higher priority than the entity type "drug", and the entity type "drug" has a higher priority than the trained deep learning model.
  • In some embodiments, each priority can also be assigned a value to indicate its rank. For example, the priority of the entity type "symptom" is set to 1, the priority of the entity type "drug" is set to 2, and the priority of the trained deep learning model is set to 3. The smaller the priority value, the higher the priority.
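The sorting and value assignment described above can be sketched as follows. The function name `assign_priorities` and the key `deep_learning_model` are illustrative assumptions; the ratios are those of the worked example. How ties between equal ratios are resolved is not stated in the text, so this sketch simply keeps the input order for ties.

```python
from fractions import Fraction


def assign_priorities(ratios):
    """Map each source (entity type or model) to a priority value.

    A larger ratio gives a smaller value, and 1 is the highest priority.
    Ties keep their input order (an assumption; the text does not cover ties).
    """
    ordered = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}


ratios = {
    "symptom": Fraction(2, 3),
    "drug": Fraction(1, 2),
    "deep_learning_model": Fraction(1, 3),
}
priorities = assign_priorities(ratios)
```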
  • FIG. 4 is a flowchart illustrating a triage method according to still further embodiments of the present disclosure.
  • the triage method includes steps S101 to S150.
  • FIG. 4 shows steps S101 to S102 further included in the triage method according to some embodiments. Only the differences between FIG. 4 and FIG. 1 will be described below, and the similarities will not be repeated.
  • In step S101, for each piece of training data in the plurality of pieces of training data, at least one entity name in the training data, the entity type corresponding to each entity name, and the pinyin of the training data are determined.
  • Each piece of training data is a piece of main complaint information that has been labeled with an actual department.
  • In some embodiments, the multiple pieces of training data are obtained by performing data augmentation on multiple pieces of main complaint information to be trained.
  • For example, the data augmentation operation includes at least one of synonym replacement, random insertion, random exchange, and random deletion.
  • whether to perform data augmentation on a certain piece of training data may be determined in a probabilistic random manner.
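The four augmentation operations and the probabilistic decision can be sketched in Python. Everything here is an illustrative assumption rather than the disclosed implementation: the token-list input, the synonym table, the default probabilities, and the choice to apply exactly one operation per record.

```python
import random


def random_swap(tokens, rng):
    """Random exchange: swap two randomly chosen positions."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def synonym_replacement(tokens, rng, synonyms):
    """Replace one token that has an entry in the (hypothetical) synonym table."""
    tokens = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t in synonyms]
    if candidates:
        i = rng.choice(candidates)
        tokens[i] = rng.choice(synonyms[tokens[i]])
    return tokens


def random_insertion(tokens, rng, synonyms):
    """Insert a synonym of a randomly chosen token at a random position."""
    tokens = list(tokens)
    candidates = [t for t in tokens if t in synonyms]
    if candidates:
        word = rng.choice(synonyms[rng.choice(candidates)])
        tokens.insert(rng.randint(0, len(tokens)), word)
    return tokens


def augment(tokens, rng, synonyms, apply_prob=0.5):
    """With probability apply_prob, apply one randomly chosen operation."""
    if rng.random() >= apply_prob:
        return list(tokens)
    operations = [
        lambda t: synonym_replacement(t, rng, synonyms),
        lambda t: random_insertion(t, rng, synonyms),
        lambda t: random_swap(t, rng),
        lambda t: random_deletion(t, rng),
    ]
    return rng.choice(operations)(tokens)
```

Passing a seeded `random.Random` instance as `rng` makes an augmentation run reproducible, which helps when comparing models trained on the same augmented set.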
  • In step S102, a deep learning model is trained using the multiple pieces of training data, the multiple entity names in the training data, the entity type corresponding to each entity name, and the pinyin of the training data, to obtain the trained deep learning model.
  • a grid search method can be used to optimize hyperparameters during training. For example, 10% of the training data can be used as the data to test the performance of the deep learning model during the training process.
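A grid search with a 10% held-out split can be sketched as below. The helper names, the hyperparameter names, and the `train_and_score` callback are hypothetical; the text does not say which hyperparameters are tuned or how the held-out data are chosen, so the random split here is an assumption.

```python
import itertools
import random


def split_holdout(records, holdout_fraction=0.1, seed=0):
    """Shuffle the records and set aside a fraction for checking model performance."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n_holdout = max(1, round(len(records) * holdout_fraction))
    return records[n_holdout:], records[:n_holdout]


def grid_search(param_grid, train_and_score):
    """Try every combination in param_grid and keep the best-scoring one.

    train_and_score is a caller-supplied function (hypothetical here) that
    trains a model with the given parameters and returns its held-out score.
    """
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `train_and_score` would train on the first list returned by `split_holdout` and evaluate on the second.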
  • the training data, the entity name in the training data, the entity type corresponding to the entity name, and the pinyin of the training data are used to train the deep learning model, which can improve the reliability and accuracy of the deep learning model.
  • Using the pinyin of the training data during training reduces the impact of typos that a user may make when entering main complaint information, thereby improving the reliability and accuracy of the deep learning model.
  • FIG. 5 is a block diagram illustrating a triage apparatus according to some embodiments of the present disclosure.
  • the triage device 5 includes an acquisition module 51 , a first determination module 52 , a second determination module 53 , a prediction module 54 and a third determination module 55 .
  • the acquisition module 51 is configured to obtain the main complaint information to be processed, for example, to perform step S110 as shown in FIG. 1 .
  • the first determination module 52 is configured to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name, for example, to perform step S120 as shown in FIG. 1 .
  • the second determination module 53 is configured to use an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed, for example, to perform step S130 shown in FIG. 1 .
  • the prediction module 54 is configured to predict the candidate department corresponding to the main complaint information to be processed by using the trained deep learning model according to the main complaint information to be processed, for example, to perform step S140 as shown in FIG. 1 .
  • the third determination module 55 is configured to determine the target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed, for example, to perform step S150 as shown in FIG. 1 .
  • FIG. 6 is a block diagram illustrating a triage apparatus according to further embodiments of the present disclosure.
  • the triage device 6 includes a memory 61 ; and a processor 62 coupled to the memory 61 .
  • the memory 61 is used for storing instructions for executing the corresponding embodiments of the triage method.
  • the processor 62 is configured to perform the triage method of any of the embodiments of the present disclosure based on the instructions stored in the memory 61 .
  • FIG. 7 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
  • Computer system 70 may take the form of a general-purpose computing device.
  • Computer system 70 includes memory 710, processor 720, and bus 700 connecting various system components.
  • Memory 710 may include, for example, system memory, non-volatile storage media, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), and other programs.
  • System memory may include volatile storage media such as random access memory (RAM) and/or cache memory.
  • the non-volatile storage medium stores, for example, instructions for performing corresponding embodiments of at least one of the triage methods.
  • Non-volatile storage media include, but are not limited to, magnetic disk memory, optical memory, flash memory, and the like.
  • Processor 720 may be implemented as a general purpose processor, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA) or other programmable logic device, discrete hardware components such as discrete gates or transistors.
  • each module such as the judging module and the determining module can be implemented by a central processing unit (CPU) running the instructions in the memory for executing the corresponding steps, or can be implemented by a dedicated circuit for executing the corresponding steps.
  • Bus 700 may use any of a variety of bus structures.
  • bus structures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Peripheral Component Interconnect (PCI) bus.
  • the computer system 70 may also include an input-output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730 , 740 , 750 and the memory 710 and the processor 720 can be connected through the bus 700 .
  • the input and output interface 730 may provide a connection interface for input and output devices such as a monitor, a mouse, and a keyboard.
  • Network interface 740 provides a connection interface for various networked devices.
  • the storage interface 750 provides a connection interface for external storage devices such as a floppy disk, a U disk, and an SD card.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable device to produce a machine, such that execution of the instructions by the processor produces means for implementing the functions specified in one or more blocks of the flowchart and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable memory; they cause a computer to operate in a particular manner, producing an article of manufacture that includes instructions implementing the functions specified in one or more blocks of the flowchart and/or block diagrams.
  • the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

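A triage method and apparatus, and a computer-storable medium, relating to the field of computer technology. The triage method includes: obtaining main complaint information to be processed (S110); determining at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name (S120); using an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed (S130); predicting, from the main complaint information to be processed and using a trained deep learning model, a candidate department corresponding to the main complaint information to be processed (S140); and determining a target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed (S150). The accuracy of the triage result can thereby be improved.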

Description

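Triage method and apparatus, and computer-storable medium
Cross-reference to related applications
This application is based on and claims priority to CN application No. 202110361861.2, filed on April 2, 2021, the disclosure of which is incorporated herein by reference in its entirety.
Technical field
The present disclosure relates to the field of computer technology, and in particular to a triage method and apparatus, and a computer-storable medium.
Background
When seeking medical treatment, a patient needs to choose a department. However, patients often lack the relevant medical knowledge and cannot accurately choose the department they should visit. To address this, hospitals traditionally set up a triage desk. Triage by triage desk is inefficient and incurs high labor costs.
In the related art, relevant entity names are identified from the patient's main complaint information, and an entity matching algorithm is used to determine the target department. Alternatively, the patient's main complaint information is input into a trained deep learning model, which predicts the target department.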
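Summary
According to a first aspect of the present disclosure, a triage method is provided, including: obtaining main complaint information to be processed; determining at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name; using an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed; predicting, from the main complaint information to be processed and using a trained deep learning model, a candidate department corresponding to the main complaint information to be processed; and determining a target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed.
In some embodiments, the triage method further includes: determining the priority of the entity type and the priority of the trained deep learning model according to multiple pieces of test data, each piece of test data being a piece of main complaint information labeled with an actual department.
In some embodiments, determining the priority of the entity type and the priority of the trained deep learning model includes: determining at least one entity name in each piece of test data and the entity type corresponding to each entity name; using the entity matching algorithm to determine the department corresponding to each entity name in each piece of test data; for each entity type corresponding to the at least one entity name in the main complaint information to be processed, determining, as a first number, the number of pieces of test data whose determined entity types include that entity type; determining, as a second number, the number of pieces of test data, among the first number of pieces of test data, whose labeled actual department is the same as the department determined by the entity matching algorithm; for each entity type corresponding to the at least one entity name in the main complaint information to be processed, calculating the ratio of the second number to the first number; using the trained deep learning model to predict the department corresponding to each piece of test data; determining, as a third number, the number of pieces of test data whose labeled actual department is the same as the department predicted by the trained deep learning model; for the trained deep learning model, calculating the ratio of the third number to the total number of pieces of test data; and determining the corresponding priorities according to the ratio for each entity type corresponding to the at least one entity name in the main complaint information to be processed and the ratio for the trained deep learning model, priority being positively correlated with the ratio.
In some embodiments, the at least one entity name in the main complaint information to be processed includes multiple entity names that correspond to multiple entity types, and determining the corresponding priorities includes: sorting the ratios corresponding to the multiple entity types together with the ratio corresponding to the trained deep learning model; and determining the corresponding priorities according to the sorting result.
In some embodiments, the triage method further includes: for each piece of training data among multiple pieces of training data, determining at least one entity name in the training data, the entity type corresponding to each entity name, and the pinyin of the training data, each piece of training data being a piece of main complaint information labeled with an actual department; and training a deep learning model using the multiple pieces of training data, the multiple entity names in the training data, the entity type corresponding to each entity name, and the pinyin of the training data, to obtain the trained deep learning model.
In some embodiments, the multiple pieces of training data are obtained by performing data augmentation on multiple pieces of main complaint information to be trained.
In some embodiments, the deep learning model includes a BERT (Bidirectional Encoder Representations from Transformers) model and a feedforward neural network model.
In some embodiments, using the entity matching algorithm to determine the candidate department corresponding to each entity name in the main complaint information to be processed includes: selecting, from a preset knowledge base, multiple entity names whose entity type is the same as that of each entity name in the main complaint information to be processed, as multiple entity names to be matched, the knowledge base including correspondences among entity names, entity types, and candidate departments; for each entity name in the main complaint information to be processed, determining, from the multiple entity names to be matched, the entity name that matches it; and determining the candidate department that corresponds in the knowledge base to the matched entity name as the candidate department corresponding to that entity name in the main complaint information to be processed.
In some embodiments, determining the entity name that matches each entity name in the main complaint information to be processed includes: when the character length of the entity name in the main complaint information to be processed is greater than a length threshold, performing a fuzzy matching operation between that entity name and the multiple entity names to be matched to obtain the matching entity name; and when the character length of the entity name in the main complaint information to be processed is less than or equal to the length threshold, performing an exact (full) matching operation between that entity name and the multiple entity names to be matched to obtain the matching entity name.
In some embodiments, performing the fuzzy matching operation includes: calculating the similarity between each entity name in the main complaint information to be processed and each entity name to be matched, the similarity being negatively correlated with an edit ratio, the edit ratio being the ratio of an edit distance to the larger of the character length of the entity name in the main complaint information to be processed and the character length of the entity name to be matched, and the edit distance being the number of edits needed to change the entity name to be matched into the entity name in the main complaint information to be processed; and determining the entity name to be matched whose similarity to the entity name in the main complaint information to be processed is the largest and is greater than a similarity threshold as the matching entity name.
In some embodiments, determining the at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name includes: performing character embedding on the main complaint information to be processed to obtain a main complaint vector to be processed; and, from the main complaint vector to be processed, using a pre-trained Lattice LSTM (Lattice Long Short-Term Memory) model and a CRF (Conditional Random Field) model to determine the at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name.
In some embodiments, determining the target department includes: determining, as the target department, the candidate department corresponding to the highest priority among the priorities of the entity types corresponding to the at least one entity name in the main complaint information to be processed and the priority of the trained deep learning model.
According to a second aspect of the present disclosure, a triage apparatus is provided, including: an acquisition module configured to obtain main complaint information to be processed; a first determination module configured to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name; a second determination module configured to use an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed; a prediction module configured to predict, from the main complaint information to be processed and using a trained deep learning model, a candidate department corresponding to the main complaint information to be processed; and a third determination module configured to determine a target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed.
According to a third aspect of the present disclosure, a triage apparatus is provided, including: a memory; and a processor coupled to the memory, the processor being configured to perform the triage method of any of the above embodiments based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, a computer-storable medium is provided, on which computer program instructions are stored, the instructions implementing the triage method of any of the above embodiments when executed by a processor.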
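Brief description of the drawings
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart illustrating a triage method according to some embodiments of the present disclosure;
FIG. 2 is a flowchart illustrating a triage method according to other embodiments of the present disclosure;
FIG. 3 is a flowchart illustrating the determination of the priority of the entity type and the priority of the trained deep learning model according to some embodiments of the present disclosure;
FIG. 4 is a flowchart illustrating a triage method according to still further embodiments of the present disclosure;
FIG. 5 is a block diagram illustrating a triage apparatus according to some embodiments of the present disclosure;
FIG. 6 is a block diagram illustrating a triage apparatus according to further embodiments of the present disclosure;
FIG. 7 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.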
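Detailed description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
In the related art, multiple entity names may be identified in a patient's main complaint information, and different entity names may correspond to different departments, so the entity matching algorithm cannot accurately determine the target department. Predicting the target department with a trained deep learning model may also be inaccurate. Moreover, the target departments determined by the entity matching algorithm and by the trained deep learning model may differ.
To address these technical problems, the present disclosure proposes a triage method that can improve the accuracy of the triage result.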
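FIG. 1 is a flowchart illustrating a triage method according to some embodiments of the present disclosure.
As shown in FIG. 1, the triage method includes: step S110, obtaining main complaint information to be processed; step S120, determining at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name; step S130, using an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed; step S140, predicting, from the main complaint information to be processed and using a trained deep learning model, a candidate department corresponding to the main complaint information to be processed; and step S150, determining a target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed. For example, the triage method is performed by a triage apparatus.
In step S110, the main complaint information to be processed is obtained. For example, main complaint information from a user is received through a specific interface and used as the main complaint information to be processed.
In step S120, at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name are determined. For example, the main complaint information to be processed may include one entity name or multiple entity names. Multiple entity names may correspond to one or more entity types. For example, the main complaint information to be processed includes the entity names "cough" (咳嗽), "Ganmaoling granules" (感冒灵颗粒), and "tenofovir fumarate" (富马酸替诺福韦), which correspond to the entity types "symptom", "drug", and "drug" respectively. That is, the main complaint information to be processed includes three entity names corresponding to two entity types.
For example, step S120 shown in FIG. 1 is implemented as follows.
First, character embedding is performed on the main complaint information to be processed to obtain a main complaint vector to be processed. Character embedding is an encoding operation that converts each character in the main complaint information to be processed into a numeric vector representing the semantic information of that character, yielding the main complaint vector to be processed.
Then, from the main complaint vector to be processed, a pre-trained Lattice LSTM (Lattice Long Short-Term Memory) model and a CRF (Conditional Random Field) model are used to determine the at least one entity name in the main complaint information to be processed. The Lattice LSTM model encodes the main complaint vector to be processed a second time so that it incorporates context information. The CRF model decodes the context-aware main complaint vector, converting the numeric vectors into entity labels, which determines the at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name.
In some embodiments, medical entity names and entity types can be extracted from publicly available knowledge graphs and then expanded by distant supervision on open-domain text (that is, string text) to obtain a dictionary. The dictionary includes correspondences between entity names and entity types. The dictionary can be used to train the Lattice LSTM model and the CRF model, yielding the trained Lattice LSTM and CRF models.
For example, the entity types that the trained Lattice LSTM and CRF models can recognize include disease (Dis), drug (Med), traditional Chinese medicine (Tcm), department (Dep), Chinese medicinal drug (Cmed), body-specific (Bod), surgery (Sur), treatment (Tre), symptom (Sym), department-specific (Des), examination (Exa), and other (Other). The text in parentheses after each entity type is its abbreviation.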
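In step S130, an entity matching algorithm is used to determine the candidate department corresponding to each entity name in the main complaint information to be processed.
For example, step S130 shown in FIG. 1 is implemented as follows.
First, multiple entity names whose entity type is the same as that of each entity name in the main complaint information to be processed are selected from a preset knowledge base as multiple entity names to be matched. The knowledge base includes correspondences among entity names, entity types, and candidate departments. For example, Table 1 shows some of the correspondences in the knowledge base.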
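Table 1
Entity name | Entity type | Candidate department
cough (咳嗽) | symptom | respiratory medicine
Ganmaoling granules (感冒灵颗粒) | drug | respiratory medicine
tenofovir fumarate (富马酸替诺福韦) | drug | gastroenterology
abdominal pain (腹痛) | symptom | gastroenterology
Table 1 shows four correspondences. The entity name "cough", the entity type "symptom", and the candidate department "respiratory medicine" correspond to one another. The entity name "Ganmaoling granules", the entity type "drug", and the candidate department "respiratory medicine" correspond to one another. The entity name "tenofovir fumarate", the entity type "drug", and the candidate department "gastroenterology" correspond to one another. The entity name "abdominal pain", the entity type "symptom", and the candidate department "gastroenterology" correspond to one another.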
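Then, for each entity name in the main complaint information to be processed, the entity name that matches it is determined from the multiple entity names to be matched.
For example, when the character length of an entity name in the main complaint information to be processed is greater than a length threshold, a fuzzy matching operation is performed between that entity name and the multiple entity names to be matched to obtain the matching entity name. The length threshold may be 3.
In some embodiments, the similarity between each entity name in the main complaint information to be processed and each entity name to be matched is calculated. The similarity is negatively correlated with the edit ratio. The edit ratio is the ratio of the edit distance to the larger of the character length of the entity name in the main complaint information to be processed and the character length of the entity name to be matched. The edit distance is the number of edits needed to change the entity name to be matched into the entity name in the main complaint information to be processed. For example, similarity = (1 - edit ratio) × 100.
After the similarity is calculated, the entity name to be matched whose similarity to the entity name in the main complaint information to be processed is the largest and is greater than a similarity threshold is determined as the entity name that matches it. For example, the similarity threshold is 80.
As another example, when the character length of an entity name in the main complaint information to be processed is less than or equal to the length threshold, an exact (full) matching operation is performed between that entity name and the multiple entity names to be matched to obtain the matching entity name.
Finally, the candidate department that corresponds in the knowledge base to the matched entity name is determined as the candidate department corresponding to that entity name in the main complaint information to be processed.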
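The matching procedure of step S130 can be sketched in Python under a few stated assumptions. The edit distance is taken to be the standard Levenshtein distance (insertions, deletions, and substitutions), which the text does not spell out. The knowledge base is the four-row excerpt of Table 1 with English department names. The function names and the tuple layout are hypothetical. The thresholds 3 and 80 and the formula similarity = (1 - edit ratio) × 100 come from the description.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed edit model)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def similarity(a, b):
    """similarity = (1 - edit distance / longer length) * 100."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return (1 - edit_distance(a, b) / longest) * 100


# Excerpt of Table 1: (entity name, entity type, candidate department).
KNOWLEDGE_BASE = [
    ("咳嗽", "symptom", "respiratory medicine"),        # cough
    ("感冒灵颗粒", "drug", "respiratory medicine"),      # Ganmaoling granules
    ("富马酸替诺福韦", "drug", "gastroenterology"),       # tenofovir fumarate
    ("腹痛", "symptom", "gastroenterology"),            # abdominal pain
]


def match_department(name, entity_type, kb=KNOWLEDGE_BASE,
                     length_threshold=3, similarity_threshold=80):
    """Return the candidate department for an entity name, or None."""
    candidates = [(n, dept) for n, t, dept in kb if t == entity_type]
    if len(name) <= length_threshold:
        # Short names: exact (full) match only.
        for n, dept in candidates:
            if n == name:
                return dept
        return None
    # Longer names: fuzzy match, keeping the most similar candidate.
    best = max(candidates, key=lambda c: similarity(name, c[0]), default=None)
    if best is not None and similarity(name, best[0]) > similarity_threshold:
        return best[1]
    return None
```

For instance, a seven-character drug name that differs from 富马酸替诺福韦 in one character has an edit ratio of 1/7 and a similarity of about 85.7, which exceeds the threshold of 80, so it still maps to gastroenterology. A two-character symptom name with a typo is not matched, because short names require an exact match.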
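In step S140, a candidate department corresponding to the main complaint information to be processed is predicted from that information using the trained deep learning model. For example, the deep learning model includes a BERT (Bidirectional Encoder Representations from Transformers) model and a feedforward neural network model. In some embodiments, the main complaint information to be processed is input into the BERT model to obtain its semantic encoding. The semantic encoding is then input into the feedforward neural network model (a fully connected neural network model) to obtain the candidate department corresponding to the main complaint information to be processed.
In step S150, a target department is determined from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed. The priority of the entity type is the priority of the entity type corresponding to the at least one entity name in the main complaint information to be processed. When the at least one entity name corresponds to multiple entity types, each entity type has its own priority.
In some embodiments, the candidate department corresponding to the highest priority among the priorities of the entity types corresponding to the at least one entity name in the main complaint information to be processed and the priority of the trained deep learning model is determined as the target department.
For example, the entity names in the main complaint information to be processed are cough, Ganmaoling granules, and tenofovir fumarate, and the corresponding entity types are symptom, drug, and drug. According to Table 1, the candidate departments corresponding to cough, Ganmaoling granules, and tenofovir fumarate are respiratory medicine, respiratory medicine, and gastroenterology respectively. The candidate department predicted by the trained deep learning model is cardiovascular medicine. Assuming that the entity type "symptom" has a higher priority than the entity type "drug", and that the entity type "drug" has a higher priority than the trained deep learning model, the target department is respiratory medicine.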
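The selection in step S150 can be sketched as follows, using the worked example above. The function name, the pair-based input layout, and the key `deep_learning_model` are illustrative assumptions. Priorities are encoded as numbers where a smaller value means a higher priority, and when two candidates share the highest priority this sketch keeps the first one, a case the text does not address.

```python
def choose_target_department(entity_candidates, model_candidate, priorities,
                             model_key="deep_learning_model"):
    """Pick the candidate department backed by the highest-priority source.

    entity_candidates: (entity_type, department) pairs from entity matching;
    a department of None means no match was found for that entity name.
    """
    scored = [(priorities[etype], dept)
              for etype, dept in entity_candidates if dept is not None]
    if model_candidate is not None:
        scored.append((priorities[model_key], model_candidate))
    if not scored:
        return None
    # Smaller value means higher priority; min() keeps the first on ties.
    return min(scored, key=lambda item: item[0])[1]


entity_candidates = [
    ("symptom", "respiratory medicine"),  # cough
    ("drug", "respiratory medicine"),     # Ganmaoling granules
    ("drug", "gastroenterology"),         # tenofovir fumarate
]
priorities = {"symptom": 1, "drug": 2, "deep_learning_model": 3}
target = choose_target_department(entity_candidates, "cardiovascular medicine", priorities)
```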
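In the above embodiments, by using the priority of the entity type and the priority of the trained deep learning model, the triage result of the entity matching algorithm and that of the deep learning model are both taken into account, which can improve the accuracy of triage.
FIG. 2 is a flowchart illustrating a triage method according to other embodiments of the present disclosure.
As shown in FIG. 2, the triage method includes steps S100 to S150. FIG. 2 differs from FIG. 1 in that it shows step S100, which the triage method of these other embodiments further includes. Only the differences between FIG. 2 and FIG. 1 are described below; the similarities are not repeated.
In step S100, the priority of the entity type and the priority of the trained deep learning model are determined according to multiple pieces of test data. Each piece of test data is a piece of main complaint information labeled with an actual department. For example, the multiple pieces of test data include test data 1, 2, and 3, whose labeled actual departments are respiratory medicine, respiratory medicine, and cardiovascular medicine respectively.
In the above embodiments, labeled test data are used to determine the priority of the entity type and the priority of the trained deep learning model, which makes the priorities more accurate and further improves the accuracy of triage.
For example, step S100 shown in FIG. 2 is implemented by steps S1001 to S1009 shown in FIG. 3.
FIG. 3 is a flowchart illustrating the determination of the priority of the entity type and the priority of the trained deep learning model according to some embodiments of the present disclosure.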
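In step S1001, at least one entity name in each piece of test data and the entity type corresponding to each entity name are determined. In some embodiments, the at least one entity name in each piece of test data includes multiple entity names that correspond to multiple entity types. For example, test data 1 includes entity names A and C, which correspond to the entity types "symptom" and "drug" respectively. Test data 2 includes entity names A, B, and D; entity names B and D correspond to the entity types "drug" and "examination" respectively. Test data 3 includes entity names A and D.
In step S1002, the entity matching algorithm is used to determine the department corresponding to each entity name in each piece of test data. Taking test data 1, 2, and 3 as examples, entity names A, B, C, and D correspond to respiratory medicine, gastroenterology, respiratory medicine, and dermatological surgery respectively.
In step S1003, for each entity type corresponding to the at least one entity name in the main complaint information to be processed, the number of pieces of test data whose determined entity types include that entity type is determined as the first number. Suppose the entity types corresponding to the entity names in the main complaint information to be processed are symptom, drug, and drug. For the entity type "symptom", the entity types determined for each of test data 1, 2, and 3 include symptom, so the first number for "symptom" is 3. Similarly, the first number for the entity type "drug" is 2.
In step S1004, among the first number of pieces of test data, the number of pieces whose labeled actual department is the same as the department determined by the entity matching algorithm is determined as the second number. Continuing the example with the entity types "symptom", "drug", and "drug": for the entity type "symptom", the department determined by the entity matching algorithm is respiratory medicine. Among test data 1, 2, and 3, the pieces labeled with respiratory medicine are test data 1 and 2, so the second number for "symptom" is 2. Similarly, the second number for the entity type "drug" is 1.
In step S1005, for each entity type corresponding to the at least one entity name in the main complaint information to be processed, the ratio of the second number to the first number is calculated. Continuing the example, the ratio of the second number to the first number is 2/3 for the entity type "symptom" and 1/2 for the entity type "drug".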
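Steps S1003 to S1005 can be reproduced for the worked example with a short Python sketch. The record layout and the function name are hypothetical. One counting rule is an assumption: a record counts toward the second number when any of its entity names of the given type was matched to the labeled department, since the text does not say how records with several entity names of the same type are handled.

```python
from fractions import Fraction

# Test data 1, 2, 3 from the worked example: labeled department plus
# (entity name, entity type) pairs found in step S1001.
TEST_DATA = [
    {"label": "respiratory medicine",
     "entities": [("A", "symptom"), ("C", "drug")]},
    {"label": "respiratory medicine",
     "entities": [("A", "symptom"), ("B", "drug"), ("D", "examination")]},
    {"label": "cardiovascular medicine",
     "entities": [("A", "symptom"), ("D", "examination")]},
]

# Departments found by entity matching in step S1002.
MATCHED_DEPARTMENT = {
    "A": "respiratory medicine",
    "B": "gastroenterology",
    "C": "respiratory medicine",
    "D": "dermatological surgery",
}


def entity_type_ratio(entity_type, test_data, matched):
    """Return second number / first number for one entity type, or None."""
    first_number = 0   # records containing this entity type (S1003)
    second_number = 0  # of those, records whose label equals the match (S1004)
    for record in test_data:
        departments = [matched[name] for name, etype in record["entities"]
                       if etype == entity_type]
        if not departments:
            continue
        first_number += 1
        if record["label"] in departments:
            second_number += 1
    if first_number == 0:
        return None
    return Fraction(second_number, first_number)  # S1005


symptom_ratio = entity_type_ratio("symptom", TEST_DATA, MATCHED_DEPARTMENT)
drug_ratio = entity_type_ratio("drug", TEST_DATA, MATCHED_DEPARTMENT)
```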
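In step S1006, the trained deep learning model is used to predict the department corresponding to each piece of test data. Taking test data 1, 2, and 3 as examples, the departments predicted by the trained deep learning model are respiratory medicine, gastroenterology, and dermatological surgery respectively.
In step S1007, the number of pieces of test data whose labeled actual department is the same as the department predicted by the trained deep learning model is determined as the third number. With test data 1, 2, and 3 labeled as respiratory medicine, respiratory medicine, and cardiovascular medicine respectively, only test data 1 has a labeled actual department that matches the predicted department, both being respiratory medicine. That is, the third number is 1.
In step S1008, for the trained deep learning model, the ratio of the third number to the total number of pieces of test data is calculated. Taking test data 1, 2, and 3 as examples, the ratio of the third number to the total number is 1/3.
In step S1009, the corresponding priorities are determined according to the ratio for each entity type corresponding to the at least one entity name in the main complaint information to be processed and the ratio for the trained deep learning model. Priority is positively correlated with the ratio. For example, when the main complaint information to be processed contains multiple entity names that correspond to multiple entity types, a priority is determined for each entity type.
For example, when the at least one entity name in the main complaint information to be processed includes multiple entity names that correspond to multiple entity types, the corresponding priorities can be determined as follows.
First, the ratios for the multiple entity types corresponding to the multiple entity names in the main complaint information to be processed and the ratio for the trained deep learning model are sorted. Then, the corresponding priorities are determined according to the sorting result.
Take the case where entity names A, B, and C in the main complaint information to be processed correspond to the entity types symptom and drug, and the test data comprise test data 1, 2, and 3. The ratio for the entity type "symptom" is 2/3, the ratio for the entity type "drug" is 1/2, and the ratio for the trained deep learning model is 1/3. Sorting the ratios 2/3, 1/2, and 1/3 gives 2/3 > 1/2 > 1/3.
According to the sorting result, the entity type "symptom" has a higher priority than the entity type "drug", and the entity type "drug" has a higher priority than the trained deep learning model. In some embodiments, each priority can also be assigned a value to indicate its rank. For example, the priority of the entity type "symptom" is set to 1, the priority of the entity type "drug" is set to 2, and the priority of the trained deep learning model is set to 3. The smaller the priority value, the higher the priority.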
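FIG. 4 is a flowchart illustrating a triage method according to still further embodiments of the present disclosure.
As shown in FIG. 4, the triage method includes steps S101 to S150. FIG. 4 differs from FIG. 1 in that it shows steps S101 and S102, which the triage method of these further embodiments also includes. Only the differences between FIG. 4 and FIG. 1 are described below; the similarities are not repeated.
In step S101, for each piece of training data among multiple pieces of training data, at least one entity name in the training data, the entity type corresponding to each entity name, and the pinyin of the training data are determined. Each piece of training data is a piece of main complaint information labeled with an actual department.
In some embodiments, the multiple pieces of training data are obtained by performing data augmentation on multiple pieces of main complaint information to be trained. Data augmentation expands the original data (the main complaint information to be trained), increases the diversity of the training data, and covers more possible texts. It also improves, to some extent, the model's predictions on unseen data, which increases the reliability of the deep learning model and strengthens its generalization ability. For example, the data augmentation operation includes at least one of synonym replacement, random insertion, random exchange, and random deletion. In some embodiments, whether to augment a given piece of training data can be decided at random with a certain probability.
In step S102, a deep learning model is trained using the multiple pieces of training data, the multiple entity names in the training data, the entity type corresponding to each entity name, and the pinyin of the training data, to obtain the trained deep learning model. In some embodiments, grid search can be used to optimize hyperparameters during training. For example, 10% of the training data can be used during training as data for testing the performance of the deep learning model.
In the above embodiments, training the deep learning model with the training data, the entity names in the training data, the entity types corresponding to the entity names, and the pinyin of the training data can improve the reliability and accuracy of the deep learning model. Using the pinyin of the training data during training reduces the impact of typos that a user may make when entering main complaint information, which can improve the reliability and accuracy of the deep learning model.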
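FIG. 5 is a block diagram illustrating a triage apparatus according to some embodiments of the present disclosure.
As shown in FIG. 5, the triage apparatus 5 includes an acquisition module 51, a first determination module 52, a second determination module 53, a prediction module 54, and a third determination module 55.
The acquisition module 51 is configured to obtain the main complaint information to be processed, for example, to perform step S110 shown in FIG. 1.
The first determination module 52 is configured to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name, for example, to perform step S120 shown in FIG. 1.
The second determination module 53 is configured to use an entity matching algorithm to determine the candidate department corresponding to each entity name in the main complaint information to be processed, for example, to perform step S130 shown in FIG. 1.
The prediction module 54 is configured to predict, from the main complaint information to be processed and using the trained deep learning model, the candidate department corresponding to the main complaint information to be processed, for example, to perform step S140 shown in FIG. 1.
The third determination module 55 is configured to determine the target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed, for example, to perform step S150 shown in FIG. 1.
FIG. 6 is a block diagram illustrating a triage apparatus according to further embodiments of the present disclosure.
As shown in FIG. 6, the triage apparatus 6 includes a memory 61 and a processor 62 coupled to the memory 61. The memory 61 stores instructions for executing the corresponding embodiments of the triage method. The processor 62 is configured to perform the triage method of any of the embodiments of the present disclosure based on the instructions stored in the memory 61.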
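FIG. 7 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
As shown in FIG. 7, the computer system 70 may take the form of a general-purpose computing device. The computer system 70 includes a memory 710, a processor 720, and a bus 700 connecting the various system components.
The memory 710 may include, for example, system memory and non-volatile storage media. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs. The system memory may include volatile storage media such as random access memory (RAM) and/or cache memory. The non-volatile storage medium stores, for example, instructions for executing the corresponding embodiments of at least one of the triage methods. Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, and flash memory.
The processor 720 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, or discrete hardware components such as discrete gates or transistors. Accordingly, each module, such as the judging module and the determining module, can be implemented by a central processing unit (CPU) running instructions in memory that execute the corresponding steps, or by a dedicated circuit that executes the corresponding steps.
The bus 700 may use any of a variety of bus structures. For example, bus structures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system 70 may also include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, and 750, the memory 710, and the processor 720 can be connected through the bus 700. The input/output interface 730 can provide a connection interface for input and output devices such as a monitor, a mouse, and a keyboard. The network interface 740 provides a connection interface for various networked devices. The storage interface 750 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
Various aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatuses, and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable device to produce a machine, such that execution of the instructions by the processor produces means for implementing the functions specified in one or more blocks of the flowcharts and/or block diagrams.
These computer-readable program instructions may also be stored in a computer-readable memory; they cause a computer to operate in a particular manner, producing an article of manufacture that includes instructions implementing the functions specified in one or more blocks of the flowcharts and/or block diagrams.
The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
The triage method and apparatus and the computer-storable medium of the above embodiments can improve the accuracy of the triage result.
So far, the triage method and apparatus and the computer-storable medium according to the present disclosure have been described in detail. To avoid obscuring the concept of the present disclosure, some details well known in the art have not been described. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.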

Claims (15)

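  1. A triage method, comprising:
    obtaining main complaint information to be processed;
    determining at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name;
    using an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed;
    predicting, from the main complaint information to be processed and using a trained deep learning model, a candidate department corresponding to the main complaint information to be processed;
    determining a target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed.
  2. The triage method according to claim 1, further comprising:
    determining the priority of the entity type and the priority of the trained deep learning model according to multiple pieces of test data, each piece of test data being a piece of main complaint information labeled with an actual department.
  3. The triage method according to claim 2, wherein determining the priority of the entity type and the priority of the trained deep learning model comprises:
    determining at least one entity name in each piece of test data and the entity type corresponding to each entity name;
    using the entity matching algorithm to determine the department corresponding to each entity name in each piece of test data;
    for each entity type corresponding to the at least one entity name in the main complaint information to be processed, determining, as a first number, the number of pieces of test data whose determined entity types include that entity type;
    determining, as a second number, the number of pieces of test data, among the first number of pieces of test data, whose labeled actual department is the same as the department determined by the entity matching algorithm;
    for each entity type corresponding to the at least one entity name in the main complaint information to be processed, calculating the ratio of the second number to the first number;
    using the trained deep learning model to predict the department corresponding to each piece of test data;
    determining, as a third number, the number of pieces of test data whose labeled actual department is the same as the department predicted by the trained deep learning model;
    for the trained deep learning model, calculating the ratio of the third number to the total number of the multiple pieces of test data;
    determining the corresponding priorities according to the ratio for each entity type corresponding to the at least one entity name in the main complaint information to be processed and the ratio for the trained deep learning model, priority being positively correlated with the ratio.
  4. The triage method according to claim 3, wherein the at least one entity name in the main complaint information to be processed includes multiple entity names, the multiple entity names correspond to multiple entity types, and determining the corresponding priorities comprises:
    sorting the ratios corresponding to the multiple entity types together with the ratio corresponding to the trained deep learning model;
    determining the corresponding priorities according to the sorting result.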
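  5. The triage method according to claim 1, further comprising:
    for each piece of training data among multiple pieces of training data, determining at least one entity name in the training data, the entity type corresponding to each entity name, and the pinyin of the training data, each piece of training data being a piece of main complaint information labeled with an actual department;
    training a deep learning model using the multiple pieces of training data, the multiple entity names in the multiple pieces of training data, the entity type corresponding to each entity name, and the pinyin of the multiple pieces of training data, to obtain the trained deep learning model.
  6. The triage method according to claim 5, wherein the multiple pieces of training data are obtained by performing data augmentation on multiple pieces of main complaint information to be trained.
  7. The triage method according to claim 5, wherein the deep learning model includes a BERT (Bidirectional Encoder Representations from Transformers) model and a feedforward neural network model.
  8. The triage method according to claim 1, wherein using the entity matching algorithm to determine the candidate department corresponding to each entity name in the main complaint information to be processed comprises:
    selecting, from a preset knowledge base, multiple entity names whose entity type is the same as that of each entity name in the main complaint information to be processed, as multiple entity names to be matched, the knowledge base including correspondences among entity names, entity types, and candidate departments;
    for each entity name in the main complaint information to be processed, determining, from the multiple entity names to be matched, the entity name that matches it;
    determining the candidate department that corresponds in the knowledge base to the matched entity name as the candidate department corresponding to that entity name in the main complaint information to be processed.
  9. The triage method according to claim 8, wherein determining the entity name that matches each entity name in the main complaint information to be processed comprises:
    when the character length of the entity name in the main complaint information to be processed is greater than a length threshold, performing a fuzzy matching operation between that entity name and the multiple entity names to be matched to obtain the matching entity name;
    when the character length of the entity name in the main complaint information to be processed is less than or equal to the length threshold, performing an exact (full) matching operation between that entity name and the multiple entity names to be matched to obtain the matching entity name.
  10. The triage method according to claim 9, wherein performing the fuzzy matching operation between each entity name in the main complaint information to be processed and the multiple entity names to be matched comprises:
    calculating the similarity between each entity name in the main complaint information to be processed and each entity name to be matched, the similarity being negatively correlated with an edit ratio, the edit ratio being the ratio of an edit distance to the larger of the character length of the entity name in the main complaint information to be processed and the character length of the entity name to be matched, and the edit distance being the number of edits needed to change the entity name to be matched into the entity name in the main complaint information to be processed;
    determining the entity name to be matched whose similarity to the entity name in the main complaint information to be processed is the largest and is greater than a similarity threshold as the entity name that matches it.
  11. The triage method according to claim 1, wherein determining the at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name comprises:
    performing character embedding on the main complaint information to be processed to obtain a main complaint vector to be processed;
    from the main complaint vector to be processed, using a pre-trained Lattice LSTM (Lattice Long Short-Term Memory) model and a CRF (Conditional Random Field) model to determine the at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name.
  12. The triage method according to claim 1, wherein determining the target department comprises:
    determining, as the target department, the candidate department corresponding to the highest priority among the priorities of the entity types corresponding to the at least one entity name in the main complaint information to be processed and the priority of the trained deep learning model.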
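  13. A triage apparatus, comprising:
    an acquisition module configured to obtain main complaint information to be processed;
    a first determination module configured to determine at least one entity name in the main complaint information to be processed and the entity type corresponding to each entity name;
    a second determination module configured to use an entity matching algorithm to determine a candidate department corresponding to each entity name in the main complaint information to be processed;
    a prediction module configured to predict, from the main complaint information to be processed and using a trained deep learning model, a candidate department corresponding to the main complaint information to be processed;
    a third determination module configured to determine a target department from the candidate departments according to the priority of the entity type and the priority of the trained deep learning model, as the triage result corresponding to the main complaint information to be processed.
  14. A triage apparatus, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to perform the triage method according to any one of claims 1 to 12 based on instructions stored in the memory.
  15. A computer-storable medium on which computer program instructions are stored, the instructions implementing the triage method according to any one of claims 1 to 12 when executed by a processor.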
PCT/CN2022/083036 2021-04-02 2022-03-25 分诊方法及装置、计算机可存储介质 WO2022206599A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110361861.2A CN113782165A (zh) 2021-04-02 2021-04-02 分诊方法及装置、计算机可存储介质
CN202110361861.2 2021-04-02

Publications (1)

Publication Number Publication Date
WO2022206599A1 true WO2022206599A1 (zh) 2022-10-06

Family

ID=78835633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083036 WO2022206599A1 (zh) 2021-04-02 2022-03-25 分诊方法及装置、计算机可存储介质

Country Status (2)

Country Link
CN (1) CN113782165A (zh)
WO (1) WO2022206599A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630649A (zh) * 2022-11-23 2023-01-20 南京邮电大学 一种基于生成模型的医学中文命名实体识别方法
CN116364296A (zh) * 2023-02-17 2023-06-30 中国人民解放军总医院 标准检查项目名称确认方法、装置、设备、介质及产品

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113782165A (zh) * 2021-04-02 2021-12-10 北京京东拓先科技有限公司 分诊方法及装置、计算机可存储介质
CN114464312B (zh) * 2022-01-04 2022-12-02 北京欧应信息技术有限公司 用于辅助疾病推理的系统及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180315488A1 (en) * 2017-04-25 2018-11-01 Telemedco Inc. Emergency Room Medical Triage, Diagnosis, and Treatment
CN110047584A (zh) * 2019-04-23 2019-07-23 清华大学 基于深度学习的医院分诊方法、系统、装置及介质
CN110675944A (zh) * 2019-09-20 2020-01-10 京东方科技集团股份有限公司 分诊方法及装置、计算机设备及介质
CN111785368A (zh) * 2020-06-30 2020-10-16 平安科技(深圳)有限公司 基于医疗知识图谱的分诊方法、装置、设备及存储介质
CN113782165A (zh) * 2021-04-02 2021-12-10 北京京东拓先科技有限公司 分诊方法及装置、计算机可存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109887587A (zh) * 2019-01-22 2019-06-14 平安科技(深圳)有限公司 智能分诊方法、系统、装置及存储介质
CN110085308B (zh) * 2019-04-23 2022-02-25 挂号网(杭州)科技有限公司 一种基于融合深度学习的诊疗科室分类方法
CN112115240A (zh) * 2019-06-21 2020-12-22 百度在线网络技术(北京)有限公司 分类处理方法、装置、服务器和存储介质
CN111370102A (zh) * 2020-02-06 2020-07-03 清华大学 科室导诊方法、装置以及设备
CN112015917A (zh) * 2020-09-07 2020-12-01 平安科技(深圳)有限公司 基于知识图谱的数据处理方法、装置及计算机设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180315488A1 (en) * 2017-04-25 2018-11-01 Telemedco Inc. Emergency Room Medical Triage, Diagnosis, and Treatment
CN110047584A (zh) * 2019-04-23 2019-07-23 清华大学 基于深度学习的医院分诊方法、系统、装置及介质
CN110675944A (zh) * 2019-09-20 2020-01-10 京东方科技集团股份有限公司 分诊方法及装置、计算机设备及介质
CN111785368A (zh) * 2020-06-30 2020-10-16 平安科技(深圳)有限公司 基于医疗知识图谱的分诊方法、装置、设备及存储介质
CN113782165A (zh) * 2021-04-02 2021-12-10 北京京东拓先科技有限公司 分诊方法及装置、计算机可存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630649A (zh) * 2022-11-23 2023-01-20 南京邮电大学 一种基于生成模型的医学中文命名实体识别方法
CN115630649B (zh) * 2022-11-23 2023-06-30 南京邮电大学 一种基于生成模型的医学中文命名实体识别方法
CN116364296A (zh) * 2023-02-17 2023-06-30 中国人民解放军总医院 标准检查项目名称确认方法、装置、设备、介质及产品
CN116364296B (zh) * 2023-02-17 2023-12-26 中国人民解放军总医院 标准检查项目名称确认方法、装置、设备、介质及产品

Also Published As

Publication number Publication date
CN113782165A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2022206599A1 (zh) 分诊方法及装置、计算机可存储介质
CN106844368B (zh) 用于人机对话的方法、神经网络系统和用户设备
Yang et al. Joint relational embeddings for knowledge-based question answering
WO2018214486A1 (zh) 一种多文档摘要生成的方法、装置和终端
US20210012215A1 (en) Hierarchical multi-task term embedding learning for synonym prediction
TWI738270B (zh) 將文句短語映射至知識分類表之方法及系統
Qu et al. Distant supervision for neural relation extraction integrated with word attention and property features
Cai et al. A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records
WO2021146831A1 (zh) 实体识别的方法和装置、建立词典的方法、设备、介质
US20220229984A1 (en) Systems and methods for semi-supervised extraction of text classification information
WO2023029513A1 (zh) 基于人工智能的搜索意图识别方法、装置、设备及介质
Yuan et al. Large language models for healthcare data augmentation: An example on patient-trial matching
CN116628186A (zh) 文本摘要生成方法及系统
Zhang et al. Natural language generation and deep learning for intelligent building codes
Wong et al. iSentenizer‐μ: Multilingual Sentence Boundary Detection Model
CN110019474B (zh) 异构数据库中的同义数据自动关联方法、装置及电子设备
WO2022127040A1 (zh) 文本处理方法、装置、设备及存储介质
Khan et al. A clustering framework for lexical normalization of Roman Urdu
Pan et al. A BERT-based generation model to transform medical texts to SQL queries for electronic medical records: model development and validation
Qian et al. Fine-grained entity typing without knowledge base
Spasić et al. Head to head: Semantic similarity of multi–word terms
Ren et al. Extraction of transitional relations in healthcare processes from Chinese medical text based on deep learning
Wu et al. A radical-based method for Chinese named entity recognition
Behera An Experiment with the CRF++ Parts of Speech (POS) Tagger for Odia.
CN114548113A (zh) 基于事件的指代消解系统、方法、终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE