WO2021151328A1 - Symptom data processing method, device, computer equipment and storage medium - Google Patents
Symptom data processing method, device, computer equipment and storage medium
- Publication number: WO2021151328A1 (application PCT/CN2020/124221)
- Authority: WIPO (PCT)
Classifications
- G06F40/205 — Parsing (handling natural language data; natural language analysis)
- G06F16/35 — Clustering; Classification (information retrieval of unstructured textual data)
- G06N3/045 — Combinations of networks (neural network architecture)
- G06N3/08 — Learning methods (neural networks)
- G16H50/20 — ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- This application relates to the field of natural language processing, and in particular to a symptom data processing method, device, computer equipment and storage medium.
- Triage refers to the process of judging the patient's condition and department based on the patient's symptoms and signs, and arranging the treatment. The accuracy of triage results is of great significance to the rational allocation of hospital resources and improving the efficiency of patient consultation.
- the triage work in a hospital is mainly handled by the triage staff.
- the inventor has realized that triage is difficult because the triage staff face triage tasks spanning all departments; at the same time, the number of hospital visits is large and the time available for each triage decision is short. These two factors affect the accuracy of the triage results.
- a symptom data processing method including:
- the symptom data is processed into a representation vector by a preset BERT encoder; the representation vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes the symptom name and symptom attribute; the preset BERT encoder is obtained after being trained by a pre-training task; the pre-training task is used to determine the association relationship between the characterization vector and the symptom name and symptom attribute;
- the characterization vector is input to a preset TextCNN model, and the classification result output by the preset TextCNN model is obtained.
- a symptom data processing device including:
- the data processing module is configured to process the symptom data into a representation vector through a preset BERT encoder, the representation vector being generated based on the symptom name and its attributes in the symptom data; the preset BERT encoder is obtained after being trained by a pre-training task; the pre-training task is used to determine the association relationship between the characterization vector and the symptom name and symptom attribute;
- the data output module is used to input the characterization vector into a preset TextCNN model, and obtain the classification result output by the preset TextCNN model.
- a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
- the symptom data is processed into a representation vector by a preset BERT encoder; the representation vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes the symptom name and symptom attribute; the preset BERT encoder is obtained after being trained by a pre-training task; the pre-training task is used to determine the association relationship between the characterization vector and the symptom feature data;
- the characterization vector is input to a preset TextCNN model, and the classification result output by the preset TextCNN model is obtained.
- One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:
- the symptom data is processed into a representation vector by a preset BERT encoder; the representation vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes the symptom name and symptom attribute; the preset BERT encoder is obtained after being trained by a pre-training task; the pre-training task is used to determine the association relationship between the characterization vector and the symptom feature data;
- the characterization vector is input to a preset TextCNN model, and the classification result output by the preset TextCNN model is obtained.
- the above-mentioned symptom data processing method, device, computer equipment and storage medium obtain, by acquiring the symptom data, the raw data entered by the patient in real time.
- the symptom data is processed into a representation vector by a preset BERT encoder; the representation vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes the symptom name and symptom attribute; the preset BERT encoder is obtained after training by a pre-training task; the pre-training task is used to determine the association relationship between the representation vector and the symptom feature data.
- processing the symptom data into a representation vector through a preset BERT encoder makes it possible to better extract the features of the symptom data, and the obtained representation vector contains more information, which helps to improve the accuracy of the classification result.
- FIG. 1 is a schematic diagram of an application environment of a symptom data processing method in an embodiment of the present application
- FIG. 2 is a schematic flowchart of a method for processing symptom data in an embodiment of the present application
- FIG. 3 is a schematic flowchart of a method for processing symptom data in an embodiment of the present application
- FIG. 4 is a schematic flowchart of a symptom data processing method in an embodiment of the present application.
- FIG. 5 is a schematic flowchart of a method for processing symptom data in an embodiment of the present application
- FIG. 6 is a schematic flowchart of a method for processing symptom data in an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a symptom data processing device in an embodiment of the present application.
- Fig. 8 is a schematic diagram of a computer device in an embodiment of the present application.
- the symptom data processing method provided in this embodiment can be applied in an application environment as shown in FIG. 1, in which the client communicates with the server.
- the client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
- the server can be implemented with an independent server or a server cluster composed of multiple servers.
- a symptom data processing method is provided.
- the method is applied to the server in FIG. 1 as an example for description, including the following steps:
- the symptom data processing method can be executed on the symptom data processing device.
- the symptom data may refer to data input by the patient in the symptom data processing device. Illustratively, the patient first enters one of their symptoms. The symptom data processing device inquires about the attributes of that symptom (duration and onset characteristics), and then recommends other symptoms the patient may have based on the input symptom. If the patient confirms a recommended symptom, its relevant attributes are asked about as well; otherwise the next symptom is asked about. Once the patient is sure that all symptoms have been described completely, they can click the "confirm" button to complete the submission of the symptom data. In some cases, symptom data can also be entered with the assistance of triage staff.
- S20: Process the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on the symptom characteristic data in the symptom data; the symptom characteristic data includes the symptom name and symptom attribute; the preset BERT encoder is obtained after being trained by a pre-training task; the pre-training task is used to determine the association relationship between the characterization vector and the symptom feature data.
- the preset BERT (Bidirectional Encoder Representations from Transformers) encoder is obtained by improving the existing BERT model (see the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", published by Google in 2018).
- the preset BERT encoder is obtained after pre-training task training.
- the pre-training task is a custom task, and the pre-training task is defined as inferring the symptom name and symptom attribute contained in the characterization vector based on the current characterization vector.
- the pre-training task can ensure that the preset BERT encoder can learn the information contained in the output characterization vector, that is, the association relationship between the characterization vector and the symptom feature data is determined through the pre-training task.
- the association relationship is reflected in the model parameters of the preset BERT encoder.
- the symptom name and symptom attribute can be accurately converted into an overall vector, that is, the representation vector.
- the number of generated representation vectors is equal to the number of symptoms in the symptom data; that is, one representation vector is generated for each symptom in the symptom data.
- the preset BERT encoder is obtained after pre-training with a large amount of medical record data (the same type as step S10). Therefore, in addition to the features of the input symptom data, the generated representation vector also contains the associated features related to the symptom data predicted by the preset BERT encoder.
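The encoding step above can be sketched as follows. This is a minimal illustrative stand-in, not the patent's trained BERT encoder: a deterministic toy embedding replaces learned token vectors and mean-fusion replaces self-attention. It only demonstrates the interface described above, namely that one characterization vector is produced per symptom (name plus attributes).

```python
import hashlib
import numpy as np

DIM = 8  # toy dimension; the patent's example word vectors use dimension 1024

def toy_token_vector(token: str) -> np.ndarray:
    """Deterministic stand-in for a learned token embedding."""
    seed = int(hashlib.md5(token.encode("utf-8")).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(DIM)

def encode_symptom(name: str, attributes: list) -> np.ndarray:
    """Fuse a symptom name and its attributes into one characterization vector
    (the real preset BERT encoder would do this with self-attention)."""
    vecs = [toy_token_vector(name)] + [toy_token_vector(a) for a in attributes]
    return np.mean(vecs, axis=0)

def encode_symptom_data(symptom_data: dict) -> list:
    # One characterization vector per symptom, as stated in the embodiment.
    return [encode_symptom(name, attrs) for name, attrs in symptom_data.items()]

symptom_data = {"cough": ["three days", "blood-streaked"], "fever": ["one day"]}
vectors = encode_symptom_data(symptom_data)
print(len(vectors))  # 2 symptoms -> 2 characterization vectors
```

The key property shown is the one-vector-per-symptom mapping; the actual vector contents here are meaningless placeholders.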
- the preset TextCNN model (text convolutional neural network) can be obtained after improvement based on the existing TextCNN model.
- the input of the preset TextCNN model is a representation vector generated after processing by a preset BERT encoder, rather than a randomly initialized word vector.
- all the representation vectors generated by the symptom data are used as the input data of the preset TextCNN model.
- in the model calculation stage, multiple convolution kernels are used to convolve the input data; the results are pooled in the pooling layer; the output of the pooling layer is connected to the fully connected network unit; finally, the softmax activation function is used to output the probability of each category.
- the preset TextCNN model may be a two-classification model, and the classification result is used to determine whether the patient has a critical illness.
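The forward pass just described can be sketched as below, assuming random (untrained) weights and toy dimensions: multiple convolution kernels slide over the stacked characterization vectors, max-over-time pooling follows, and a fully connected layer with softmax outputs the probability of each of two classes.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_SYMPTOMS, N_CLASSES = 8, 5, 2
KERNEL_SIZES = (2, 3)   # each kernel spans 2 or 3 adjacent symptom vectors
N_FILTERS = 4           # convolution filters per kernel size

# Input: one characterization vector per symptom, stacked into a matrix.
X = rng.standard_normal((N_SYMPTOMS, DIM))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def textcnn_forward(X):
    pooled = []
    for k in KERNEL_SIZES:
        W = rng.standard_normal((N_FILTERS, k, DIM))  # random stand-in weights
        # Convolution: slide each filter over windows of k symptom vectors.
        conv = np.array([[np.sum(W[f] * X[i:i + k]) for i in range(len(X) - k + 1)]
                         for f in range(N_FILTERS)])
        conv = np.maximum(conv, 0)        # ReLU
        pooled.append(conv.max(axis=1))   # max-over-time pooling
    features = np.concatenate(pooled)     # fed to the fully connected unit
    W_fc = rng.standard_normal((N_CLASSES, features.size))
    return softmax(W_fc @ features)       # probability of each category

probs = textcnn_forward(X)
print(probs.shape)
```

With trained weights and characterization vectors from the preset BERT encoder as input (rather than randomly initialized word vectors), this is the two-class structure the embodiment describes.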
- in steps S10-S30, by obtaining the symptom data, the raw data entered by the patient is acquired in real time.
- the symptom data is processed into a representation vector by a preset BERT encoder; the representation vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes the symptom name and symptom attribute; the preset BERT encoder is obtained after training by a pre-training task; the pre-training task is used to determine the association relationship between the representation vector and the symptom feature data.
- processing the symptom data into a representation vector through a preset BERT encoder makes it possible to better extract the features of the symptom data, and the obtained representation vector contains more information, which helps to improve the accuracy of the classification result.
- step S10, that is, obtaining symptom data, includes:
- the first symptom data refers to the symptom name and symptom attribute of the first symptom entered by the patient.
- the symptom name of the first symptom data is "cough"
- the corresponding attribute data includes "cough for three days" and "cough with blood-streaked sputum".
- a symptom name is associated with one or more attribute data.
- S102 Output a related symptom prompt according to the first symptom data.
- the relevant symptom prompt can be expressed as: in addition to "cough", do you have "fever" symptoms?
- the corresponding selection boxes, "Yes" and "No", are output at the same time.
- if the patient selects "Yes", the second symptom data is collected.
- the acquisition method of the second symptom data is basically the same as that of the first symptom data; both are input by the patient.
- if the patient selects "No", the second symptom data corresponding to the current related symptom prompt is not collected.
- the number of output related symptom prompts can be greater than one.
- the number of collected second symptom data items can likewise be greater than one.
- the acquisition of the symptom data is completed, where the symptom data includes the first symptom data and the second symptom data.
- when the patient judges that the described symptoms are complete, they can click the "confirm" button to confirm that the symptom data collection is completed.
- the number of second symptom data can be any non-negative integer, that is, it can be zero or a positive integer.
- the first symptom data is acquired.
- the symptom data of the patient can be gradually collected according to different symptoms. If there are multiple symptom data, the first symptom data has the highest importance in general.
- the relevant symptom prompt is output according to the first symptom data to determine whether the patient has other symptoms related to the first symptom (that is, the symptom name corresponding to the first symptom data).
- the second symptom data is acquired based on the relevant symptom prompt to further collect the patient's symptom data (here, the second symptom data refers to other symptom data except the first symptom data).
- the acquisition of the symptom data is completed, and the symptom data includes the first symptom data and the second symptom data.
- in this way, more detailed symptom data can be obtained, which improves the accuracy of the classification result.
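The S101-S104 collection flow above can be sketched as follows; the related-symptom table and the scripted patient answers here are illustrative assumptions, not data from the patent.

```python
# Sketch of the S101-S104 collection flow. The related-symptom knowledge base
# and the scripted patient answers below are hypothetical.
RELATED = {"cough": ["fever", "sore throat"]}

def collect_symptom_data(first_symptom, first_attrs, answer_fn, attr_fn):
    """answer_fn(symptom) -> bool: does the patient confirm the prompted symptom?
    attr_fn(symptom) -> list of attribute strings for a confirmed symptom."""
    symptom_data = {first_symptom: first_attrs}       # S101: first symptom data
    for related in RELATED.get(first_symptom, []):    # S102: related symptom prompts
        if answer_fn(related):                        # S103: patient selects "Yes"
            symptom_data[related] = attr_fn(related)  #       collect second symptom data
    return symptom_data                               # S104: collection complete

data = collect_symptom_data(
    "cough", ["three days", "blood-streaked"],
    answer_fn=lambda s: s == "fever",   # scripted: the patient confirms fever only
    attr_fn=lambda s: ["one day"],
)
print(data)
```

As in the embodiment, the number of second symptom data items may be zero (if every prompt is answered "No") or any positive integer.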
- before step S20, that is, before the symptom data is processed into a characterization vector (generated based on the symptom name and symptom attribute) through a preset BERT encoder, the method further includes:
- S202 Input the several word vectors into an initial BERT network model, and obtain a training characterization vector output by the initial BERT network model;
- if the loss value is within a preset range, the pre-training task training is completed, and the initial BERT network model after the training is completed is the preset BERT encoder.
- the pre-training task is mainly used to perform the iterative calculation of steps S202-S204.
- the symptom samples need to be converted into word vectors through the word2vec model.
- the word2vec model is a model for generating word vectors.
- suppose the symptom data is {cough: three days; blood-streaked}; the word vectors emb1 ("cough"), emb2 ("three days"), and emb3 ("blood-streaked") can be obtained after conversion by the word2vec model.
- emb1 is the first word vector
- emb2 and emb3 are the second word vectors.
- the symptom sample refers to the training data used to train the initial BERT network model, which is generally historical symptom data in a certain area.
- each word vector can be input to the initial BERT network model as input data, and a training characterization vector can be generated, and the corresponding loss value can be calculated.
- the loss value can be calculated by the loss function.
- the loss function is defined as L(Vs, sym(n)), the loss value of the n-th symptom, where sym(n) denotes the n-th symptom in the symptom list and Vs denotes the overall representation vector. The loss consists of a loss term for the n-th symptom in the representation vector and a loss term for the other symptoms. It follows from the loss function that the loss value of symptoms appearing in the characterization vector should be as small as possible, and that of symptoms not appearing should be as large as possible.
- the preset range can be adjusted according to actual needs. If the loss value is within the preset range, it means that the initial BERT network model has converged, and the pre-training task training is now complete.
- the initial BERT network model after training is the preset BERT encoder.
- a pre-training task is established, and the symptom sample is processed into several word vectors using the word2vec model.
- the word vectors include a first word vector generated based on the symptom name, and a second word vector generated based on the symptom attribute.
- the several word vectors are input into the initial BERT network model, and the training characterization vector output by the initial BERT network model is obtained to perform the training step of the initial BERT network model.
- the loss value of the initial BERT network model is calculated according to the training characterization vector, and the obtained loss value can be used to adjust model parameters and determine whether the model converges.
- if the loss value is outside the preset range, the model parameters of the initial BERT network model are adjusted and the training characterization vector of the symptom sample is recalculated, so that iteration continues while the model has not converged. If the loss value is within the preset range, the training of the pre-training task is completed, and the trained initial BERT network model becomes the preset BERT encoder used to generate the characterization vectors.
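The S202-S204 iteration (compute a loss from the training characterization vector, adjust parameters, repeat until the loss falls within the preset range) can be sketched as follows. The quadratic stand-in model and loss are assumptions for illustration only; the patent uses an initial BERT network and its symptom-based loss.

```python
# Skeleton of the S202-S204 iteration: compute a loss, adjust parameters, and
# stop once the loss is within a preset range. The "forward pass" and loss here
# are simple quadratic stand-ins, not the patent's BERT network or symptom loss.
import numpy as np

def train_until_converged(word_vectors, target, lr=0.1, preset_range=1e-4, max_steps=1000):
    params = np.zeros_like(target)                              # parameters to adjust
    loss = float("inf")
    for _ in range(max_steps):
        characterization = params + word_vectors.mean(axis=0)   # stand-in forward pass
        loss = float(np.sum((characterization - target) ** 2))  # stand-in loss value
        if loss < preset_range:                                 # loss within preset range
            break                                               # -> training complete
        params -= lr * 2 * (characterization - target)          # adjust model parameters
    return params, loss

word_vectors = np.ones((3, 4))   # emb1..emb3 from word2vec, toy values
target = np.full(4, 2.0)
params, final_loss = train_until_converged(word_vectors, target)
print(final_loss < 1e-4)
```

The structure mirrors the embodiment: iterate while the loss is outside the preset range, and treat the model at the first in-range loss as converged.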
- after step S10, that is, after obtaining the symptom data, the method further includes:
- a preset BERT encoder can be used to process the symptom data into a word vector (the representation vector can be split into multiple word vectors), and then combined with the TF-IDF value of the local medical record database to generate a sentence vector.
- the corresponding cosine similarity can be calculated.
- a high cosine similarity indicates that the medical record is highly similar to the current symptom data.
- the specified number can be set according to actual needs, for example, it can be 10.
- each matching medical record has a corresponding treatment department.
- the matching relationship between the matching medical records and the treatment departments can be expressed as:
- the department with the highest frequency can be determined as the recommended department.
- the recommended department is the department where the patient is recommended to see a doctor.
- a sentence vector is generated from the symptom data, in combination with the characteristics of the local data (the TF-IDF values of the local medical record database).
- the sentence vector is compared with the medical record sentence vector of the local medical record database, and the cosine similarity is calculated, and the similarity between the medical record and the symptom data corresponding to the medical record sentence vector in the local medical record database can be compared through the cosine similarity.
- a specified number of matching medical records are selected to obtain the matching medical records with the highest similarity.
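The steps above can be sketched as follows: rank the local medical records by cosine similarity against the patient's sentence vector, take a specified number of top matches, and recommend the department that appears most frequently among them. The vectors and department labels below are illustrative assumptions.

```python
# Sketch of the matching-and-recommendation flow: cosine similarity against the
# local medical record sentence vectors, top-k matching records, and the most
# frequent department among them. All vectors and departments are toy data.
from collections import Counter
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_department(query_vec, record_vecs, departments, top_k=3):
    sims = [cosine_similarity(query_vec, v) for v in record_vecs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:top_k]
    # The department with the highest frequency among the matching records.
    return Counter(departments[i] for i in top).most_common(1)[0][0]

rng = np.random.default_rng(1)
records = [rng.standard_normal(8) for _ in range(6)]           # medical record sentence vectors
depts = ["respiratory", "respiratory", "cardiology",
         "respiratory", "ENT", "cardiology"]                   # hypothetical departments
query = records[0] + 0.01 * rng.standard_normal(8)             # near the first record
print(recommend_department(query, records, depts))
```

Here `top_k` plays the role of the "specified number" (e.g. 10 in the embodiment; 3 in this toy run).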
- before step S12, that is, before comparing the sentence vector with the medical record sentence vectors in the local medical record database and calculating the cosine similarity, the method further includes:
- S124 Generate a medical record sentence vector for each medical record according to the symptom word vector and the TF-IDF value.
- the medical record data and the symptom data in step S10 belong to the patient visit data in the same area.
- the preset BERT encoder in step S122 has the same training method as the preset BERT encoder in step S20, but the output form is slightly different.
- the output of the preset BERT encoder in step S122 is a symptom word vector (W_emb) whose dimension is [1, 1024].
- the TF-IDF (term frequency-inverse document frequency) value corresponding to each symptom word vector is calculated, and the TF-IDF value is set as the weight of the word vector.
- the TF-IDF value is used to evaluate the importance of a symptom description (word vector) in the symptom data to the medical record data (sentence vector).
- the medical record sentence vector can be obtained by the following formula: S_emb = Σ_{i=1}^{k} TF-IDF_i · W_emb^(i), where S_emb is the medical record sentence vector, W_emb^(i) is the i-th symptom word vector, TF-IDF_i is the TF-IDF value corresponding to the i-th symptom word vector, and k is the total number of symptoms in the medical record.
- the symptom data can also use steps S122-S124 to generate a corresponding sentence vector.
- the medical record data of the local medical record database is acquired.
- the local medical record database is a pre-built database for storing local medical record data, which can be used for comparison with the symptom data of step S10.
- Use the preset BERT encoder to process the medical record data to generate a symptom word vector query vocabulary.
- the symptom word vector query vocabulary includes a symptom word vector for each symptom.
- the medical record data is processed into vector form to facilitate comparison. The TF-IDF value of each symptom word vector is calculated and set as the weight of that symptom word vector in the sentence vector.
- the medical record sentence vector of each medical record is generated according to the symptom word vector and the TF-IDF value, and the obtained medical record sentence vector can be compared with the sentence vector of the symptom data (by calculating the cosine similarity) to determine the degree of similarity between each other.
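The S122-S124 construction can be sketched as below, under simplifying assumptions: a plain TF-IDF computation over symptom lists stands in for the preset BERT encoder's word vectors, and each medical record sentence vector is the TF-IDF-weighted sum of its symptom word vectors.

```python
# Minimal sketch of S122-S124: build each medical-record sentence vector as the
# TF-IDF-weighted sum of its symptom word vectors. The word vectors and TF-IDF
# computation are simplified stand-ins for the preset BERT encoder's output.
import math
import numpy as np

def tf_idf_weights(record, corpus):
    """TF-IDF of each symptom in one record, relative to the medical record corpus."""
    n_docs = len(corpus)
    weights = {}
    for sym in set(record):
        tf = record.count(sym) / len(record)
        df = sum(1 for doc in corpus if sym in doc)
        weights[sym] = tf * math.log(n_docs / df)
    return weights

def sentence_vector(record, corpus, word_vecs):
    # S_emb = sum_i TF-IDF_i * W_emb^(i), summed over the record's symptoms.
    w = tf_idf_weights(record, corpus)
    return sum(w[s] * word_vecs[s] for s in set(record))

corpus = [["cough", "fever"], ["cough"], ["headache", "fever"]]   # local medical records
word_vecs = {s: np.ones(4) * i                                    # toy symptom word vectors
             for i, s in enumerate(["cough", "fever", "headache"], 1)}
S_emb = sentence_vector(corpus[0], corpus, word_vecs)
print(S_emb.shape)
```

The symptom data of step S10 can be run through the same two functions to produce the query sentence vector compared against these record vectors.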
- a symptom data processing device is provided, and the symptom data processing device corresponds to the symptom data processing method in the above-mentioned embodiment one-to-one.
- the symptom data processing device includes an acquisition module 10, a data processing module 20 and a data output module 30.
- the detailed description of each functional module is as follows:
- the obtaining module 10 is used to obtain symptom data
- the data processing module 20 is configured to process the symptom data into a characterization vector through a preset BERT encoder, and the characterization vector is generated based on the symptom name and its attributes in the symptom data; the preset BERT encoder is A pre-training task is obtained after training; the pre-training task is used to determine the association relationship between the characterization vector and the symptom name and symptom attribute;
- the data output module 30 is configured to input the characterization vector into a preset TextCNN model, and obtain the classification result output by the preset TextCNN model.
- the obtaining module 10 includes:
- the first data acquisition unit is used to acquire first symptom data
- the prompt unit is configured to output related symptom prompts according to the first symptom data
- the acquiring second data unit is configured to acquire second symptom data based on the relevant symptom prompt
- the completion collection unit is configured to complete the acquisition of the symptom data after it is determined that the symptom data is collected, and the symptom data includes the first symptom data and the second symptom data.
- the symptom data processing device further includes:
- a model training module configured to input the several word vectors into the initial BERT network model, and obtain the training characterization vector output by the initial BERT network model;
- a loss calculation module configured to calculate the loss value of the initial BERT network model according to the training characterization vector
- the iterative module is used to adjust the model parameters of the initial BERT network model if the loss value is outside the preset range, and to recalculate the training characterization vector of the symptom sample in order to recalculate the loss value of the initial BERT network model;
- the determining encoder module is configured to, if the loss value is within a preset range, the pre-training task training is completed, and the initial BERT network model after the training is completed is the preset BERT encoder.
- the symptom data processing device further includes:
- a sentence vector generation module which is used to generate a sentence vector according to the symptom data
- the cosine similarity calculation module is used to compare the sentence vector with the medical record sentence vector in the local medical record database to calculate the cosine similarity
- the matching medical record module is used to select a specified number of matching medical records according to the cosine similarity
- the determining department module is used to obtain the medical department to which the matched medical record belongs, and determine the medical department with the highest frequency of occurrence as the recommended department.
- the module for calculating cosine similarity includes:
- Obtain the local medical record data unit which is used to obtain the medical record data of the local medical record database
- the symptom word vector calculation unit is used to process the medical record data using the preset BERT encoder to generate a symptom word vector query vocabulary, the symptom word vector query vocabulary including the symptom word vector of each symptom;
- TF-IDF value calculation unit used to calculate the TF-IDF value of the symptom word vector
- a medical record sentence vector unit is used to generate a medical record sentence vector for each medical record according to the symptom word vector and the TF-IDF value.
- Each module in the above-mentioned symptom data processing device can be implemented in whole or in part by software, hardware, and a combination thereof.
- the above-mentioned modules may be embedded in hardware form in, or be independent of, the processor of the computer equipment, or may be stored in software form in the memory of the computer equipment, so that the processor can call and execute the operations corresponding to each module.
- a computer device is provided.
- the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
- the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
- the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
- the database of the computer equipment is used to store the data involved in the above-mentioned symptom data processing method.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer readable instructions are executed by the processor to realize a symptom data processing method.
- a computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
- the symptom data is processed into a representation vector by a preset BERT encoder, the representation vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes symptom name and symptom attribute; the preset BERT The encoder is obtained after being trained by a pre-training task; the pre-training task is used to determine the association relationship between the characterization vector and the symptom name and symptom attribute;
- the characterization vector is input to a preset TextCNN model, and the classification result output by the preset TextCNN model is obtained.
- One or more computer-readable storage media storing computer-readable instructions are provided.
- The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
- The readable storage media store computer-readable instructions which, when executed by one or more processors, implement the following steps:
- acquiring symptom data; processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom name and symptom attributes;
- inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
Abstract
This application relates to the field of natural language processing, and discloses a symptom data processing method, apparatus, computer device and storage medium. The method includes: acquiring symptom data; processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data, the symptom feature data including a symptom name and symptom attributes, the preset BERT encoder being obtained after training with a pre-training task, and the pre-training task being used to determine the association between the characterization vector and the symptom name and symptom attributes; and inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model. This application can improve the accuracy and quality of triage results. This application can also be applied to the construction of smart cities.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on September 4, 2020, with application number 202010921651.X and titled "Symptom data processing method, apparatus, computer device and storage medium", the entire content of which is incorporated herein by reference.
This application relates to the field of natural language processing, and in particular to a symptom data processing method, apparatus, computer device and storage medium.
Triage refers to the process of judging a patient's condition and the appropriate department according to the patient's symptoms and signs, and arranging the patient's visit accordingly. The accuracy of triage results is of great significance for the rational allocation of hospital resources and for improving the efficiency of patient visits.
At present, hospital triage is mainly handled by triage staff. The inventors realized that triage work covers all departments and is therefore difficult; at the same time, hospitals see a large number of patients and triage processing time is short. Both factors affect the accuracy of triage results.
Therefore, it is necessary to provide an intelligent medical guidance method to solve the current problem of low triage accuracy.
Summary of the Application
Based on this, in view of the above technical problems, it is necessary to provide a symptom data processing method, apparatus, computer device and storage medium, so as to improve the accuracy and quality of triage results.
A symptom data processing method, including:
acquiring symptom data;
processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom name and symptom attributes;
inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
A symptom data processing apparatus, including:
an acquisition module, configured to acquire symptom data;
a data processing module, configured to process the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on the symptom names and their attributes in the symptom data; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom name and symptom attributes;
a data output module, configured to input the characterization vector into a preset TextCNN model and obtain the classification result output by the preset TextCNN model.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
acquiring symptom data;
processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom feature data;
inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
One or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring symptom data;
processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom feature data;
inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
In the above symptom data processing method, apparatus, computer device and storage medium, symptom data is acquired, so that the raw data entered by the patient in real time is obtained. The symptom data is processed into a characterization vector through a preset BERT encoder; the characterization vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom feature data. Here, processing the symptom data into a characterization vector through the preset BERT encoder better extracts the features of the symptom data, and the resulting characterization vector carries more information, which helps improve the accuracy of the classification result. The characterization vector is input into a preset TextCNN model and the classification result output by the preset TextCNN model is obtained; here, the processing of the TextCNN model can accurately identify the classification result corresponding to the symptom data, i.e. it improves the accuracy of the classification result. This application can improve the accuracy and quality of triage results. This application can be applied to the smart medical field of smart cities, thereby promoting the construction of smart cities.
The details of one or more embodiments of this application are set forth in the drawings and description below; other features and advantages of this application will become apparent from the specification, the drawings and the claims.
In order to more clearly illustrate the technical solutions of the embodiments of this application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of the symptom data processing method in an embodiment of this application;
FIG. 2 is a schematic flowchart of the symptom data processing method in an embodiment of this application;
FIG. 3 is a schematic flowchart of the symptom data processing method in an embodiment of this application;
FIG. 4 is a schematic flowchart of the symptom data processing method in an embodiment of this application;
FIG. 5 is a schematic flowchart of the symptom data processing method in an embodiment of this application;
FIG. 6 is a schematic flowchart of the symptom data processing method in an embodiment of this application;
FIG. 7 is a schematic structural diagram of the symptom data processing apparatus in an embodiment of this application;
FIG. 8 is a schematic diagram of a computer device in an embodiment of this application.
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
The symptom data processing method provided in this embodiment can be applied in the application environment shown in FIG. 1, in which a client communicates with a server. The client includes, but is not limited to, personal computers, notebook computers, smart phones, tablet computers and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a symptom data processing method is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
S10: Acquire symptom data.
In this embodiment, the symptom data processing method can be executed on a symptom data processing apparatus. Symptom data can refer to data the patient enters on the symptom data processing apparatus. Illustratively, the patient first enters one of their symptoms. The apparatus asks about the attributes of that symptom (duration and onset characteristics), and then, based on the input, recommends other symptoms the patient may have. If the patient confirms a recommended symptom, its related attributes are likewise queried; otherwise, the next symptom is asked about. When the patient has fully described all of their symptoms, they can click a "Confirm" button to complete the submission of the symptom data. In some cases, the symptom data can also be entered with the assistance of triage staff.
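The collection flow above can be sketched as follows. This is an illustrative sketch, not code from the patent: the `RELATED` lookup table and all function names are hypothetical stand-ins (in practice a probabilistic model would propose the related symptoms).

```python
# Hypothetical related-symptom lookup; a probabilistic model is used in practice.
RELATED = {
    "cough": ["fever", "sore throat"],
    "fever": ["chills"],
}

def collect_symptoms(first_symptom, first_attrs, confirm_related, ask_attrs):
    """confirm_related(name) -> bool ("Yes"/"No"); ask_attrs(name) -> attribute list."""
    symptoms = {first_symptom: first_attrs}           # first symptom data
    for candidate in RELATED.get(first_symptom, []):  # related-symptom prompts
        if confirm_related(candidate):                # patient clicked "Yes"
            symptoms[candidate] = ask_attrs(candidate)  # second symptom data
    return symptoms                                   # collection confirmed complete

demo = collect_symptoms(
    "cough", ["three days", "blood-streaked"],
    confirm_related=lambda name: name == "fever",     # patient confirms only "fever"
    ask_attrs=lambda name: ["one day"],
)
print(demo)  # {'cough': ['three days', 'blood-streaked'], 'fever': ['one day']}
```

The callbacks stand in for the "Yes"/"No" choice boxes and attribute prompts described above.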
S20: Process the symptom data into a characterization vector through a preset BERT encoder; the characterization vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom feature data.
In this embodiment, the preset BERT (Bidirectional Encoder Representations from Transformers) encoder is obtained by improving the existing BERT model (see the paper "Pre-training of Deep Bidirectional Transformers for Language Understanding" published by Google in 2018). Here, the preset BERT encoder is obtained after training with a pre-training task. The pre-training task is a custom task, defined as inferring, from the current characterization vector, the symptom names and symptom attributes it contains. The pre-training task ensures that the preset BERT encoder learns the information contained in the output characterization vector; that is, the pre-training task determines the association between the characterization vector and the symptom feature data. Note that here the association is embodied in the model parameters of the preset BERT encoder. In this way, a symptom name and its symptom attributes can be accurately converted into one overall vector, the characterization vector. The number of generated characterization vectors equals the number of symptoms in the symptom data: for each symptom in the symptom data, one corresponding characterization vector is generated.
The preset BERT encoder is obtained in advance by training on a large amount of medical record data (of the same type as in step S10). Therefore, in addition to the features of the input symptom data, the generated characterization vector also contains associated features, predicted by the preset BERT encoder, that are related to the symptom data.
S30: Input the characterization vector into a preset TextCNN model, and obtain the classification result output by the preset TextCNN model.
In this embodiment, the preset TextCNN (text convolutional neural network) model can be obtained by improving the existing TextCNN model. Compared with the existing TextCNN model, the input of the preset TextCNN model is the characterization vectors generated by the preset BERT encoder, rather than randomly initialized word vectors. In the input stage, all characterization vectors generated from the symptom data serve as the input data of the preset TextCNN model. In the model calculation stage, multiple convolution kernels convolve the input data, the results are pooled in a pooling layer, the output of the pooling layer is connected to a fully connected network, and finally a softmax activation function outputs the probability of each class. In one example, the preset TextCNN model can be a binary classification model, whose classification result is used to determine whether the patient has a critical illness.
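The calculation stage just described (convolution kernels, max-over-time pooling, fully connected layer, softmax) can be illustrated with a toy forward pass. This is a minimal sketch, not the patent's implementation: the vectors, kernels and fully connected weights are illustrative placeholders that a real model would learn, and the dimensions are tiny for readability.

```python
import math

def conv_max_pool(vectors, kernel):
    """Slide the kernel over the sequence of vectors, keep the max response."""
    width = len(kernel)
    scores = [
        sum(kernel[i][j] * vectors[t + i][j]
            for i in range(width) for j in range(len(vectors[0])))
        for t in range(len(vectors) - width + 1)
    ]
    return max(scores)  # max-over-time pooling

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    return [e / sum(exps) for e in exps]

def textcnn_forward(vectors, kernels, fc_weights):
    pooled = [conv_max_pool(vectors, k) for k in kernels]  # one value per kernel
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in fc_weights]
    return softmax(scores)  # probability of each class

# Two characterization vectors of dimension 3, kernels of widths 1 and 2,
# and a binary classifier (e.g. critical illness: yes/no).
vecs = [[0.2, 0.1, 0.4], [0.3, 0.5, 0.1]]
kernels = [[[1.0, 0.0, 0.5]], [[0.5, 0.5, 0.0], [0.0, 1.0, 0.5]]]
fc = [[1.0, -1.0], [-1.0, 1.0]]
probs = textcnn_forward(vecs, kernels, fc)
print(probs)  # two class probabilities summing to 1
```

Kernels of different widths capture symptom patterns of different lengths, which is the usual motivation for TextCNN's multi-width kernel design.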
In steps S10 to S30, symptom data is acquired, so that the raw data entered by the patient in real time is obtained. The symptom data is processed into a characterization vector through the preset BERT encoder; the characterization vector is generated based on the symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom feature data. Here, processing the symptom data into a characterization vector through the preset BERT encoder better extracts the features of the symptom data, and the resulting characterization vector carries more information, which helps improve the accuracy of the classification result. The characterization vector is input into the preset TextCNN model and the classification result output by the preset TextCNN model is obtained; here, the processing of the TextCNN model can accurately identify the classification result corresponding to the symptom data, i.e. it improves the accuracy of the classification result.
Optionally, as shown in FIG. 3, step S10, namely acquiring symptom data, includes:
S101: Acquire first symptom data.
The first symptom data refers to the symptom name and symptom attributes of the first symptom entered by the patient. For example, if the symptom name of the first symptom data is "cough", the corresponding attribute data includes "coughing for three days" and "coughing up blood-streaked sputum". In general, one symptom name is associated with one or more pieces of attribute data.
S102: Output a related-symptom prompt according to the first symptom data.
After the first symptom data is acquired (for the patient, after the entry of the first symptom data is completed), other symptoms the patient may have can be recommended based on the current first symptom data (a conventional probabilistic model can be used for the recommendation), and a related-symptom prompt is generated. In one example, the related-symptom prompt can read: besides "cough", do you also have the symptom "fever"?
S103: Acquire second symptom data based on the related-symptom prompt.
When the related-symptom prompt is output, corresponding choice boxes, "Yes" and "No", are output at the same time. When the patient selects "Yes", the second symptom data is collected. The second symptom data is acquired in essentially the same way as the first symptom data; both are data entered by the patient. When the patient selects "No", the second symptom data corresponding to the current related-symptom prompt is not collected.
Note that more than one related-symptom prompt may be output; that is, more than one item of second symptom data may be collected. Some patients' symptom data may contain three to five symptoms.
S104: After determining that symptom data collection is complete, complete the acquisition of the symptom data, the symptom data including the first symptom data and the second symptom data.
When patients judge that they have described their symptoms completely, they can click the "Confirm" button to indicate that symptom data collection is complete. In general, among all the symptom data there is only one item of first symptom data, while the number of items of second symptom data can be any non-negative integer, i.e. zero or a positive integer.
In steps S101 to S104, first symptom data is acquired; here, the patient's symptom data can be collected step by step, symptom by symptom, and when there are multiple items of symptom data, the first symptom data is generally the most important. A related-symptom prompt is output according to the first symptom data, to determine whether the patient has other symptoms related to the first symptom (i.e. the symptom name corresponding to the first symptom data). Second symptom data is acquired based on the related-symptom prompt, to further collect the patient's symptom data (here, second symptom data refers to symptom data other than the first symptom data). After it is determined that symptom data collection is complete, the acquisition of the symptom data, which includes the first symptom data and the second symptom data, is completed. In this way, fairly detailed symptom data can be obtained, improving the accuracy of the classification result.
Optionally, as shown in FIG. 4, before step S20, namely before processing the symptom data into a characterization vector through the preset BERT encoder, the characterization vector being generated based on the symptom name and symptom attributes, the method further includes:
S201: Establish a pre-training task, and use a word2vec model to process symptom samples into several word vectors, the word vectors including a first word vector generated from the symptom name and second word vectors generated from the symptom attributes.
S202: Input the several word vectors into an initial BERT network model, and obtain the training characterization vector output by the initial BERT network model.
S203: Calculate the loss value of the initial BERT network model according to the training characterization vector.
S204: If the loss value is outside a preset range, adjust the model parameters of the initial BERT network model and recompute the training characterization vector of the symptom samples.
S205: If the loss value is within the preset range, the pre-training task is complete, and the trained initial BERT network model is the preset BERT encoder.
In this embodiment, the pre-training task mainly performs the iterative calculation of steps S202 to S204. Before the iterative calculation, the symptom samples need to be converted into word vectors by the word2vec model, which is a model for producing word vectors. In one example, for the symptom data {cough: three days; blood-streaked}, conversion by the word2vec model yields the word vectors emb1 (cough), emb2 (three days) and emb3 (blood-streaked). Here, emb1 is the first word vector, and emb2 and emb3 are second word vectors. Symptom samples refer to the training data used to train the initial BERT network model, generally the historical symptom data of a certain region.
After the word vectors are obtained, they can be input into the initial BERT network model as input data to generate the training characterization vector, and the corresponding loss value is calculated. Specifically, the loss value can be obtained from a loss function, which can take the following (negative-sampling) form:
L(V_s, sym^(n)) = -log σ(V_s · v_{sym^(n)}) - Σ_{m≠n} log σ(-V_s · v_{sym^(m)})
In the above formula, L(V_s, sym^(n)) is the loss value of the nth symptom; sym^(n) denotes the nth symptom in the symptom list; V_s denotes the overall characterization vector; the first term is the loss term of the nth symptom in the characterization vector, and the summation over m ≠ n is the loss term of the other symptoms in the characterization vector. The loss function shows that a symptom appearing in the characterization vector should have a loss value as small as possible, and conversely the loss value should be as large as possible.
The preset range can be adjusted according to actual needs. If the loss value is within the preset range, the initial BERT network model has converged, and the training of the pre-training task is complete. The trained initial BERT network model is the preset BERT encoder.
In steps S201 to S205, a pre-training task is established, and a word2vec model is used to process symptom samples into several word vectors, the word vectors including a first word vector generated from the symptom name and second word vectors generated from the symptom attributes, so as to obtain the input data of the initial BERT network model. The several word vectors are input into the initial BERT network model, and the training characterization vector output by the initial BERT network model is obtained, performing the training step of the initial BERT network model. The loss value of the initial BERT network model is calculated from the training characterization vector; the obtained loss value can be used to adjust the model parameters and to judge whether the model has converged. If the loss value is outside the preset range, the model parameters of the initial BERT network model are adjusted and the training characterization vector of the symptom samples is recomputed, performing iterative calculation while the model has not converged. If the loss value is within the preset range, the training of the pre-training task is complete, and the trained initial BERT network model is the preset BERT encoder; here, a preset BERT encoder that can be used to generate characterization vectors is obtained.
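The S202–S205 iteration (compute loss, adjust parameters while the loss is outside the preset range, stop on convergence) can be sketched with a toy stand-in. This is only the control flow: the one-parameter "model" and its quadratic loss are illustrative substitutes for the initial BERT network model and its loss value, and all names are hypothetical.

```python
def train_until_converged(param, target, preset_range=(0.0, 1e-4), lr=0.5, max_iters=1000):
    loss = float("inf")
    for _ in range(max_iters):
        loss = (param - target) ** 2                  # S203: calculate loss value
        if preset_range[0] <= loss <= preset_range[1]:
            return param, loss                        # S205: within range -> done
        grad = 2 * (param - target)                   # S204: adjust model parameters...
        param -= lr * grad                            # ...and recompute next round
    return param, loss

param, loss = train_until_converged(param=5.0, target=1.0)
print(round(loss, 6))  # loss inside the preset range
```

A real training loop would compute the loss over all symptom samples and update the BERT parameters by backpropagation; only the stopping criterion mirrors the text.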
Optionally, as shown in FIG. 5, after step S10, namely after acquiring the symptom data, the method further includes:
S11: Generate a sentence vector according to the symptom data.
S12: Compare the sentence vector with the medical record sentence vectors of a local medical record database, and calculate cosine similarities.
S13: Select a specified number of matching medical records according to the cosine similarities.
S14: Obtain the visiting departments to which the matching medical records belong, and determine the department with the highest frequency of occurrence as the recommended department.
In this embodiment, the preset BERT encoder can be used to process the symptom data into word vectors (the characterization vector can be split into multiple word vectors), which are then combined with the TF-IDF values of the local medical record database to generate the sentence vector.
Given the sentence vector of the symptom data and the medical record sentence vectors in the local medical record database (some or all of them), the corresponding cosine similarities can be calculated. A high cosine similarity indicates that the medical record is highly similar to the current symptom data. After all cosine similarities are calculated, they can be sorted from high to low, and the top specified number of matching medical records can be selected. The specified number can be set according to actual needs, e.g. 10.
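Steps S12 and S13 can be sketched as follows, with made-up 3-dimensional sentence vectors standing in for the real ones (the record ids and vector values are illustrative only).

```python
import math

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_matches(query, records, k):
    """records: {record_id: sentence_vector}; return the k most similar record ids."""
    ranked = sorted(records, key=lambda rid: cosine(query, records[rid]), reverse=True)
    return ranked[:k]

records = {
    "record_1": [0.9, 0.1, 0.0],
    "record_2": [0.1, 0.9, 0.2],
    "record_3": [0.8, 0.2, 0.1],
}
print(top_matches([1.0, 0.1, 0.0], records, k=2))  # ['record_1', 'record_3']
```

Sorting by similarity and truncating to `k` corresponds to "sort from high to low and select the top specified number of matching medical records".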
After the matching medical records are selected, the visiting department to which each matching medical record belongs can be obtained. Each matching medical record has a corresponding visiting department. Illustratively, the mapping between matching medical records and visiting departments can be expressed as:
matching medical record 1 — visiting department 1;
matching medical record 2 — visiting department 3;
matching medical record 3 — visiting department 2;
…;
matching medical record 10 — visiting department 1.
The visiting department with the highest frequency of occurrence can be determined as the recommended department, i.e. the department the patient is advised to visit.
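Step S14 is a frequency vote over the matched records' departments. A minimal sketch (the record-to-department mapping below is illustrative):

```python
from collections import Counter

def recommend_department(record_departments):
    """record_departments: {record_id: department}; most frequent department wins."""
    counts = Counter(record_departments.values())
    return counts.most_common(1)[0][0]

matches = {
    "record_1": "department_1",
    "record_2": "department_3",
    "record_3": "department_2",
    "record_10": "department_1",
}
print(recommend_department(matches))  # department_1
```

With ties, `Counter.most_common` keeps first-insertion order; a production system might break ties by total similarity instead.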
In steps S11 to S14, a sentence vector is generated from the symptom data, so that the sentence vector reflects the characteristics of the local data. The sentence vector is compared with the medical record sentence vectors of the local medical record database and cosine similarities are calculated; the cosine similarity measures the similarity between the symptom data and the medical records corresponding to the medical record sentence vectors in the local database. A specified number of matching medical records are selected according to the cosine similarities, so as to obtain the matching records with the highest similarity. The visiting departments to which the matching medical records belong are obtained, and the department with the highest frequency of occurrence is determined as the recommended department, helping the patient choose the most suitable department.
Optionally, as shown in FIG. 6, before step S12, namely before comparing the sentence vector with the medical record sentence vectors of the local medical record database and calculating cosine similarities, the method further includes:
S121: Acquire the medical record data of the local medical record database.
S122: Process the medical record data with the preset BERT encoder to generate a symptom word vector lookup table, the symptom word vector lookup table including the symptom word vector of each symptom.
S123: Calculate the TF-IDF value of each symptom word vector.
S124: Generate a medical record sentence vector for each medical record according to the symptom word vectors and the TF-IDF values.
In this embodiment, the medical record data and the symptom data of step S10 are both patient visit data from the same region. The preset BERT encoder in step S122 is trained in the same way as the one in step S20, but its output form is slightly different: it outputs symptom word vectors (W_emb), each of dimension [1, 1024]. Then, the TF-IDF (term frequency–inverse document frequency) value corresponding to each symptom word vector is calculated and set as the weight of that word vector. The TF-IDF value is used to evaluate how important a given symptom description (word vector) in the symptom data is to the medical record data (sentence vector).
The medical record sentence vector can then be obtained by conversion, for example as the TF-IDF-weighted sum of the record's symptom word vectors: V_sent = Σ_i tfidf_i · W_emb^(i).
Likewise, the sentence vector of the symptom data can also be generated through steps S122 to S124.
In steps S121 to S124, the medical record data of the local medical record database is acquired; here, the local medical record database is a pre-built database for storing local medical record data, which can be used for comparison with the symptom data of step S10. The medical record data is processed with the preset BERT encoder to generate a symptom word vector lookup table that includes the symptom word vector of each symptom; here, the medical record data is processed into vector form to facilitate comparison. The TF-IDF value of each symptom word vector is calculated; here, the TF-IDF value is set as the weight of the symptom word vector within the sentence vector. A medical record sentence vector is generated for each medical record from the symptom word vectors and the TF-IDF values; the resulting medical record sentence vectors can be compared with the sentence vector of the symptom data (by calculating cosine similarity) to determine their degree of similarity.
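Steps S123 and S124 can be illustrated with a toy computation. This is a sketch under stated assumptions: the tiny corpus and 2-dimensional "word vectors" are placeholders (the text uses 1024-dimensional BERT outputs), and the smoothed TF-IDF variant shown is one common formulation, not necessarily the patent's.

```python
import math

def tf_idf(symptom, record, corpus):
    """Term frequency within the record times smoothed inverse document frequency."""
    tf = record.count(symptom) / len(record)
    df = sum(1 for r in corpus if symptom in r)
    return tf * math.log(len(corpus) / (1 + df))

def sentence_vector(record, corpus, word_vectors):
    """Medical record sentence vector: TF-IDF-weighted sum of symptom word vectors."""
    dim = len(next(iter(word_vectors.values())))
    sent = [0.0] * dim
    for symptom in set(record):
        w = tf_idf(symptom, record, corpus)           # TF-IDF value as the weight
        for j in range(dim):
            sent[j] += w * word_vectors[symptom][j]   # weighted sum of word vectors
    return sent

corpus = [["cough", "fever"], ["cough"], ["headache"]]
word_vectors = {"cough": [1.0, 0.0], "fever": [0.0, 1.0], "headache": [0.5, 0.5]}
vec = sentence_vector(corpus[0], corpus, word_vectors)
print(len(vec))  # a 2-dimensional sentence vector
```

Note how "cough", which appears in most records, receives a near-zero weight, while the rarer "fever" dominates the sentence vector, which is exactly the discriminative behavior TF-IDF weighting is meant to provide.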
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a symptom data processing apparatus is provided, corresponding one-to-one to the symptom data processing method in the above embodiments. As shown in FIG. 7, the symptom data processing apparatus includes an acquisition module 10, a data processing module 20 and a data output module 30. The functional modules are described in detail as follows:
the acquisition module 10, configured to acquire symptom data;
the data processing module 20, configured to process the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on the symptom names and their attributes in the symptom data; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom name and symptom attributes;
the data output module 30, configured to input the characterization vector into a preset TextCNN model and obtain the classification result output by the preset TextCNN model.
Optionally, the acquisition module 10 includes:
a first data acquisition unit, configured to acquire first symptom data;
a prompt unit, configured to output a related-symptom prompt according to the first symptom data;
a second data acquisition unit, configured to acquire second symptom data based on the related-symptom prompt;
a collection completion unit, configured to complete the acquisition of the symptom data after determining that symptom data collection is complete, the symptom data including the first symptom data and the second symptom data.
Optionally, the symptom data processing apparatus further includes:
a task establishment module, configured to establish a pre-training task and use a word2vec model to process symptom samples into several word vectors, the word vectors including a first word vector generated from the symptom name and second word vectors generated from the symptom attributes;
a model training module, configured to input the several word vectors into an initial BERT network model and obtain the training characterization vector output by the initial BERT network model;
a loss calculation module, configured to calculate the loss value of the initial BERT network model according to the training characterization vector;
an iteration module, configured to, if the loss value is outside a preset range, adjust the model parameters of the initial BERT network model and recompute the training characterization vector of the symptom samples, so as to calculate the loss value of the initial BERT network model;
an encoder determination module, configured to, if the loss value is within the preset range, determine that the pre-training task is complete, the trained initial BERT network model being the preset BERT encoder.
Optionally, the symptom data processing apparatus further includes:
a sentence vector generation module, configured to generate a sentence vector according to the symptom data;
a cosine similarity calculation module, configured to compare the sentence vector with the medical record sentence vectors of a local medical record database and calculate cosine similarities;
a medical record matching module, configured to select a specified number of matching medical records according to the cosine similarities;
a department determination module, configured to obtain the visiting departments to which the matching medical records belong and determine the department with the highest frequency of occurrence as the recommended department.
Optionally, the cosine similarity calculation module includes:
a local medical record data acquisition unit, configured to acquire the medical record data of the local medical record database;
a symptom word vector calculation unit, configured to process the medical record data with the preset BERT encoder and generate a symptom word vector lookup table, the symptom word vector lookup table including the symptom word vector of each symptom;
a TF-IDF value calculation unit, configured to calculate the TF-IDF value of each symptom word vector;
a medical record sentence vector generation unit, configured to generate a medical record sentence vector for each medical record according to the symptom word vectors and the TF-IDF values.
For specific limitations on the symptom data processing apparatus, refer to the limitations on the symptom data processing method above, which are not repeated here. Each module in the above symptom data processing apparatus can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules can be embedded in hardware form in, or independent of, a processor in a computer device, or stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device can be a server, and its internal structure can be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores the data involved in the above symptom data processing method. The network interface of the computer device communicates with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a symptom data processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; the processor implements the following steps when executing the computer-readable instructions:
acquiring symptom data;
processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom name and symptom attributes;
inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
In one embodiment, one or more computer-readable storage media storing computer-readable instructions are provided; the readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. The readable storage media store computer-readable instructions which, when executed by one or more processors, implement the following steps:
acquiring symptom data;
processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data; the symptom feature data includes a symptom name and symptom attributes; the preset BERT encoder is obtained after training with a pre-training task; the pre-training task is used to determine the association between the characterization vector and the symptom name and symptom attributes;
inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile readable storage medium or a volatile readable storage medium, and when executed can include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division into functional units and modules is used as an example; in practical applications, the above functions can be allocated to different functional units and modules as needed, i.e. the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.
Claims (20)
- A symptom data processing method, comprising: acquiring symptom data; processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data, the symptom feature data comprising a symptom name and symptom attributes, the preset BERT encoder being obtained after training with a pre-training task, and the pre-training task being used to determine the association between the characterization vector and the symptom feature data; and inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
- The symptom data processing method according to claim 1, wherein acquiring symptom data comprises: acquiring first symptom data; outputting a related-symptom prompt according to the first symptom data; acquiring second symptom data based on the related-symptom prompt; and after determining that symptom data collection is complete, completing the acquisition of the symptom data, the symptom data comprising the first symptom data and the second symptom data.
- The symptom data processing method according to claim 1, wherein before processing the symptom data into a characterization vector through the preset BERT encoder, the characterization vector being generated based on the symptom name and symptom attributes, the method further comprises: establishing a pre-training task, and using a word2vec model to process symptom samples into several word vectors, the word vectors comprising a first word vector generated from the symptom name and second word vectors generated from the symptom attributes; inputting the several word vectors into an initial BERT network model, and obtaining the training characterization vector output by the initial BERT network model; calculating a loss value of the initial BERT network model according to the training characterization vector; if the loss value is outside a preset range, adjusting the model parameters of the initial BERT network model and recomputing the training characterization vector of the symptom samples, so as to calculate the loss value of the initial BERT network model; and if the loss value is within the preset range, the training of the pre-training task is complete, and the trained initial BERT network model is the preset BERT encoder.
- The symptom data processing method according to claim 1, wherein after acquiring the symptom data, the method further comprises: generating a sentence vector according to the symptom data; comparing the sentence vector with medical record sentence vectors of a local medical record database, and calculating cosine similarities; selecting a specified number of matching medical records according to the cosine similarities; and obtaining the visiting departments to which the matching medical records belong, and determining the department with the highest frequency of occurrence as a recommended department.
- The symptom data processing method according to claim 4, wherein before comparing the sentence vector with the medical record sentence vectors of the local medical record database and calculating cosine similarities, the method further comprises: acquiring the medical record data of the local medical record database; processing the medical record data with the preset BERT encoder to generate a symptom word vector lookup table, the symptom word vector lookup table comprising the symptom word vector of each symptom; calculating the TF-IDF value of each symptom word vector; and generating a medical record sentence vector for each medical record according to the symptom word vectors and the TF-IDF values.
- A symptom data processing apparatus, comprising: an acquisition module, configured to acquire symptom data; a data processing module, configured to process the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on the symptom names and their attributes in the symptom data, the preset BERT encoder being obtained after training with a pre-training task, and the pre-training task being used to determine the association between the characterization vector and the symptom name and symptom attributes; and a data output module, configured to input the characterization vector into a preset TextCNN model and obtain the classification result output by the preset TextCNN model.
- The symptom data processing apparatus according to claim 6, wherein the acquisition module further comprises: a first data acquisition unit, configured to acquire first symptom data; a prompt unit, configured to output a related-symptom prompt according to the first symptom data; a second data acquisition unit, configured to acquire second symptom data based on the related-symptom prompt; and a collection completion unit, configured to complete the acquisition of the symptom data after determining that symptom data collection is complete, the symptom data comprising the first symptom data and the second symptom data.
- The symptom data processing apparatus according to claim 6, further comprising: a task establishment module, configured to establish a pre-training task and use a word2vec model to process symptom samples into several word vectors, the word vectors comprising a first word vector generated from the symptom name and second word vectors generated from the symptom attributes; a model training module, configured to input the several word vectors into an initial BERT network model and obtain the training characterization vector output by the initial BERT network model; a loss calculation module, configured to calculate a loss value of the initial BERT network model according to the training characterization vector; an iteration module, configured to, if the loss value is outside a preset range, adjust the model parameters of the initial BERT network model and recompute the training characterization vector of the symptom samples, so as to calculate the loss value of the initial BERT network model; and an encoder determination module, configured to, if the loss value is within the preset range, determine that the training of the pre-training task is complete, the trained initial BERT network model being the preset BERT encoder.
- The symptom data processing apparatus according to claim 6, further comprising: a sentence vector generation module, configured to generate a sentence vector according to the symptom data; a cosine similarity calculation module, configured to compare the sentence vector with the medical record sentence vectors of a local medical record database and calculate cosine similarities; a medical record matching module, configured to select a specified number of matching medical records according to the cosine similarities; and a department determination module, configured to obtain the visiting departments to which the matching medical records belong and determine the department with the highest frequency of occurrence as a recommended department.
- The symptom data processing apparatus according to claim 9, wherein the cosine similarity calculation module comprises: a local medical record data acquisition unit, configured to acquire the medical record data of the local medical record database; a symptom word vector calculation unit, configured to process the medical record data with the preset BERT encoder and generate a symptom word vector lookup table, the symptom word vector lookup table comprising the symptom word vector of each symptom; a TF-IDF value calculation unit, configured to calculate the TF-IDF value of each symptom word vector; and a medical record sentence vector generation unit, configured to generate a medical record sentence vector for each medical record according to the symptom word vectors and the TF-IDF values.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: acquiring symptom data; processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data, the symptom feature data comprising a symptom name and symptom attributes, the preset BERT encoder being obtained after training with a pre-training task, and the pre-training task being used to determine the association between the characterization vector and the symptom feature data; and inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
- The computer device according to claim 11, wherein acquiring symptom data comprises: acquiring first symptom data; outputting a related-symptom prompt according to the first symptom data; acquiring second symptom data based on the related-symptom prompt; and after determining that symptom data collection is complete, completing the acquisition of the symptom data, the symptom data comprising the first symptom data and the second symptom data.
- The computer device according to claim 11, wherein before processing the symptom data into a characterization vector through the preset BERT encoder, the characterization vector being generated based on the symptom name and symptom attributes, the method further comprises: establishing a pre-training task, and using a word2vec model to process symptom samples into several word vectors, the word vectors comprising a first word vector generated from the symptom name and second word vectors generated from the symptom attributes; inputting the several word vectors into an initial BERT network model, and obtaining the training characterization vector output by the initial BERT network model; calculating a loss value of the initial BERT network model according to the training characterization vector; if the loss value is outside a preset range, adjusting the model parameters of the initial BERT network model and recomputing the training characterization vector of the symptom samples, so as to calculate the loss value of the initial BERT network model; and if the loss value is within the preset range, the training of the pre-training task is complete, and the trained initial BERT network model is the preset BERT encoder.
- The computer device according to claim 11, wherein after acquiring the symptom data, the method further comprises: generating a sentence vector according to the symptom data; comparing the sentence vector with medical record sentence vectors of a local medical record database, and calculating cosine similarities; selecting a specified number of matching medical records according to the cosine similarities; and obtaining the visiting departments to which the matching medical records belong, and determining the department with the highest frequency of occurrence as a recommended department.
- The computer device according to claim 14, wherein before comparing the sentence vector with the medical record sentence vectors of the local medical record database and calculating cosine similarities, the method further comprises: acquiring the medical record data of the local medical record database; processing the medical record data with the preset BERT encoder to generate a symptom word vector lookup table, the symptom word vector lookup table comprising the symptom word vector of each symptom; calculating the TF-IDF value of each symptom word vector; and generating a medical record sentence vector for each medical record according to the symptom word vectors and the TF-IDF values.
- One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring symptom data; processing the symptom data into a characterization vector through a preset BERT encoder, the characterization vector being generated based on symptom feature data in the symptom data, the symptom feature data comprising a symptom name and symptom attributes, the preset BERT encoder being obtained after training with a pre-training task, and the pre-training task being used to determine the association between the characterization vector and the symptom feature data; and inputting the characterization vector into a preset TextCNN model, and obtaining the classification result output by the preset TextCNN model.
- The readable storage media according to claim 16, wherein acquiring symptom data comprises: acquiring first symptom data; outputting a related-symptom prompt according to the first symptom data; acquiring second symptom data based on the related-symptom prompt; and after determining that symptom data collection is complete, completing the acquisition of the symptom data, the symptom data comprising the first symptom data and the second symptom data.
- The readable storage media according to claim 16, wherein before processing the symptom data into a characterization vector through the preset BERT encoder, the characterization vector being generated based on the symptom name and symptom attributes, the method further comprises: establishing a pre-training task, and using a word2vec model to process symptom samples into several word vectors, the word vectors comprising a first word vector generated from the symptom name and second word vectors generated from the symptom attributes; inputting the several word vectors into an initial BERT network model, and obtaining the training characterization vector output by the initial BERT network model; calculating a loss value of the initial BERT network model according to the training characterization vector; if the loss value is outside a preset range, adjusting the model parameters of the initial BERT network model and recomputing the training characterization vector of the symptom samples, so as to calculate the loss value of the initial BERT network model; and if the loss value is within the preset range, the training of the pre-training task is complete, and the trained initial BERT network model is the preset BERT encoder.
- The readable storage media according to claim 16, wherein after acquiring the symptom data, the method further comprises: generating a sentence vector according to the symptom data; comparing the sentence vector with medical record sentence vectors of a local medical record database, and calculating cosine similarities; selecting a specified number of matching medical records according to the cosine similarities; and obtaining the visiting departments to which the matching medical records belong, and determining the department with the highest frequency of occurrence as a recommended department.
- The readable storage media according to claim 19, wherein before comparing the sentence vector with the medical record sentence vectors of the local medical record database and calculating cosine similarities, the method further comprises: acquiring the medical record data of the local medical record database; processing the medical record data with the preset BERT encoder to generate a symptom word vector lookup table, the symptom word vector lookup table comprising the symptom word vector of each symptom; calculating the TF-IDF value of each symptom word vector; and generating a medical record sentence vector for each medical record according to the symptom word vectors and the TF-IDF values.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010921651.X | 2020-09-04 | ||
CN202010921651.XA CN112016295B (zh) | 2020-09-04 | 2020-09-04 | Symptom data processing method, apparatus, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021151328A1 true WO2021151328A1 (zh) | 2021-08-05 |
Family
ID=73515804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/124221 WO2021151328A1 (zh) | 2020-10-28 | Symptom data processing method, apparatus, computer device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112016295B (zh) |
WO (1) | WO2021151328A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113642312A (zh) * | 2021-08-19 | 2021-11-12 | Ping An Medical and Healthcare Management Co., Ltd. | Physical examination data processing method, apparatus, device and storage medium |
CN113761201A (zh) * | 2021-08-27 | 2021-12-07 | Hebei University of Engineering | Pre-hospital emergency information processing apparatus |
CN115132303A (zh) * | 2022-04-28 | 2022-09-30 | Tencent Technology (Shenzhen) Co., Ltd. | Physiological label prediction method, model training method, apparatus, device and medium |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112562809A (zh) * | 2020-12-15 | 2021-03-26 | Guizhou Xiaobao Health Technology Co., Ltd. | Method and system for auxiliary diagnosis based on electronic medical record text |
CN113223735B (zh) * | 2021-04-30 | 2024-08-20 | Ping An Technology (Shenzhen) Co., Ltd. | Dialogue-representation-based triage method, apparatus, device and storage medium |
CN113345574B (zh) * | 2021-05-26 | 2022-03-22 | Fudan University | Apparatus for obtaining traditional Chinese medicine stomachache health-preserving regimens based on a BERT language model and a CNN model |
CN113555086B (zh) * | 2021-07-26 | 2024-05-10 | Ping An Technology (Shenzhen) Co., Ltd. | Machine-learning-based dialectical analysis method, apparatus, device and medium |
CN113838579B (zh) * | 2021-09-29 | 2024-07-12 | Ping An Medical and Healthcare Management Co., Ltd. | Medical data anomaly detection method, apparatus, device and storage medium |
CN114822830B (zh) * | 2022-06-27 | 2022-12-06 | Anhui iFlytek Medical Co., Ltd. | Consultation interaction method, related apparatus, electronic device and storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108922608A (zh) * | 2018-06-13 | 2018-11-30 | Ping An Medical Technology Co., Ltd. | Intelligent medical guidance method, apparatus, computer device and storage medium |
CN109192300A (zh) * | 2018-08-17 | 2019-01-11 | Baidu Online Network Technology (Beijing) Co., Ltd. | Intelligent consultation method, system, computer device and storage medium |
CN109460473A (zh) * | 2018-11-21 | 2019-03-12 | Central South University | Multi-label classification method for electronic medical records based on symptom extraction and feature representation |
CN109635122A (zh) * | 2018-11-28 | 2019-04-16 | Ping An Technology (Shenzhen) Co., Ltd. | Intelligent disease inquiry method, apparatus, device and storage medium |
CN109887587A (zh) * | 2019-01-22 | 2019-06-14 | Ping An Technology (Shenzhen) Co., Ltd. | Intelligent triage method, system, apparatus and storage medium |
CN110246572A (zh) * | 2019-05-05 | 2019-09-17 | Tsinghua University | Word-vector-based medical triage method and system |
CN110348008A (zh) * | 2019-06-17 | 2019-10-18 | Wuyi University | Medical text named entity recognition method based on a pre-trained model and fine-tuning |
CN110490251A (zh) * | 2019-03-08 | 2019-11-22 | Tencent Technology (Shenzhen) Co., Ltd. | Artificial-intelligence-based predictive classification model acquisition method, apparatus, and storage medium |
CN111104799A (zh) * | 2019-10-16 | 2020-05-05 | Ping An Life Insurance Company of China, Ltd. | Text information representation method, system, computer device and storage medium |
US20200185102A1 (en) * | 2018-12-11 | 2020-06-11 | K Health Inc. | System and method for providing health information |
CN111477310A (zh) * | 2020-03-04 | 2020-07-31 | Ping An International Smart City Technology Co., Ltd. | Triage data processing method, apparatus, computer device and storage medium |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170667B (zh) * | 2017-11-30 | 2020-06-23 | Alibaba Group Holding Ltd. | Word vector processing method, apparatus and device |
CN108563725A (zh) * | 2018-04-04 | 2018-09-21 | East China University of Science and Technology | Chinese symptom and sign composition recognition method |
US11017180B2 (en) * | 2018-04-18 | 2021-05-25 | HelpShift, Inc. | System and methods for processing and interpreting text messages |
KR102060418B1 (ko) * | 2018-06-08 | 2019-12-30 | Yonsei University Industry-Academic Cooperation Foundation | Method and apparatus for extracting final diagnosis names from radiology reports using deep learning for diagnosis-name labeling |
CN109215754A (zh) * | 2018-09-10 | 2019-01-15 | Ping An Technology (Shenzhen) Co., Ltd. | Medical record data processing method, apparatus, computer device and storage medium |
US11195620B2 (en) * | 2019-01-04 | 2021-12-07 | International Business Machines Corporation | Progress evaluation of a diagnosis process |
US11928142B2 (en) * | 2019-02-18 | 2024-03-12 | Sony Group Corporation | Information processing apparatus and information processing method |
CN110534185B (zh) * | 2019-08-30 | 2024-08-20 | Tencent Technology (Shenzhen) Co., Ltd. | Labeled data acquisition method, triage method, apparatus, storage medium and device |
CN111259148B (zh) * | 2020-01-19 | 2024-03-26 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Information processing method, apparatus and storage medium |
CN111415740B (zh) * | 2020-02-12 | 2024-04-19 | Northeastern University | Consultation information processing method, apparatus, storage medium and computer device |
CN111553140B (zh) * | 2020-05-13 | 2024-03-19 | Kingdee Software (China) Co., Ltd. | Data processing method, data processing device and computer storage medium |
- 2020-09-04: CN application CN202010921651.XA filed (granted as CN112016295B, status: active)
- 2020-10-28: WO application PCT/CN2020/124221 filed (WO2021151328A1, application filing)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113642312A (zh) * | 2021-08-19 | 2021-11-12 | Ping An Medical and Healthcare Management Co., Ltd. | Physical examination data processing method, apparatus, device and storage medium |
CN113761201A (zh) * | 2021-08-27 | 2021-12-07 | Hebei University of Engineering | Pre-hospital emergency information processing apparatus |
CN113761201B (zh) * | 2021-08-27 | 2023-12-22 | Hebei University of Engineering | Pre-hospital emergency information processing apparatus |
CN115132303A (zh) * | 2022-04-28 | 2022-09-30 | Tencent Technology (Shenzhen) Co., Ltd. | Physiological label prediction method, model training method, apparatus, device and medium |
Also Published As
Publication number | Publication date |
---|---|
CN112016295A (zh) | 2020-12-01 |
CN112016295B (zh) | 2024-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20916564; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 20916564; Country of ref document: EP; Kind code of ref document: A1 |