WO2021114620A1 - Medical-record quality control method, apparatus, computer device, and storage medium - Google Patents

Info

Publication number
WO2021114620A1
Authority
WO
WIPO (PCT)
Prior art keywords
symptom
main complaint
complaint information
information
natural language
Application number
PCT/CN2020/099180
Other languages
French (fr)
Chinese (zh)
Inventor
朱昭苇
孙行智
胡岗
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021114620A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method, apparatus, computer device, and storage medium for quality control of medical records based on natural language processing.
  • Medical records are used to record patient visits and are the basic data source for follow-up medical research.
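  • Quality control of medical records is therefore one of the key concerns of a medical quality-control system.
  • A method for quality control of medical records, comprising:
  • extracting the main complaint information and the corresponding symptom relationship attribute pair in the medical record to be examined;
  • inputting the main complaint information and the symptom relationship attribute pair into a trained first natural language processing model to obtain a disease set matching the main complaint information; and
  • matching the disease set with the diagnosis information in the medical record to be examined, and determining, according to the matching result, whether the diagnosis information of the medical record to be examined is misdiagnosed.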
  • A medical record quality control device, comprising:
  • an extraction module configured to extract the main complaint information and the corresponding symptom relationship attribute pair in the medical record to be examined;
  • a processing module configured to input the main complaint information and the symptom relationship attribute pair into the trained first natural language processing model to obtain a disease set matching the main complaint information; and
  • a determining module configured to match the disease set with the diagnosis information in the medical record to be examined, and to determine, according to the matching result, whether the diagnosis information in the medical record to be examined is misdiagnosed.
  • A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the above method, including matching the disease set with the diagnosis information in the medical record to be examined and determining, according to the matching result, whether the diagnosis information of the medical record to be examined is misdiagnosed.
  • One or more computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the same steps, including matching the disease set with the diagnosis information in the medical record to be examined and determining, according to the matching result, whether the diagnosis information of the medical record to be examined is misdiagnosed.
  • The above medical record quality control method, apparatus, computer device, and storage medium use a trained natural language processing model to process the main complaint information extracted from the medical record to be examined, together with the corresponding symptom relationship attribute pair, and obtain a disease set that matches the main complaint information. This disease set is then matched with the diagnosis information in the medical record to be examined to determine whether that diagnosis information is misdiagnosed.
  • By using the extracted main complaint information and symptom relationship attribute pairs to determine the disease set corresponding to the main complaint, and then matching the diseases in that set with the diagnosis information, the method determines whether the main complaint information and the diagnosis information are consistent.
  • FIG. 1 is an application scenario diagram of a medical record quality control method according to one or more embodiments;
  • FIG. 2 is a schematic flowchart of a medical record quality control method according to one or more embodiments;
  • FIG. 3 is a schematic flowchart of the step of extracting the main complaint information and the corresponding symptom relationship attribute pair in the medical record to be examined according to one or more embodiments;
  • FIG. 4 is a schematic flowchart of the step of inputting the main complaint information and the symptom relationship attribute pair into the trained first natural language processing model to obtain the disease set matching the main complaint information according to one or more embodiments;
  • FIG. 5 is a schematic workflow diagram of a medical record quality control method according to one or more embodiments;
  • FIG. 6 is a structural block diagram of a medical record quality control device according to one or more embodiments; and
  • FIG. 7 is a block diagram of a computer device according to one or more embodiments.
  • the medical record quality control method provided in this application can be applied to the application environment as shown in FIG. 1.
  • The terminal 102 communicates with the server 104 through a network.
  • The server 104 extracts the main complaint information and the corresponding symptom relationship attribute pair in the medical record to be examined, inputs them into the trained first natural language processing model to obtain the disease set matching the main complaint information, matches the disease set with the diagnosis information in the medical record to be examined, and determines, according to the matching result, whether the diagnosis information of the medical record to be examined is misdiagnosed.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a method for quality control of medical records is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step S202 Extract the main complaint information and the corresponding symptom relationship attribute pair in the medical record to be examined.
  • the medical record to be checked is an electronic medical record that needs to be quality controlled and has been entered into the terminal.
  • the main complaint information is the description of the patient's own symptoms recorded in the medical record.
  • A symptom relationship attribute pair relates a symptom entity to an attribute such as the symptom location or symptom duration, in the forms {symptom entity: symptom location} and {symptom entity: symptom duration}. For example, suppose the symptom entities are cough and convulsion.
  • The symptom relationship attribute pairs can then be {convulsion: right lower limb}, {cough: two days}, and so on.
  • The server obtains the medical record to be examined either from a user entering the main complaint information and diagnosis information through the terminal in real time, or from records configured and stored on the server in advance.
  • After the server obtains the medical record to be examined, it uses a natural language processing model and regular expressions to extract the symptom relationship attribute pairs from the main complaint information of the medical record.
  • the medical record to be examined may also be stored in a node of a blockchain.
  • Step S204 Input the main complaint information and the symptom relationship attribute pair into the trained first natural language processing model to obtain a disease set matching the main complaint information.
  • Natural language processing is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language, and integrates linguistics, computer science, and mathematics. Because research in this field deals with natural language, the language people use every day, it is closely related to linguistics, although there are important differences between the two.
  • the natural language processing model is a neural network model used for natural language processing.
  • The disease set is a set containing multiple candidate diseases.
  • After the server extracts the main complaint information and the symptom relationship attribute pairs, it inputs them into the pre-trained first natural language processing model.
  • The first natural language processing model processes the main complaint information and the symptom relationship attribute pairs and matches the main complaint information to corresponding diseases, yielding the disease set.
  • Step S206 Match the disease set with the diagnostic information in the medical record to be checked, and determine whether the diagnostic information in the medical record to be checked is misdiagnosed according to the matching result.
  • the diagnosis information is the information entered into the medical record to be examined after the medical staff diagnoses the patient.
  • Step S206 includes: when the diagnosis information does not match any disease in the disease set, determining that the diagnosis information of the medical record to be checked is misdiagnosed; and when the diagnosis information matches any disease in the disease set, determining that the diagnosis information of the medical record to be checked is not misdiagnosed.
  • the server obtains the diagnosis information from the medical record to be examined, and matches the diagnosis information with each disease in the disease set one by one.
  • If the diagnosis information matches any one of the diseases in the disease set, the diagnosis made by the medical staff is consistent with the main complaint information, and it is determined that there is no misdiagnosis.
  • If the diagnosis information matches none of the diseases in the disease set, the diagnosis made by the medical staff is inconsistent with the main complaint information, and a misdiagnosis is determined.
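The decision rule of step S206 is simple enough to sketch directly. The Python below is illustrative only: the patent does not say how a diagnosis string is compared with a disease name, so the case- and whitespace-insensitive equality in the hypothetical `normalize` helper is an assumption (a production system would more plausibly map both sides to a shared terminology first).

```python
def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase and collapse internal whitespace."""
    return " ".join(name.lower().split())


def is_misdiagnosed(diagnosis: str, disease_set: list[str]) -> bool:
    """Return True when the diagnosis matches none of the diseases in the set.

    Mirrors step S206: a match with any single disease means no misdiagnosis.
    """
    target = normalize(diagnosis)
    return not any(normalize(disease) == target for disease in disease_set)


candidates = ["Acute bronchitis", "Upper respiratory tract infection", "Pneumonia"]
print(is_misdiagnosed("acute  bronchitis", candidates))  # False
print(is_misdiagnosed("Migraine", candidates))           # True
```

An empty disease set always yields a misdiagnosis flag under this rule, which is worth guarding against if the upstream model can return nothing.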
  • step S202 includes:
  • Step S302: Extract the main complaint information of the medical record to be examined.
  • After the server obtains the medical record to be examined, it first extracts the main complaint information from it. Because the content of a medical record generally follows a fixed format, the server can extract the main complaint information directly according to that format.
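Because records follow a fixed template, extracting the main complaint can be as simple as a line-anchored regular expression. In this sketch the section header `Chief complaint:` and the function name `extract_main_complaint` are hypothetical; real electronic medical record templates vary by hospital.

```python
import re
from typing import Optional

# Hypothetical template header; adjust to the actual record format.
COMPLAINT_PATTERN = re.compile(
    r"^Chief complaint:[ \t]*(?P<text>.+?)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def extract_main_complaint(record: str) -> Optional[str]:
    """Return the text of the chief-complaint line, or None if it is absent."""
    match = COMPLAINT_PATTERN.search(record)
    return match.group("text") if match else None


record = """Name: (redacted)
Chief complaint: Twitching in the right lower extremity, cough for 2 days
Present illness: (omitted)
Diagnosis: Acute bronchitis"""
print(extract_main_complaint(record))
```

Using `[ \t]*` rather than `\s*` after the colon keeps an empty complaint line from accidentally capturing the next section.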
  • Step S304 Input the main complaint information into the trained second natural language processing model, and use the second natural language processing model to extract symptom entities from the main complaint information.
  • The second natural language processing model is a natural language processing model for extracting symptom entities from the main complaint information.
  • In this embodiment, the second natural language processing model is preferably a named entity recognition (NER) model.
  • the named entity recognition model is a model used for information extraction, which aims to locate and classify named entities in the text into predefined categories.
  • After the server extracts the main complaint information, it inputs the main complaint information into the NER model.
  • the named entity recognition model NER is used to locate and classify the main complaint information to obtain the symptom entities in the main complaint information.
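The patent does not state how the NER model's output is encoded. Assuming the common BIO tagging scheme with a hypothetical `SYM` label for symptoms, turning per-token tags into symptom entities could look like this:

```python
def decode_bio(tokens: list[str], tags: list[str], label: str = "SYM") -> list[str]:
    """Collect entity strings of the given label from per-token BIO tags."""
    entities: list[str] = []
    current: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{label}":
            if current:                      # a new entity closes the previous one
                entities.append(" ".join(current))
            current = [token]
        elif tag == f"I-{label}" and current:
            current.append(token)            # continue the open entity
        else:
            if current:                      # "O" or a stray "I-" ends the entity
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


tokens = ["right", "leg", "twitching", ",", "dry", "cough"]
tags = ["O", "O", "B-SYM", "O", "B-SYM", "I-SYM"]
print(decode_bio(tokens, tags))  # ['twitching', 'dry cough']
```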
  • Step S306 Query the symptom duration and symptom location of the symptom entity from the main complaint information to obtain a symptom relationship attribute pair.
  • Regular expressions are used to query the symptom duration and symptom location corresponding to the symptom entity from the main complaint information. The symptom entity is then combined with the obtained symptom duration and symptom location to obtain a symptom relationship attribute pair.
  • Step S306 includes: matching the nearest punctuation marks on the left and right of the symptom entity in the main complaint information to determine the sentence segment where the symptom entity is located; matching the characters in the sentence segment against the symptom location characters and symptom duration characters in a preset dictionary; when characters in the sentence segment successfully match symptom location or symptom duration characters in the preset dictionary, extracting the matched characters from the sentence segment; and combining the symptom entity with the extracted characters to obtain the symptom relationship attribute pair.
  • The regular expressions in this embodiment include a regular expression punctuation template and a regular expression location and time template.
  • the regular expression punctuation template is a logic program that matches punctuation
  • the regular expression location and time template is a logic program for detecting the symptom location and the duration of the symptom.
  • When the server queries the symptom duration and symptom location of the symptom entity from the main complaint information, it first calls the regular expression punctuation template.
  • The logic program of the regular expression punctuation template matches the punctuation marks closest to the left and right of the symptom entity to determine the sentence segment where the symptom entity is located. For example, the source string is "Patient complained of twitching sensation in the right lower extremity, and started coughing 2 days ago".
  • The NER model detects the symptom entity "twitching".
  • The punctuation marks closest to the left and right of "twitching" are then queried through the regular expression punctuation template.
  • The punctuation mark to the right of "twitching" is ",", and no punctuation mark is found to its left, so the beginning of the string is taken as the start of the sentence segment containing "twitching", and the "," marks its end. The sentence segment containing the symptom entity "twitching" is therefore "Patient complained of twitching sensation in the right lower extremity".
  • After determining the sentence segment where the symptom entity is located, the server calls the regular expression location and time template to determine the symptom duration or symptom location corresponding to the symptom entity. Specifically, the server loads a preset dictionary constructed offline in advance.
  • The preset dictionary can take the form {upper right limb, lower right limb, \d day, \d month}, where \d represents any number.
  • the server matches the characters in the preset dictionary representing the symptom location and symptom duration with the characters in the sentence segment one by one, and judges whether the characters in the dictionary are located in the sentence segment.
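A minimal sketch of step S306's two templates, using the example sentence above: the nearest punctuation on each side bounds the sentence segment, and the segment is then searched for dictionary entries. The `PUNCTUATION`, `LOCATIONS`, and `DURATION` contents are illustrative assumptions, written in English to match the example; they stand in for the patent's offline-built dictionary.

```python
import re

PUNCTUATION = ",.;:!?，。；：！？"   # ASCII and full-width marks (assumed set)

# Hypothetical preset dictionary: literal locations plus a duration pattern.
LOCATIONS = ["right upper extremity", "right lower extremity"]
DURATION = re.compile(r"\d+ (?:days?|months?)")


def sentence_segment(text: str, entity: str) -> str:
    """Return the span between the nearest punctuation marks around `entity`."""
    start = text.index(entity)                  # raises ValueError if absent
    end = start + len(entity)
    # Nearest mark to the left; -1 means the segment starts at the beginning.
    left = max(text.rfind(p, 0, start) for p in PUNCTUATION)
    # Nearest mark to the right; none found means it runs to the end.
    rights = [i for i in (text.find(p, end) for p in PUNCTUATION) if i != -1]
    right = min(rights) if rights else len(text)
    return text[left + 1:right].strip()


def attribute_pairs(text: str, entity: str) -> list[tuple[str, str]]:
    """Pair `entity` with every dictionary location or duration in its segment."""
    segment = sentence_segment(text, entity)
    found = [loc for loc in LOCATIONS if loc in segment]
    found += DURATION.findall(segment)
    return [(entity, attribute) for attribute in found]


complaint = ("Patient complained of twitching sensation in the right lower "
             "extremity, and started coughing 2 days ago")
print(attribute_pairs(complaint, "twitching"))  # [('twitching', 'right lower extremity')]
print(attribute_pairs(complaint, "coughing"))   # [('coughing', '2 days')]
```

Restricting the dictionary search to the segment is what keeps "2 days" from being attached to "twitching" in this example.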
  • Step S308 Perform text conversion on the symptom relationship attribute pair to obtain a symptom relationship attribute pair in text form.
  • Text form means plain text without any structured format.
  • For example, two symptom relationship attribute pairs are extracted from the main complaint above: {cough: 2 days} and {convulsion: right lower limb}.
  • After conversion, the text reads "cough for two days, right lower limb twitching".
  • Arabic numerals need to be converted into Chinese character descriptions during this text conversion process.
  • After the server extracts the symptom relationship attribute pairs, it converts each originally structured pair into a symptom relationship attribute pair in text form.
  • Using both the natural language processing model and regular expressions to extract all symptoms and related attributes from the main complaint information yields higher accuracy than using the natural language processing model alone, and ensures that symptom information is extracted from the main complaint as completely as possible.
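The text conversion of step S308 can be sketched as rendering each structured pair as a short phrase. The patent converts Arabic numerals into Chinese-character descriptions; since this rendering is in English, the sketch spells numbers out as English words instead. The `(entity, kind, value)` layout and the phrase templates are assumptions.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English words."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only supports 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def pair_to_text(entity: str, kind: str, value: str) -> str:
    """Render one structured pair as plain text, spelling out any numbers."""
    value = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), value)
    if kind == "duration":
        return f"{entity} for {value}"
    return f"{value} {entity}"          # a location reads naturally before the symptom


pairs = [("cough", "duration", "2 days"), ("twitching", "location", "right lower limb")]
print(", ".join(pair_to_text(*p) for p in pairs))
# cough for two days, right lower limb twitching
```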
  • the first natural language processing model includes a first natural language text classification model and a second natural language text classification model.
  • step S204 includes:
  • Step S402 Input the main complaint information into the embedding layer of the first natural language text classification model to perform vector conversion to obtain the word vector of the main complaint information.
  • The server inputs the main complaint information into the embedding layer of the first natural language text classification model, which converts the main complaint information into vectors and outputs the word vector of the main complaint information.
  • the first natural language text classification model in this embodiment is preferably the TextCNN model.
  • the TextCNN model is a model that applies a convolutional neural network CNN to text classification. It extracts key information in sentences by using multiple convolution kernels with different scales.
  • The TextCNN model includes an embedding layer (Embedding), a convolution layer (Convolution), a pooling layer (MaxPooling), and fully connected and softmax layers (FullConnection and Softmax).
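For reference, a bare-bones TextCNN forward pass in NumPy is sketched below: filters of different widths slide over the word vectors, each filter's responses are max-pooled over time, and a fully connected layer with softmax produces class probabilities. This is a simplification under stated assumptions (one filter per width, random weights, no dropout), not the model trained in the patent.

```python
import numpy as np


def textcnn_forward(embeddings, kernels, fc_weight, fc_bias):
    """Minimal TextCNN forward pass.

    embeddings: (T, d) word vectors of one sentence.
    kernels: list of (w, d) filters with possibly different widths w <= T.
    fc_weight: (len(kernels), C); fc_bias: (C,). Returns C probabilities.
    """
    pooled = []
    for k in kernels:
        w = k.shape[0]
        # Valid 1-D convolution over time: one scalar per window position.
        feats = np.array([np.sum(embeddings[t:t + w] * k)
                          for t in range(embeddings.shape[0] - w + 1)])
        pooled.append(np.maximum(feats, 0.0).max())   # ReLU, then max-over-time pooling
    logits = np.array(pooled) @ fc_weight + fc_bias
    exp = np.exp(logits - logits.max())               # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
emb = rng.normal(size=(7, 4))                         # 7 tokens, 4-dim embeddings
kernels = [rng.normal(size=(w, 4)) for w in (2, 3, 4)]
probs = textcnn_forward(emb, kernels, rng.normal(size=(3, 5)), np.zeros(5))
print(probs.round(3))
```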
  • the server first inputs the main complaint information into the embedding layer (embedding) of the TextCNN model to obtain the word vector of the main complaint information.
  • Step S404: Input the symptom relationship attribute pair into the embedding layer of the second natural language text classification model for word vector conversion to obtain the word vector of the symptom relationship attribute pair.
  • the server inputs the symptom relationship attribute pair in the text form into the embedding layer of the second natural language text classification model.
  • the embedding layer of the second natural language text classification model is used to perform word vector conversion on the symptom relation attribute pair to obtain the word vector of the symptom relation attribute pair.
  • the second natural language text classification model is preferably the FastText model.
  • The FastText model is a model built on the word2vec framework that quickly converts text into word vectors while also incorporating n-gram information.
  • The main complaint information and the text-form symptom relationship attribute pairs are input into the TextCNN model and the FastText model, respectively. Rather than taking the final outputs of these models, the server takes the outputs of their embedding layers: the embedding-layer output of the TextCNN model is the word vector of the main complaint information, and the embedding-layer output of the FastText model is the word vector of the symptom relationship attribute pair.
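FastText's handling of n-gram information can be approximated by averaging word embeddings together with embeddings of hashed word bigrams, which is the idea behind fastText's word n-gram features. The sketch below is a hypothetical simplification: the lookup tables are plain NumPy arrays, the bucket count is arbitrary, and real fastText also uses character n-grams.

```python
import zlib

import numpy as np


def fasttext_style_vector(tokens, word_table, ngram_table):
    """Average the embeddings of known words and of hashed word bigrams.

    word_table: dict mapping token -> (d,) vector; unknown tokens are skipped.
    ngram_table: (buckets, d) array indexed by a hash of each bigram.
    """
    buckets, dim = ngram_table.shape
    vectors = [word_table[t] for t in tokens if t in word_table]
    for a, b in zip(tokens, tokens[1:]):
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        index = zlib.crc32(f"{a} {b}".encode("utf-8")) % buckets
        vectors.append(ngram_table[index])
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


tokens = ["cough", "for", "two", "days"]
word_table = {t: np.ones(3) for t in tokens}
ngram_table = np.zeros((8, 3))
print(fasttext_style_vector(tokens, word_table, ngram_table))  # four ones + three zeros averaged
```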
  • Step S406: Splice the word vector of the main complaint information and the word vector of the symptom relationship attribute pair along the vertical axis to obtain a spliced vector.
  • If there are multiple symptom relationship attribute pairs, the word vectors of these pairs are first spliced along the vertical axis to obtain a combined pair vector. The word vector of the main complaint information and this combined pair vector are then spliced along the vertical axis, giving a final spliced vector of size 1×N. For example, if two symptom relationship attribute pairs are extracted from one main complaint,
  • the spliced vector is: the word vector of the main complaint information, followed by the word vector of the first symptom relationship attribute pair, followed by the word vector of the second symptom relationship attribute pair.
  • the order of the word vectors of the symptom relation attribute pair is determined by the order of the model output. Due to the mini-batch method adopted for model training, the batches obtained are randomly selected, so the order of word vectors is random.
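The patent describes splicing along the vertical axis yet states that the result is 1×N. Reading each word vector as a single row of shape (1, d), that result corresponds to joining the rows end to end, which in NumPy is concatenation along axis 1, as sketched below; if the word vectors were instead per-token matrices, stacking them along axis 0 would be the alternative reading. The function name `splice` is hypothetical, and the pair vectors are kept in whatever order they are supplied.

```python
import numpy as np


def splice(complaint_vec, pair_vecs):
    """Join the complaint vector and each pair vector into one 1 x N row."""
    parts = [np.atleast_2d(complaint_vec)] + [np.atleast_2d(v) for v in pair_vecs]
    return np.concatenate(parts, axis=1)


spliced = splice(np.arange(4.0), [np.ones(3), np.zeros(3)])
print(spliced.shape)  # (1, 10)
```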
  • Step S408: Input the spliced vector into the network layers that follow the embedding layer of the first natural language text classification model, and output the disease set matching the main complaint information.
  • Taking a TextCNN model with an embedding layer (Embedding), a convolution layer (Convolution), a pooling layer (MaxPooling), and fully connected and softmax layers (FullConnection and Softmax) as an example, the spliced vector is input directly into the convolution layer of the TextCNN model, and the disease set is output by its fully connected and softmax layers.
  • The number of diseases in the disease set can be configured according to actual needs; for example, the disease set may be configured to contain 20 diseases.
  • In that case, the fully connected and softmax layers output the 20 diseases with the highest probabilities, yielding a disease set of 20 diseases.
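Selecting the configured number of diseases from the softmax output amounts to a top-k selection. A sketch with the hypothetical helper `top_k_diseases`, defaulting to the 20 diseases of the example:

```python
import numpy as np


def top_k_diseases(probs, disease_names, k=20):
    """Return the k disease names with the highest softmax probabilities."""
    k = min(k, len(disease_names))
    order = np.argsort(probs)[::-1][:k]   # indices in descending probability order
    return [disease_names[i] for i in order]


probs = np.array([0.10, 0.50, 0.15, 0.25])
print(top_k_diseases(probs, ["A", "B", "C", "D"], k=2))  # ['B', 'D']
```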
  • The first and second natural language text classification models are trained on the MIMIC data set by end-to-end supervised learning.
  • Using a data-driven model for diagnosis quality control of medical records covers more disease types and broadens the applicability of medical record quality control.
  • Step S402 includes: convolving the main complaint information with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each kernel; and performing weighted averaging over the convolution vectors of the kernels to obtain the word vector of the main complaint information.
  • That is, the vectors produced by the convolution kernels in the embedding layer of the TextCNN model are combined by a weighted average to obtain the word vector of the main complaint information.
  • The weight coefficients are fixed when the TextCNN model is trained.
  • Because the weights of the vectors produced by different convolution kernels are taken into account, accuracy is improved.
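The weighted averaging in step S402 can be sketched with NumPy's `np.average`. The patent does not explain how kernels of different widths yield vectors of the same length, so this sketch simply assumes the convolution vectors already have equal length (for example, through same-length padding); the weights stand for the coefficients fixed during training, and the helper name is hypothetical.

```python
import numpy as np


def weighted_kernel_average(conv_vectors, weights):
    """Weighted average of equal-length per-kernel convolution vectors."""
    stacked = np.stack(conv_vectors)                 # (num_kernels, length)
    return np.average(stacked, axis=0, weights=np.asarray(weights, dtype=float))


vectors = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(weighted_kernel_average(vectors, [1, 3]))  # [2.5 3.5]
```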
  • Referring to FIG. 5, which provides a workflow chart of medical record quality control, the medical record quality control method is explained as follows.
  • the server inputs the main complaint information into the embedding layer of the TextCNN model to obtain the word vector of the main complaint information.
  • the server extracts the symptom relationship attribute pair from the main complaint information, inputs the symptom relationship attribute pair into the embedding layer of the FastText model, and obtains the word vector of the symptom relationship attribute pair.
  • the word vector of the main complaint information and the word vector of the symptom relationship attribute pair are spliced on the vertical axis to obtain the splicing vector.
  • The spliced vector is input into the network layers after the embedding layer of the TextCNN model for processing, yielding a disease set containing the top 20 diseases. These diseases are then matched with the diagnosis information to determine whether a misdiagnosis has occurred.
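The FIG. 5 workflow can be summarized as a small pipeline object. Every stage here is a hypothetical injected callable standing in for the trained models and templates described above, so the sketch shows only the order of operations and the final matching decision, which uses exact string membership as an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualityControlPipeline:
    """Wires the FIG. 5 stages together; each stage is a stand-in callable."""
    extract_complaint: Callable[[str], str]
    extract_pairs: Callable[[str], list[str]]
    classify: Callable[[str, list[str]], list[str]]   # returns the disease set

    def check(self, record: str, diagnosis: str) -> bool:
        """Return True if the recorded diagnosis looks like a misdiagnosis."""
        complaint = self.extract_complaint(record)
        pairs = self.extract_pairs(complaint)
        diseases = self.classify(complaint, pairs)
        return diagnosis not in diseases


pipeline = QualityControlPipeline(
    extract_complaint=lambda record: record,
    extract_pairs=lambda complaint: ["cough for two days"],
    classify=lambda complaint, pairs: ["Acute bronchitis", "Pneumonia"],
)
print(pipeline.check("cough for 2 days", "Pneumonia"))  # False
```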
  • a medical record quality control device which includes: an extraction module 602, a processing module 604, and a determination module 606, wherein:
  • the extraction module 602 is used to extract the main complaint information and the corresponding symptom relationship attribute pair in the medical record to be examined.
  • the processing module 604 is used to input the main complaint information and the symptom relationship attribute pair into the trained first natural language processing model to obtain a disease set that matches the main complaint information.
  • the determining module 606 is configured to match the disease set with the diagnosis information in the medical record to be checked, and determine whether the diagnosis information of the medical record to be checked is misdiagnosed according to the matching result.
  • the extraction module 602 is also used to extract the main complaint information of the medical record to be examined; input the main complaint information into the trained second natural language processing model, and use the second natural language processing model to extract symptom entities from the main complaint information; From the main complaint information, query the symptom entity's symptom duration and symptom location to obtain the symptom relationship attribute pair; perform text conversion of the symptom relationship attribute pair to obtain the symptom relationship attribute pair in text form.
  • the extraction module 602 is also used to match the nearest punctuation marks on the left and right sides of the symptom entity in the main complaint information to determine the sentence segment where the symptom entity is located; The symptom part character and the symptom time character are matched; when there is a character that successfully matches the symptom part character and the symptom time character in the preset dictionary, the successfully matched character is extracted from the sentence segment; the symptom entity and the extracted character are combined to obtain Symptom relationship attribute pair.
  • the processing module 604 is further configured to input the main complaint information into the embedding layer of the first natural language text classification model for vector conversion to obtain the word vector of the main complaint information; and classify the symptom relationship attribute to the input second natural language text
  • the embedding layer of the model performs word vector conversion to obtain the word vector of the symptom relation attribute pair; splicing the word vector of the main complaint information and the word vector of the symptom relation attribute pair according to the vertical axis direction to obtain the splicing vector; input the splicing vector into the first natural language
  • the network layer after the embedding layer of the text classification model outputs a set of diseases matching the main complaint information.
  • the processing module 604 is further configured to convolve the chief complaint information with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each kernel, and to compute a weighted average of these convolution vectors to obtain the word vectors of the chief complaint information.
  • the determining module 606 is further configured to determine that the diagnosis information of the medical record to be examined is a misdiagnosis when it matches none of the diseases in the disease set, and that it is not a misdiagnosis when it matches any disease in the disease set.
  • Each module in the above medical record quality control device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and execute the corresponding operations.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile or volatile storage medium and internal memory.
  • the non-volatile or volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device stores data such as the medical records to be examined and the models.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a medical record quality control method.
  • FIG. 7 is a block diagram of only part of the structure related to the solution of this application and does not limit the computer devices to which the solution is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • a computer device including a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed, the one or more processors perform the following steps: extracting the chief complaint information and the corresponding symptom relationship attribute pair from the medical record to be examined;
  • the disease set is matched against the diagnosis information in the medical record to be examined, and whether the diagnosis information is a misdiagnosis is determined according to the matching result.
  • the processor further implements the following steps when executing the computer-readable instructions:
  • the symptom relationship attribute pair is converted into text to obtain the symptom relationship attribute pair in text form.
  • the processor further implements the following steps when executing the computer-readable instructions:
  • the spliced vector is input to the network layer following the embedding layer of the first natural language text classification model, which outputs the disease set matching the chief complaint information.
  • the processor further implements the following steps when executing the computer-readable instructions:
  • each convolution kernel in the embedding layer of the first natural language text classification model convolves the chief complaint information to obtain that kernel's convolution vector.
  • the processor further implements the following steps when executing the computer-readable instructions:
  • when the diagnosis information matches none of the diseases in the disease set, the diagnosis information of the medical record to be examined is determined to be a misdiagnosis.
  • when the diagnosis information matches any disease in the disease set, the diagnosis information of the medical record to be examined is determined not to be a misdiagnosis.
  • One or more computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the disease set is matched against the diagnosis information in the medical record to be examined, and whether the diagnosis information is a misdiagnosis is determined according to the matching result.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the symptom relationship attribute pair is converted into text to obtain the symptom relationship attribute pair in text form.
  • the following steps are further implemented: inputting the chief complaint information into the embedding layer of the first natural language text classification model for vector conversion to obtain the word vectors of the chief complaint information;
  • the spliced vector is input to the network layer following the embedding layer of the first natural language text classification model, which outputs the disease set matching the chief complaint information.
  • each convolution kernel in the embedding layer of the first natural language text classification model convolves the chief complaint information to obtain that kernel's convolution vector.
  • when the diagnosis information matches none of the diseases in the disease set, the diagnosis information of the medical record to be examined is determined to be a misdiagnosis.
  • when the diagnosis information matches any disease in the disease set, the diagnosis information of the medical record to be examined is determined not to be a misdiagnosis.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

Provided are a medical-record quality control method, apparatus, computer device, and storage medium, relating to artificial intelligence. The method comprises: extracting chief complaint information and a corresponding symptom relationship attribute pair from a medical record to be examined (S202); inputting the chief complaint information and the symptom relationship attribute pair into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information (S204); and matching the set of diseases against the diagnostic information in the medical record to be examined and, according to the matching result, determining whether the diagnostic information of the medical record is a misdiagnosis (S206). The invention further relates to blockchain technology: the medical record to be examined can be stored on a blockchain. The method makes it possible to determine whether the chief complaint information and the diagnostic information are consistent, thereby achieving diagnostic quality control.

Description

Medical-record quality control method, apparatus, computer device, and storage medium
Cross-reference to related application
This application claims priority to Chinese patent application No. 2020105485409, entitled "Medical-record quality control method, apparatus, computer device, and storage medium", filed with the Chinese Patent Office on June 16, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a medical-record quality control method, apparatus, computer device, and storage medium based on natural language processing.
Background
Medical records document patient visits and are the basic data source for subsequent medical research. Medical-record quality control is one of the main concerns of a quality control system: it strengthens the management of hospital medical-record quality, improves the hospital's internal quality management system, and supports later assessment of physicians' professional level so as to improve their competence.
However, the inventors realized that current medical-record quality control mostly focuses on basic aspects of record writing, such as whether a record is written correctly and whether its entries are consistent with one another, and does not check whether the chief complaint is consistent with the diagnosis.
Summary
According to various embodiments disclosed in this application, a medical-record quality control method, apparatus, computer device, and storage medium are provided.
A medical-record quality control method includes:
extracting the chief complaint information and the corresponding symptom relationship attribute pairs from a medical record to be examined;
inputting the chief complaint information and the symptom relationship attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
matching the disease set against the diagnosis information in the medical record to be examined, and determining, according to the matching result, whether the diagnosis information of the medical record to be examined is a misdiagnosis.
A medical-record quality control apparatus includes:
an extraction module, configured to extract the chief complaint information and the corresponding symptom relationship attribute pairs from a medical record to be examined;
a processing module, configured to input the chief complaint information and the symptom relationship attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
a determining module, configured to match the disease set against the diagnosis information in the medical record to be examined, and to determine, according to the matching result, whether the diagnosis information of the medical record to be examined is a misdiagnosis.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
extracting the chief complaint information and the corresponding symptom relationship attribute pairs from a medical record to be examined;
inputting the chief complaint information and the symptom relationship attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
matching the disease set against the diagnosis information in the medical record to be examined, and determining, according to the matching result, whether the diagnosis information of the medical record to be examined is a misdiagnosis.
One or more computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
extracting the chief complaint information and the corresponding symptom relationship attribute pairs from a medical record to be examined;
inputting the chief complaint information and the symptom relationship attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
matching the disease set against the diagnosis information in the medical record to be examined, and determining, according to the matching result, whether the diagnosis information of the medical record to be examined is a misdiagnosis.
With the above method, apparatus, computer device, and storage medium, a trained natural language processing model processes the chief complaint information extracted from the medical record to be examined, together with the corresponding symptom relationship attribute pairs, to obtain a set of diseases matching the chief complaint information. That disease set is then matched against the diagnosis information in the record to determine whether the diagnosis is a misdiagnosis. Because the diseases implied by the chief complaint and its symptom relationship attribute pairs are compared with the recorded diagnosis, the method can judge whether the chief complaint information and the diagnosis information are consistent.
Details of one or more embodiments of this application are set forth in the drawings and description below. Other features and advantages of this application will become apparent from the description, drawings, and claims.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the medical-record quality control method according to one or more embodiments;
FIG. 2 is a schematic flowchart of the medical-record quality control method according to one or more embodiments;
FIG. 3 is a schematic flowchart of the step of extracting the chief complaint information and the corresponding symptom relationship attribute pairs from the medical record to be examined, according to one or more embodiments;
FIG. 4 is a schematic flowchart of the step of inputting the chief complaint information and the symptom relationship attribute pairs into the trained first natural language processing model to obtain the disease set matching the chief complaint information, according to one or more embodiments;
FIG. 5 is a schematic diagram of the workflow of the medical-record quality control method according to one or more embodiments;
FIG. 6 is a structural block diagram of the medical-record quality control apparatus according to one or more embodiments;
FIG. 7 is a block diagram of a computer device according to one or more embodiments.
Detailed description
To make the technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain this application and are not intended to limit it.
The medical-record quality control method provided in this application can be applied in the environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 over a network. After the terminal 102 sends a medical record to be examined to the server 104, the server 104 extracts the chief complaint information and the corresponding symptom relationship attribute pairs from the record, inputs them into the trained first natural language processing model to obtain a set of diseases matching the chief complaint information, matches the disease set against the diagnosis information in the record, and determines from the matching result whether the diagnosis information is a misdiagnosis. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as a standalone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a medical-record quality control method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
Step S202: extract the chief complaint information and the corresponding symptom relationship attribute pairs from the medical record to be examined.
Here, the medical record to be examined is an electronic medical record that has been entered into the terminal and requires quality control. The chief complaint information is the patient's description of their own symptoms as recorded in the medical record. A symptom relationship attribute pair links a symptom entity to an attribute such as its location or duration, in the form {symptom entity: symptom location} or {symptom entity: symptom duration}. For example, if the symptom entities are cough and twitching, the symptom relationship attribute pairs may be {twitching: right lower limb} and {cough: two days}.
Specifically, the server obtains the medical record to be examined, either from chief complaint and diagnosis information entered by a user at the terminal in real time, or from records pre-configured and stored on the server. After obtaining the record, the server uses a natural language processing model and regular expressions to extract the symptom relationship attribute pairs from its chief complaint information. To further protect the privacy and security of the record, the medical record to be examined may also be stored in a node of a blockchain.
Step S204: input the chief complaint information and the symptom relationship attribute pairs into the trained first natural language processing model to obtain the set of diseases matching the chief complaint information.
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language, and it integrates linguistics, computer science, and mathematics. Because research in this field involves natural language, the language people use every day, it is closely related to linguistics, but with important differences. A natural language processing model is a neural network model used for natural language processing. A disease set is a set containing multiple diseases.
Specifically, after extracting the chief complaint information and the symptom relationship attribute pairs, the server inputs them into the pre-trained first natural language processing model, which processes them and matches the chief complaint information to the corresponding diseases, yielding the disease set.
Step S206: match the disease set against the diagnosis information in the medical record to be examined, and determine, according to the matching result, whether the diagnosis information of the medical record to be examined is a misdiagnosis.
The diagnosis information is the information that medical staff enter into the medical record after diagnosing the patient.
In one embodiment, step S206 includes: when the diagnosis information matches none of the diseases in the disease set, determining that the diagnosis information of the medical record to be examined is a misdiagnosis; and when the diagnosis information matches any disease in the disease set, determining that it is not a misdiagnosis.
Specifically, the server obtains the diagnosis information from the medical record to be examined and compares it with each disease in the disease set in turn. If it matches any one disease, the medical staff's diagnosis is consistent with the chief complaint information, and no misdiagnosis is found. If it matches none of the diseases, the diagnosis is inconsistent with the chief complaint information, and a misdiagnosis is determined.
With the above method, a trained natural language processing model processes the chief complaint information extracted from the medical record to be examined, together with the corresponding symptom relationship attribute pairs, to obtain a set of diseases matching the chief complaint information. That disease set is then matched against the diagnosis information in the record to determine whether the diagnosis is a misdiagnosis. Because the diseases implied by the chief complaint and its symptom relationship attribute pairs are compared with the recorded diagnosis, the method can judge whether the chief complaint information and the diagnosis information are consistent.
In one embodiment, as shown in FIG. 3, step S202 includes:
Step S302: extract the chief complaint information from the medical record to be examined.
Specifically, after obtaining the medical record to be examined, the server first extracts the chief complaint information from it. Because medical-record content generally follows a fixed format, the server can extract the chief complaint information directly according to that format.
Step S304: input the chief complaint information into a trained second natural language processing model, and use the second model to extract symptom entities from the chief complaint information.
The second natural language processing model is an NLP model for extracting symptom entities from the chief complaint information; in this embodiment it is preferably a named entity recognition (NER) model. An NER model is used for information extraction: it locates named entities in text and classifies them into predefined categories.
Specifically, after extracting the chief complaint information, the server inputs it into the NER model, which locates and classifies the entities in the chief complaint information to obtain its symptom entities.
Step S306: look up the symptom duration and symptom location of each symptom entity in the chief complaint information to obtain the symptom relationship attribute pairs.
Specifically, after a symptom entity is extracted from the chief complaint information, regular expressions are used to look up the symptom duration and symptom location corresponding to it in the chief complaint information, and the symptom entity is combined with them to form symptom relationship attribute pairs.
In one embodiment, step S306 includes: matching the nearest punctuation marks to the left and right of the symptom entity in the chief complaint information to determine the sentence segment containing the symptom entity; matching the characters in the sentence segment one by one against the symptom-location and symptom-time characters in a preset dictionary; when characters in the segment successfully match symptom-location or symptom-time characters in the preset dictionary, extracting the matched characters from the sentence segment; and combining the symptom entity with the extracted characters to obtain the symptom relationship attribute pairs.
The regular expressions in this embodiment include a punctuation template and a location-and-time template. The punctuation template is matching logic for punctuation marks; the location-and-time template is matching logic for detecting symptom locations and symptom durations.
Specifically, to look up a symptom entity's duration and location in the chief complaint information, the server first invokes the punctuation template, which matches the punctuation marks nearest to the left and right of the symptom entity and thereby determines the sentence segment containing it. For example, let the source string be "患者主诉右下肢有抽搐感、2天前开始咳嗽" ("the patient complains of a twitching sensation in the right lower limb and began coughing 2 days ago"). When the NER model detects the symptom entity "抽搐" (twitching), the punctuation template searches for the punctuation marks nearest to it on each side. The nearest mark on the right is the enumeration comma "、", and there is none on the left, so the start of the string is taken as the beginning of the segment and "、" as its end. The sentence segment containing "twitching" is therefore "患者主诉右下肢有抽搐感" ("the patient complains of a twitching sensation in the right lower limb").
After determining the sentence segment containing the symptom entity, the server invokes the location-and-time template to find the symptom duration or symptom location associated with the entity. To do this, it loads a dictionary built offline in advance, referred to as the preset dictionary. For example, the preset dictionary may take the form {右上肢 (right upper limb), 右下肢 (right lower limb), /d天 (d days), /d月 (d months)}, where d denotes any number. The server then matches the dictionary entries for symptom locations and durations against the characters of the sentence segment one by one and checks whether each entry occurs in the segment. If an entry does occur, the matched characters are extracted from the segment as an attribute of the symptom entity and combined with the entity into a symptom relationship attribute pair. For example, suppose matching against the preset dictionary finds the symptom location 右下肢 (right lower limb) in the chief complaint of the medical record under review. That location becomes an attribute of the symptom entity "抽搐" (twitching), giving the symptom relationship attribute pair {抽搐: 右下肢} ({twitching: right lower limb}).
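The dictionary-matching step can be sketched as below, assuming a tiny illustrative dictionary. Two readings are assumptions: the "/d天" and "/d月" entries are treated as the regular expressions `\d+天` and `\d+月`, and location terms are matched as literal substrings. `attribute_pairs` is a hypothetical helper, and pairs are represented as tuples.

```python
import re

# Illustrative preset dictionary (assumed contents). Location terms are
# matched literally; "/d天" and "/d月" are read as the regexes \d+天 and \d+月.
LOCATION_TERMS = ["右上肢", "右下肢"]
DURATION_PATTERNS = [re.compile(r"\d+天"), re.compile(r"\d+月")]


def attribute_pairs(segment: str, entity: str) -> list:
    """Pair `entity` with every location or duration found in `segment`."""
    pairs = []
    for term in LOCATION_TERMS:
        if term in segment:
            pairs.append((entity, term))
    for pattern in DURATION_PATTERNS:
        match = pattern.search(segment)
        if match:
            pairs.append((entity, match.group()))
    return pairs


print(attribute_pairs("患者主诉右下肢有抽搐感", "抽搐"))  # [('抽搐', '右下肢')]
print(attribute_pairs("2天前开始咳嗽", "咳嗽"))          # [('咳嗽', '2天')]
```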
Step S308: Convert the symptom relationship attribute pairs to text to obtain symptom relationship attribute pairs in text form.
Here, "text form" means a form without any structure. For example, a total of two symptom relationship attribute pairs are extracted from the chief complaint above: {咳嗽: 2天} ({cough: 2 days}) and {抽搐: 右下肢} ({twitching: right lower limb}). After conversion, the text reads "咳嗽两天、右下肢抽搐" ("cough for two days, twitching in the right lower limb"). During this conversion, Arabic numerals are also converted into Chinese-character numerals.
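The text conversion in the example can be sketched as follows. The word-order rule is inferred from that single example and is an assumption: durations follow the symptom and locations precede it. Numeral handling is simplified to 0-99, and a lone 2 becomes 两, as is usual before measure words such as 天. `number_to_chinese`, `pair_to_text`, and `pairs_to_text` are hypothetical names.

```python
import re

DIGITS = "零一二三四五六七八九"


def number_to_chinese(n: int) -> str:
    """Convert an integer in 0-99 to Chinese numerals (simplified sketch).

    A lone 2 becomes 两, as Chinese uses before measure words such as 天.
    """
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return "两" if n == 2 else DIGITS[n]
    tens, ones = divmod(n, 10)
    prefix = "" if tens == 1 else DIGITS[tens]
    return prefix + "十" + (DIGITS[ones] if ones else "")


def pair_to_text(symptom: str, attribute: str) -> str:
    # Assumed ordering: durations follow the symptom (咳嗽两天),
    # locations precede it (右下肢抽搐).
    match = re.fullmatch(r"(\d+)(天|月)", attribute)
    if match:
        return symptom + number_to_chinese(int(match.group(1))) + match.group(2)
    return attribute + symptom


def pairs_to_text(pairs) -> str:
    """Join the converted pairs with the enumeration comma 、."""
    return "、".join(pair_to_text(s, a) for s, a in pairs)


print(pairs_to_text([("咳嗽", "2天"), ("抽搐", "右下肢")]))  # 咳嗽两天、右下肢抽搐
```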
Specifically, after extracting the symptom relationship attribute pairs, the server converts these structured pairs into text form to simplify subsequent processing.
In this embodiment, a natural language processing model is combined with regular expressions to extract all symptoms and their attributes from the chief complaint of the medical record under review. Compared with extraction by a natural language processing model alone, this approach is more precise. It helps ensure that the most complete symptom information is extracted from the chief complaint, and it improves extraction accuracy.
In one embodiment, the first natural language processing model includes a first natural language text classification model and a second natural language text classification model. As shown in FIG. 4, step S204 includes:
Step S402: Input the chief complaint into the embedding layer of the first natural language text classification model for vector conversion to obtain the word vector of the chief complaint.
Specifically, the server inputs the chief complaint into the embedding layer of the first natural language text classification model. The embedding layer converts the chief complaint into vectors and outputs its word vector. In this embodiment, the first natural language text classification model is preferably a TextCNN model. TextCNN applies a convolutional neural network (CNN) to text classification and extracts the key information in a sentence using multiple convolution kernels of different sizes. A TextCNN model includes an embedding layer, a convolution layer, a max-pooling layer, and a fully connected layer with softmax. The server first inputs the chief complaint into the embedding layer of the TextCNN model to obtain its word vector.
Step S404: Input the symptom relationship attribute pairs into the embedding layer of the second natural language text classification model for word-vector conversion to obtain the word vectors of the symptom relationship attribute pairs.
Specifically, the server inputs the text-form symptom relationship attribute pairs into the embedding layer of the second natural language text classification model, which converts them into word vectors. In this embodiment, the second natural language text classification model is preferably a FastText model. FastText is built on the word2vec framework. It converts text to word vectors quickly and also incorporates n-gram information from the text.
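The FastText idea of folding n-gram information into a text vector can be sketched in plain Python. The sketch averages the vectors of the tokens and their contiguous n-grams, with n-grams hashed into a shared table of buckets. It is not the fastText library's API. Several details are assumptions for illustration: the input is taken as already segmented into words, the embedding rows are untrained pseudo-random placeholders, and `DIM` and `BUCKETS` are toy sizes. `features`, `bucket_vector`, and `fasttext_style_embedding` are hypothetical names.

```python
import random
import zlib

DIM = 8           # toy embedding size (assumed)
BUCKETS = 10_000  # hash buckets shared by tokens and n-grams (assumed)


def features(tokens, max_n=2):
    """Tokens plus contiguous token n-grams up to length max_n."""
    feats = list(tokens)
    for n in range(2, max_n + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def bucket_vector(feature):
    """Deterministic stand-in for a trained embedding row (untrained values)."""
    bucket = zlib.crc32(feature.encode("utf-8")) % BUCKETS
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def fasttext_style_embedding(tokens, max_n=2):
    """Average the vectors of all tokens and n-grams into one text vector."""
    feats = features(tokens, max_n)
    if not feats:
        return [0.0] * DIM
    vectors = [bucket_vector(f) for f in feats]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


print(features(["咳嗽", "两天"]))  # ['咳嗽', '两天', '咳嗽 两天']
```

Hashing n-grams into a fixed number of buckets keeps the table size bounded no matter how many distinct n-grams appear. This is the usual reason the technique is described as fast.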
Note that only the word-vector conversion is needed at this stage. The chief complaint is input into the TextCNN model and the text-form symptom relationship attribute pairs into the FastText model, but the final outputs of these models are not used. Only the outputs of their embedding layers are taken. The output of the TextCNN embedding layer gives the word vector of the chief complaint, and the output of the FastText embedding layer gives the word vectors of the symptom relationship attribute pairs.
Step S406: Concatenate the word vector of the chief complaint and the word vectors of the symptom relationship attribute pairs along the vertical axis to obtain a spliced vector.
Specifically, the word vector of the chief complaint and the word vectors of the symptom relationship attribute pairs are concatenated along the vertical axis to obtain the spliced vector. If there are several symptom relationship attribute pairs, the word vectors belonging to the same pair are concatenated first, giving that pair's spliced word vector. The word vector of the chief complaint is then concatenated with the spliced word vectors of the pairs, and the resulting spliced vector has size 1×N. For example, if two symptom relationship attribute pairs are extracted from one chief complaint, the spliced vector is [chief-complaint word vector | word vector of pair 1 | word vector of pair 2]. The order of the pairs' word vectors is determined by the order in which the model outputs them. Because training uses randomly sampled mini-batches, this order is random.
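The splicing order can be illustrated with plain Python lists. First, the word vectors of each pair are joined end to end. Then the chief-complaint vector and the joined pair vectors are placed in a single 1×N row. Representing vectors as lists is an assumption for illustration, and `splice_pair` and `splice` are hypothetical names. N grows with the number of extracted pairs.

```python
from typing import List

Vector = List[float]


def splice_pair(word_vectors: List[Vector]) -> Vector:
    """Join the word vectors of one symptom relationship attribute pair end to end."""
    return [x for vec in word_vectors for x in vec]


def splice(complaint_vec: Vector, pairs: List[List[Vector]]) -> List[Vector]:
    """Build the spliced vector: chief complaint first, then each pair in order."""
    row = list(complaint_vec)
    for pair in pairs:
        row.extend(splice_pair(pair))
    return [row]  # a single row, so the shape is 1 x N


spliced = splice([0.1, 0.2], [[[1.0], [2.0]], [[3.0, 4.0]]])
print(spliced)  # [[0.1, 0.2, 1.0, 2.0, 3.0, 4.0]]
```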
Step S408: Input the spliced vector into the network layers following the embedding layer of the first natural language text classification model, and output the set of diseases matching the chief complaint.
Specifically, after obtaining the spliced vector, the server inputs it into the network layers that follow the embedding layer of the first natural language text classification model. Take as an example a TextCNN model consisting of an embedding layer, a convolution layer, a max-pooling layer, and a fully connected softmax layer. The spliced vector is input directly into the convolution layer of the TextCNN model, and the disease set is taken from the output of the fully connected softmax layer. The number of diseases in the set can be configured as needed. For example, if the set is configured to contain 20 diseases, the fully connected layer outputs the 20 diseases with the highest probabilities, yielding a disease set of 20 diseases.
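Selecting the top-k diseases from the final layer amounts to applying softmax to the raw scores and keeping the k most probable labels. The sketch below assumes the scores and disease labels are given as parallel lists. The labels in the example are made up, and `top_k_diseases` is a hypothetical name.

```python
import math


def top_k_diseases(logits, labels, k=20):
    """Softmax the scores and return the k most probable (label, probability) pairs."""
    if len(logits) != len(labels):
        raise ValueError("one score per disease label is required")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(
        ((label, e / total) for label, e in zip(labels, exps)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


result = top_k_diseases([0.5, 2.0, 1.0], ["asthma", "pneumonia", "bronchitis"], k=2)
print([label for label, _ in result])  # ['pneumonia', 'bronchitis']
```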
In this embodiment, the first and second natural language text classification models are trained end to end with supervised learning on the MIMIC dataset. Using data-driven models for diagnostic quality control of medical records allows more disease types to be covered, which makes medical-record quality control more broadly applicable.
In one embodiment, step S402 includes two operations. First, each convolution kernel in the embedding layer of the first natural language text classification model convolves the chief complaint, giving a convolution vector for each kernel. Second, a weighted average of the convolution vectors is computed to obtain the word vector of the chief complaint.
Specifically, the vectors obtained when each convolution kernel in the embedding layer of the TextCNN model convolves the chief complaint are combined by a weighted average to give the word vector of the chief complaint. The weight coefficients are fixed when the TextCNN model is trained.
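The weighted average of kernel outputs can be sketched as follows. The sketch assumes the kernel output vectors have already been brought to a common length, because kernels of different sizes generally produce different lengths. It also assumes the trained weights are normalized by their sum; the text says only that the weights are fixed during training. `weighted_average` is a hypothetical name.

```python
def weighted_average(conv_vectors, weights):
    """Weighted mean of equal-length kernel output vectors."""
    if not conv_vectors or len(conv_vectors) != len(weights):
        raise ValueError("need one weight per convolution vector")
    dim = len(conv_vectors[0])
    if any(len(v) != dim for v in conv_vectors):
        raise ValueError("convolution vectors must have equal length")
    total = sum(weights)  # normalise so the weights sum to one (assumed)
    return [sum(w * v[i] for w, v in zip(weights, conv_vectors)) / total
            for i in range(dim)]


print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0]))  # [2.5, 3.5]
```

With equal weights the result is the plain mean, [2.0, 3.0] for the same inputs. This is the conventional baseline that the weighted version departs from.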
Compared with the conventional approach of simply taking the mean of the vectors, this embodiment accounts for the different weights of the vectors produced by different convolution kernels in different embedding layers, which improves accuracy.
In one embodiment, as shown in FIG. 5, a workflow for medical-record quality control is provided, and the medical-record quality control method is explained with reference to FIG. 5.
Specifically, a medical record to be reviewed, containing a chief complaint and diagnosis information, is first obtained. The server inputs the chief complaint into the embedding layer of the TextCNN model to obtain its word vector. At the same time, the server extracts symptom relationship attribute pairs from the chief complaint and inputs them into the embedding layer of the FastText model to obtain their word vectors. The word vector of the chief complaint and the word vectors of the pairs are then concatenated along the vertical axis to obtain the spliced vector. Finally, the spliced vector is processed by the network layers following the embedding layer of the TextCNN model, yielding a disease set of the top 20 diseases. The disease set is matched against the diagnosis information to determine whether a misdiagnosis has occurred.
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of these steps is not strictly restricted, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be performed at different times. Nor are they necessarily performed sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, a medical-record quality control apparatus is provided, including an extraction module 602, a processing module 604, and a determination module 606, where:
The extraction module 602 is configured to extract the chief complaint and the corresponding symptom relationship attribute pairs from the medical record under review.
The processing module 604 is configured to input the chief complaint and the symptom relationship attribute pairs into the trained first natural language processing model to obtain a set of diseases matching the chief complaint.
The determination module 606 is configured to match the disease set against the diagnosis information in the medical record under review and to determine, from the matching result, whether the diagnosis is a misdiagnosis.
In one embodiment, the extraction module 602 is further configured to: extract the chief complaint of the medical record under review; input the chief complaint into the trained second natural language processing model and use that model to extract symptom entities from the chief complaint; look up the symptom duration and symptom location of each symptom entity in the chief complaint to obtain symptom relationship attribute pairs; and convert the symptom relationship attribute pairs into text form.
In one embodiment, the extraction module 602 is further configured to: match the punctuation marks nearest to the left and right of a symptom entity in the chief complaint to determine the sentence segment containing the entity; match each character in the sentence segment, one by one, against the symptom-location and symptom-time characters in the preset dictionary; when characters match symptom-location or symptom-time characters in the preset dictionary, extract the matched characters from the segment; and combine the symptom entity with the extracted characters to obtain a symptom relationship attribute pair.
In one embodiment, the processing module 604 is further configured to: input the chief complaint into the embedding layer of the first natural language text classification model for vector conversion to obtain the word vector of the chief complaint; input the symptom relationship attribute pairs into the embedding layer of the second natural language text classification model for word-vector conversion to obtain their word vectors; concatenate the word vector of the chief complaint and the word vectors of the pairs along the vertical axis to obtain a spliced vector; and input the spliced vector into the network layers following the embedding layer of the first natural language text classification model to output the set of diseases matching the chief complaint.
In one embodiment, the processing module 604 is further configured to convolve the chief complaint with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each kernel, and to compute a weighted average of the convolution vectors to obtain the word vector of the chief complaint.
In one embodiment, the determination module 606 is further configured to determine that the diagnosis in the medical record under review is a misdiagnosis when the diagnosis information matches none of the diseases in the disease set. It determines that the diagnosis is not a misdiagnosis when the diagnosis information matches any disease in the disease set.
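The decision rule of the determination module is a membership test, sketched below. Exact string matching after trimming whitespace is an assumption; the text does not say how a diagnosis is "matched" to a disease. A practical system might first map both sides to standard disease codes. `is_misdiagnosed` is a hypothetical name, and the disease labels are made up.

```python
def is_misdiagnosed(diagnosis: str, disease_set) -> bool:
    """True when the recorded diagnosis matches no disease in the predicted set.

    Matching is exact after trimming whitespace (an assumption).
    """
    predicted = {d.strip() for d in disease_set}
    return diagnosis.strip() not in predicted


disease_set = ["pneumonia", "bronchitis", "asthma"]  # shortened example set
print(is_misdiagnosed("bronchitis", disease_set))  # False
print(is_misdiagnosed("fracture", disease_set))    # True
```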
For specific limitations on the medical-record quality control apparatus, refer to the limitations on the medical-record quality control method above; they are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. In hardware form, the modules may be embedded in, or independent of, a processor of a computer device. In software form, they may be stored in the memory of the computer device so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile or volatile storage medium and an internal memory. The storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database stores data such as the medical records to be reviewed and the models. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a medical-record quality control method.
Those skilled in the art will understand that the structure shown in FIG. 7 is only a block diagram of the part of the structure related to the solution of this application. It does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the processors, cause the one or more processors to perform the following steps: extracting the chief complaint and the corresponding symptom relationship attribute pairs from the medical record under review;
inputting the chief complaint and the symptom relationship attribute pairs into the trained first natural language processing model to obtain a set of diseases matching the chief complaint; and
matching the disease set against the diagnosis information in the medical record under review, and determining from the matching result whether the diagnosis is a misdiagnosis.
In one embodiment, when executing the computer-readable instructions, the processor further performs the following steps:
extracting the chief complaint of the medical record under review;
inputting the chief complaint into the trained second natural language processing model, and using the second natural language processing model to extract symptom entities from the chief complaint;
looking up the symptom duration and symptom location of each symptom entity in the chief complaint to obtain symptom relationship attribute pairs; and
converting the symptom relationship attribute pairs into text to obtain symptom relationship attribute pairs in text form.
In one embodiment, when executing the computer-readable instructions, the processor further performs the following steps:
matching the punctuation marks nearest to the left and right of a symptom entity in the chief complaint to determine the sentence segment containing the symptom entity;
matching each character in the sentence segment, one by one, against the symptom-location and symptom-time characters in the preset dictionary;
when characters match the symptom-location or symptom-time characters in the preset dictionary, extracting the matched characters from the sentence segment; and
combining the symptom entity with the extracted characters to obtain a symptom relationship attribute pair.
In one embodiment, when executing the computer-readable instructions, the processor further performs the following steps:
inputting the chief complaint into the embedding layer of the first natural language text classification model for vector conversion to obtain the word vector of the chief complaint;
inputting the symptom relationship attribute pairs into the embedding layer of the second natural language text classification model for word-vector conversion to obtain the word vectors of the symptom relationship attribute pairs;
concatenating the word vector of the chief complaint and the word vectors of the symptom relationship attribute pairs along the vertical axis to obtain a spliced vector; and
inputting the spliced vector into the network layers following the embedding layer of the first natural language text classification model, and outputting the set of diseases matching the chief complaint.
In one embodiment, when executing the computer-readable instructions, the processor further performs the following steps:
convolving the chief complaint with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each kernel; and
computing a weighted average of the convolution vectors to obtain the word vector of the chief complaint.
In one embodiment, when executing the computer-readable instructions, the processor further performs the following steps:
when the diagnosis information matches none of the diseases in the disease set, determining that the diagnosis in the medical record under review is a misdiagnosis; and
when the diagnosis information matches any disease in the disease set, determining that the diagnosis in the medical record under review is not a misdiagnosis.
One or more computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
extracting the chief complaint and the corresponding symptom relationship attribute pairs from the medical record under review;
inputting the chief complaint and the symptom relationship attribute pairs into the trained first natural language processing model to obtain a set of diseases matching the chief complaint; and
matching the disease set against the diagnosis information in the medical record under review, and determining from the matching result whether the diagnosis is a misdiagnosis.
The computer-readable storage medium may be non-volatile or volatile.
在其中一个实施例中,计算机可读指令被处理器执行时还实现以下步骤:In one of the embodiments, when the computer-readable instructions are executed by the processor, the following steps are further implemented:
提取待检病历的主诉信息;Extract the main complaint information of the medical records to be examined;
将主诉信息输入训练好的第二自然语言处理模型,利用第二自然语言处理模型从主诉信息中抽取症状实体;Input the main complaint information into the trained second natural language processing model, and use the second natural language processing model to extract symptom entities from the main complaint information;
从主诉信息中查询症状实体的症状持续时间和症状部位,得到症状关系属性对;及Query the symptom entity's symptom duration and symptom location from the main complaint information to obtain symptom relationship attribute pairs; and
将症状关系属性对进行文本转换,得到文本形式的症状关系属性对。The symptom relationship attribute pair is converted into text to obtain the symptom relationship attribute pair in text form.
在其中一个实施例中,计算机可读指令被处理器执行时还实现以下步骤:In one of the embodiments, when the computer-readable instructions are executed by the processor, the following steps are further implemented:
在主诉信息中匹配症状实体左右两侧最近的标点符号,确定症状实体所在的语句段;Match the nearest punctuation marks on the left and right sides of the symptom entity in the main complaint information to determine the sentence segment where the symptom entity is located;
将语句段中的各字符逐个与预设字典中的症状部位字符和症状时间字符进行匹配;Match each character in the sentence segment with the symptom part character and symptom time character in the preset dictionary one by one;
当存在与预设字典中的症状部位字符和症状时间字符匹配成功的字符时,从语句段中抽取匹配成功的字符;及When there are characters that successfully match the symptom part characters and symptom time characters in the preset dictionary, extract the successfully matched characters from the sentence segment; and
组合症状实体和抽取的字符,得到症状关系属性对。Combine the symptom entity and the extracted characters to obtain the symptom relationship attribute pair.
在其中一个实施例中,计算机可读指令被处理器执行时还实现以下步骤:将主诉信息输入第一自然语言文本分类模型的嵌入层进行向量转换,得到主诉信息的词向量;In one of the embodiments, when the computer-readable instructions are executed by the processor, the following steps are further implemented: input the main complaint information into the embedding layer of the first natural language text classification model to perform vector conversion to obtain the word vector of the main complaint information;
将症状关系属性对输入第二自然语言文本分类模型的嵌入层进行词向量转换,得到症状关系属性对的词向量;Perform word vector conversion on the symptom relationship attribute to the embedding layer of the input second natural language text classification model to obtain the word vector of the symptom relationship attribute pair;
按照纵轴方向拼接主诉信息的词向量和症状关系属性对的词向量,得到拼接向量;及Splicing the word vector of the main complaint information and the word vector of the symptom relation attribute pair according to the vertical axis direction to obtain the splicing vector; and
将拼接向量输入至第一自然语言文本分类模型的嵌入层之后的网络层,输出与主诉信息匹配的疾病集合。The splicing vector is input to the network layer after the embedding layer of the first natural language text classification model, and the disease set matching the main complaint information is output.
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps:
Each convolution kernel in the embedding layer of the first natural language text classification model convolves the chief complaint information to obtain a convolution vector for that kernel; and
Compute a weighted average of the convolution vectors to obtain the word vectors of the chief complaint information.
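A Python sketch of the convolve-then-average step is shown below. It rests on assumptions the text does not state: the chief complaint is already represented as a (tokens x dim) matrix of character embeddings, every kernel has the same width so the unpadded ("valid") outputs line up element by element, and the averaging weights are given (in practice they might be learned). All names and numbers are illustrative.

```python
Matrix = list[list[float]]


def conv1d(embeddings: Matrix, kernel: Matrix) -> list[float]:
    """Valid 1-D convolution: slide a (width x dim) kernel over (tokens x dim) embeddings."""
    width = len(kernel)
    out = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Sum of element-wise products between the window and the kernel.
        out.append(sum(e * w for row, krow in zip(window, kernel) for e, w in zip(row, krow)))
    return out


def weighted_average(vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Element-wise weighted average of equal-length convolution vectors."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("convolution vectors must all have the same length")
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(len(vectors[0]))]


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # 3 tokens, dim 2
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]  # two width-2 kernels
conv_vectors = [conv1d(embeddings, k) for k in kernels]     # [[2.0, 1.0], [0.0, 2.0]]
print(weighted_average(conv_vectors, [0.5, 0.5]))           # [1.0, 1.5]
```

A standard TextCNN would instead max-pool each kernel's output and concatenate the results; the weighted average here follows the wording of this embodiment.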
In one embodiment, the computer-readable instructions, when executed by the processor, further implement the following steps:
When the diagnosis information matches none of the diseases in the disease set, determine that the diagnosis information in the medical record under review is a misdiagnosis; and
When the diagnosis information matches any disease in the disease set, determine that the diagnosis information in the medical record under review is not a misdiagnosis.
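The decision rule above reduces to a set-membership test. The Python sketch below assumes that "matching" means an exact string comparison after trimming whitespace and ignoring case; a real system may instead compare standardized codes or synonym lists, which the text does not specify. The function name and disease names are hypothetical.

```python
def is_misdiagnosed(diagnosis: str, disease_set: set[str]) -> bool:
    """Flag a misdiagnosis when the recorded diagnosis matches no disease in the predicted set."""
    normalized = diagnosis.strip().lower()
    # all() over an empty set is True, so an empty prediction also flags the record.
    return all(normalized != disease.strip().lower() for disease in disease_set)


predicted = {"acute gastroenteritis", "peptic ulcer"}
print(is_misdiagnosed("Acute gastroenteritis", predicted))  # False
print(is_misdiagnosed("migraine", predicted))               # True
```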
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, any combination of these technical features that contains no contradiction should be regarded as falling within the scope of this specification.
The above embodiments express only several implementations of this application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. A person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and all of these fall within its scope of protection. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (20)

  1. A medical record quality control method, the method comprising:
    extracting chief complaint information and corresponding symptom relation-attribute pairs from a medical record under review;
    inputting the chief complaint information and the symptom relation-attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
    matching the disease set against the diagnosis information in the medical record under review, and determining, according to the matching result, whether the diagnosis information of the medical record under review is a misdiagnosis.
  2. The method according to claim 1, wherein extracting the chief complaint information and the corresponding symptom relation-attribute pairs from the medical record under review comprises:
    extracting the chief complaint information of the medical record under review;
    inputting the chief complaint information into a trained second natural language processing model, and using the second natural language processing model to extract symptom entities from the chief complaint information;
    looking up, in the chief complaint information, the symptom duration and symptom location of each symptom entity to obtain symptom relation-attribute pairs; and
    converting the symptom relation-attribute pairs into text to obtain symptom relation-attribute pairs in text form.
  3. The method according to claim 2, wherein looking up, in the chief complaint information, the symptom duration and symptom location of the symptom entity to obtain a symptom relation-attribute pair comprises:
    locating, in the chief complaint information, the nearest punctuation marks to the left and right of the symptom entity to determine the sentence segment containing the symptom entity;
    matching each character in the sentence segment, one by one, against the symptom-location characters and symptom-time characters in a preset dictionary;
    when characters match the symptom-location characters and the symptom-time characters in the preset dictionary, extracting the matched characters from the sentence segment; and
    combining the symptom entity with the extracted characters to obtain the symptom relation-attribute pair.
  4. The method according to claim 1, wherein the first natural language processing model comprises a first natural language text classification model and a second natural language text classification model; and
    inputting the chief complaint information and the symptom relation-attribute pair into the trained first natural language processing model to obtain the set of diseases matching the chief complaint information comprises:
    inputting the chief complaint information into the embedding layer of the first natural language text classification model for vector conversion to obtain word vectors of the chief complaint information;
    inputting the symptom relation-attribute pair into the embedding layer of the second natural language text classification model for word-vector conversion to obtain word vectors of the symptom relation-attribute pair;
    concatenating the word vectors of the chief complaint information and the word vectors of the symptom relation-attribute pair along the vertical axis to obtain a concatenated vector; and
    inputting the concatenated vector into the network layers following the embedding layer of the first natural language text classification model, and outputting the set of diseases matching the chief complaint information.
  5. The method according to claim 4, wherein inputting the chief complaint information into the embedding layer of the trained first natural language text classification model for vector conversion to obtain the word vectors of the chief complaint information comprises:
    convolving the chief complaint information with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each convolution kernel; and
    computing a weighted average of the convolution vectors to obtain the word vectors of the chief complaint information.
  6. The method according to claim 1, wherein matching the disease set against the diagnosis information in the medical record under review and determining, according to the matching result, whether the diagnosis information of the medical record under review is a misdiagnosis comprises:
    when the diagnosis information matches none of the diseases in the disease set, determining that the diagnosis information of the medical record under review is a misdiagnosis; and
    when the diagnosis information matches any disease in the disease set, determining that the diagnosis information of the medical record under review is not a misdiagnosis.
  7. The method according to claim 4 or 5, wherein the first natural language text classification model comprises a TextCNN model, and the second natural language text classification model comprises a FastText model.
  8. A medical record quality control apparatus, the apparatus comprising:
    an extraction module, configured to extract chief complaint information and corresponding symptom relation-attribute pairs from a medical record under review;
    a processing module, configured to input the chief complaint information and the symptom relation-attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
    a determination module, configured to match the disease set against the diagnosis information in the medical record under review and to determine, according to the matching result, whether the diagnosis information of the medical record under review is a misdiagnosis.
  9. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    extracting chief complaint information and corresponding symptom relation-attribute pairs from a medical record under review;
    inputting the chief complaint information and the symptom relation-attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
    matching the disease set against the diagnosis information in the medical record under review, and determining, according to the matching result, whether the diagnosis information of the medical record under review is a misdiagnosis.
  10. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    extracting the chief complaint information of the medical record under review;
    inputting the chief complaint information into a trained second natural language processing model, and using the second natural language processing model to extract symptom entities from the chief complaint information;
    looking up, in the chief complaint information, the symptom duration and symptom location of each symptom entity to obtain symptom relation-attribute pairs; and
    converting the symptom relation-attribute pairs into text to obtain symptom relation-attribute pairs in text form.
  11. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    locating, in the chief complaint information, the nearest punctuation marks to the left and right of the symptom entity to determine the sentence segment containing the symptom entity;
    matching each character in the sentence segment, one by one, against the symptom-location characters and symptom-time characters in a preset dictionary;
    when characters match the symptom-location characters and the symptom-time characters in the preset dictionary, extracting the matched characters from the sentence segment; and
    combining the symptom entity with the extracted characters to obtain the symptom relation-attribute pair.
  12. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    inputting the chief complaint information and the symptom relation-attribute pair into the trained first natural language processing model to obtain the set of diseases matching the chief complaint information, which comprises:
    inputting the chief complaint information into the embedding layer of the first natural language text classification model for vector conversion to obtain word vectors of the chief complaint information;
    inputting the symptom relation-attribute pair into the embedding layer of the second natural language text classification model for word-vector conversion to obtain word vectors of the symptom relation-attribute pair;
    concatenating the word vectors of the chief complaint information and the word vectors of the symptom relation-attribute pair along the vertical axis to obtain a concatenated vector; and
    inputting the concatenated vector into the network layers following the embedding layer of the first natural language text classification model, and outputting the set of diseases matching the chief complaint information.
  13. The computer device according to claim 12, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    convolving the chief complaint information with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each convolution kernel; and
    computing a weighted average of the convolution vectors to obtain the word vectors of the chief complaint information.
  14. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    when the diagnosis information matches none of the diseases in the disease set, determining that the diagnosis information of the medical record under review is a misdiagnosis; and
    when the diagnosis information matches any disease in the disease set, determining that the diagnosis information of the medical record under review is not a misdiagnosis.
  15. One or more computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    extracting chief complaint information and corresponding symptom relation-attribute pairs from a medical record under review;
    inputting the chief complaint information and the symptom relation-attribute pairs into a trained first natural language processing model to obtain a set of diseases matching the chief complaint information; and
    matching the disease set against the diagnosis information in the medical record under review, and determining, according to the matching result, whether the diagnosis information of the medical record under review is a misdiagnosis.
  16. The storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    extracting the chief complaint information of the medical record under review;
    inputting the chief complaint information into a trained second natural language processing model, and using the second natural language processing model to extract symptom entities from the chief complaint information;
    looking up, in the chief complaint information, the symptom duration and symptom location of each symptom entity to obtain symptom relation-attribute pairs; and
    converting the symptom relation-attribute pairs into text to obtain symptom relation-attribute pairs in text form.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    locating, in the chief complaint information, the nearest punctuation marks to the left and right of the symptom entity to determine the sentence segment containing the symptom entity;
    matching each character in the sentence segment, one by one, against the symptom-location characters and symptom-time characters in a preset dictionary;
    when characters match the symptom-location characters and the symptom-time characters in the preset dictionary, extracting the matched characters from the sentence segment; and
    combining the symptom entity with the extracted characters to obtain the symptom relation-attribute pair.
  18. The storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    inputting the chief complaint information and the symptom relation-attribute pair into the trained first natural language processing model to obtain the set of diseases matching the chief complaint information, which comprises:
    inputting the chief complaint information into the embedding layer of the first natural language text classification model for vector conversion to obtain word vectors of the chief complaint information;
    inputting the symptom relation-attribute pair into the embedding layer of the second natural language text classification model for word-vector conversion to obtain word vectors of the symptom relation-attribute pair;
    concatenating the word vectors of the chief complaint information and the word vectors of the symptom relation-attribute pair along the vertical axis to obtain a concatenated vector; and
    inputting the concatenated vector into the network layers following the embedding layer of the first natural language text classification model, and outputting the set of diseases matching the chief complaint information.
  19. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    convolving the chief complaint information with each convolution kernel in the embedding layer of the first natural language text classification model to obtain a convolution vector for each convolution kernel; and
    computing a weighted average of the convolution vectors to obtain the word vectors of the chief complaint information.
  20. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps:
    when the diagnosis information matches none of the diseases in the disease set, determining that the diagnosis information of the medical record under review is a misdiagnosis; and
    when the diagnosis information matches any disease in the disease set, determining that the diagnosis information of the medical record under review is not a misdiagnosis.
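The three-step flow of claim 1 (and the three modules of claim 8) can be summarized as a small Python orchestration sketch. It is not the claimed implementation: the record field names, the `check_record` function, and the toy extractor and classifier that stand in for the trained models are all hypothetical, and the diagnosis match is a plain set-membership test.

```python
from typing import Callable

Pair = dict[str, str]
Extractor = Callable[[str], list[Pair]]          # stands in for the extraction step
Classifier = Callable[[str, list[Pair]], set[str]]  # stands in for the first NLP model


def check_record(record: dict[str, str], extract: Extractor, classify: Classifier) -> bool:
    """Return True when the record's diagnosis is flagged as a possible misdiagnosis."""
    complaint = record["chief_complaint"]       # step 1: chief complaint information...
    pairs = extract(complaint)                  # ...and its symptom relation-attribute pairs
    diseases = classify(complaint, pairs)       # step 2: predicted disease set
    return record["diagnosis"] not in diseases  # step 3: match against the recorded diagnosis


# Toy stand-ins for the trained models (hypothetical, for illustration only).
def toy_extract(text: str) -> list[Pair]:
    if "headache" in text:
        return [{"symptom": "headache", "location": "head", "duration": "3 days"}]
    return []


def toy_classify(text: str, pairs: list[Pair]) -> set[str]:
    return {"migraine", "tension headache"} if pairs else set()


record = {"chief_complaint": "headache for 3 days", "diagnosis": "acute gastritis"}
print(check_record(record, toy_extract, toy_classify))  # True
```

Passing the extractor and classifier in as callables mirrors the module split of claim 8 and lets either trained model be swapped without touching the decision logic.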
PCT/CN2020/099180 2020-06-16 2020-06-30 Medical-record quality control method, apparatus, computer device, and storage medium WO2021114620A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010548540.9 2020-06-16
CN202010548540.9A CN111710383A (en) 2020-06-16 2020-06-16 Medical record quality control method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021114620A1 true WO2021114620A1 (en) 2021-06-17

Family

ID=72540457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099180 WO2021114620A1 (en) 2020-06-16 2020-06-30 Medical-record quality control method, apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111710383A (en)
WO (1) WO2021114620A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111863174B (en) * 2020-07-27 2023-10-20 北京颐圣智能科技有限公司 Medical record quality assessment method and computing equipment
CN111883222B (en) * 2020-09-28 2020-12-22 平安科技(深圳)有限公司 Text data error detection method and device, terminal equipment and storage medium
CN112349423B (en) * 2020-11-04 2024-05-24 吾征智能技术(北京)有限公司 BiMPM method-based mouth drying information matching system
CN112669928B (en) * 2021-01-06 2023-01-10 腾讯科技(深圳)有限公司 Structured information construction method and device, computer equipment and storage medium
CN113808663A (en) * 2021-09-01 2021-12-17 基诺莱(重庆)生物技术有限公司 Artificial intelligence-based gene variation site matching method, system and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181128A1 (en) * 2011-03-07 2014-06-26 Daniel J. RISKIN Systems and Methods for Processing Patient Data History
CN109390058A (en) * 2018-09-28 2019-02-26 湖南智腾安控科技有限公司 A kind of method for building up of case history Computer Aided Analysis System and the system
CN109949929A (en) * 2019-03-19 2019-06-28 挂号网(杭州)科技有限公司 A kind of assistant diagnosis system based on the extensive case history of deep learning
CN110162779A (en) * 2019-04-04 2019-08-23 北京百度网讯科技有限公司 Appraisal procedure, device and the equipment of quality of case history
CN110910976A (en) * 2019-10-12 2020-03-24 平安国际智慧城市科技股份有限公司 Medical record detection method, device, equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140164003A1 (en) * 2012-12-12 2014-06-12 Debra Thesman Methods for optimizing managed healthcare administration and achieving objective quality standards
CN108447534A (en) * 2018-05-18 2018-08-24 灵玖中科软件(北京)有限公司 A kind of electronic health record data quality management method based on NLP
CN110136788B (en) * 2019-05-14 2021-08-17 清华大学 Medical record quality inspection method, device, equipment and storage medium based on automatic detection
CN110223742A (en) * 2019-06-14 2019-09-10 中南大学 The clinical manifestation information extraction method and equipment of Chinese electronic health record data
CN110277149A (en) * 2019-06-28 2019-09-24 北京百度网讯科技有限公司 Processing method, device and the equipment of electronic health record
CN110491499A (en) * 2019-07-10 2019-11-22 厦门大学 Clinical aid decision-making method and system towards mark electronic health record

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117637092A (en) * 2024-01-24 2024-03-01 创智和宇信息技术股份有限公司 Medical record precoding method and device based on artificial intelligence model
CN117637092B (en) * 2024-01-24 2024-04-23 创智和宇信息技术股份有限公司 Medical record precoding method and device based on artificial intelligence model
CN117995346A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Medical record quality control optimization method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111710383A (en) 2020-09-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20897713

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20897713

Country of ref document: EP

Kind code of ref document: A1