WO2021159759A1 - Method and apparatus for electronic medical record structuring, computer device and storage medium - Google Patents


Info

Publication number
WO2021159759A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical record
text
electronic medical
sentence
sub
Application number
PCT/CN2020/125146
Other languages
French (fr)
Chinese (zh)
Inventor
周晓峰 (Zhou Xiaofeng)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021159759A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Definitions

  • This application relates to the technical field of intelligent decision-making, and in particular to a method, device, computer equipment, and storage medium for structuring electronic medical records.
  • The medical record is the original record of a patient's diagnosis and treatment in the hospital. It contains the front sheet, progress notes, examination results, physician's orders, surgical records, nursing records, and so on.
  • Electronic medical records include not only static medical record information but also the related services provided.
  • Electronic medical records are electronically managed information about an individual's lifelong health status and healthcare activities, and they cover all process information involved in collecting, storing, transmitting, processing, and using patient information.
  • Structuring electronic medical records uses a neural network structure to extract disease entities, drug entities, body-part entities, and so on from the records, so that their key information can be extracted efficiently, effectively assisting doctors with core data analysis and data search.
  • The inventors realized that existing electronic medical records vary in length, so a long record may have to be truncated before it can be processed.
  • The main purpose of this application is to provide an electronic medical record structuring method, device, computer equipment, and storage medium that solve the problem that truncating an electronic medical record reduces the structuring accuracy of sentences near the truncation point.
  • this application provides a method for structuring an electronic medical record, which includes the following steps:
  • if the number of sentences in the electronic medical record text exceeds a preset threshold, the electronic medical record text is truncated to obtain multiple electronic medical record sub-texts;
  • the sentence vectors in each target medical record text are input into a classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain a first output, wherein the classification model is trained based on a bidirectional recurrent neural network model;
  • the classification label of each sentence is obtained according to the first output.
  • This application also provides an electronic medical record structuring device, including:
  • the first obtaining unit is configured to obtain the electronic medical record text and the number of sentences in the electronic medical record text;
  • the detection unit is configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;
  • the first truncation unit is configured to truncate the electronic medical record text, if the number of sentences exceeds the threshold, to obtain multiple electronic medical record sub-texts;
  • the first introduction unit is configured to introduce context into each of the electronic medical record sub-texts according to preset rules to obtain the target medical record texts;
  • the first mapping unit is configured to map each sentence in the target medical record text to a sentence vector of a fixed dimension;
  • the first calculation unit is configured to input the sentence vectors in each target medical record text into the classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain a first output, wherein the classification model is trained based on a bidirectional recurrent neural network model;
  • the second calculation unit is configured to obtain the classification label of each sentence according to the first output.
  • The present application also provides a computer device including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements the steps of an electronic medical record structuring method:
  • if the number of sentences in the electronic medical record text exceeds a preset threshold, the electronic medical record text is truncated to obtain multiple electronic medical record sub-texts;
  • the sentence vectors in each target medical record text are input into a classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain a first output, wherein the classification model is trained based on a bidirectional recurrent neural network model;
  • the classification label of each sentence is obtained according to the first output.
  • This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of an electronic medical record structuring method:
  • if the number of sentences in the electronic medical record text exceeds a preset threshold, the electronic medical record text is truncated to obtain multiple electronic medical record sub-texts;
  • the sentence vectors in each target medical record text are input into a classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain a first output, wherein the classification model is trained based on a bidirectional recurrent neural network model;
  • the classification label of each sentence is obtained according to the first output.
  • The electronic medical record structuring method, device, computer equipment, and storage medium provided in this application introduce part of the context at each truncation point according to preset rules and input the introduced context together with the truncated electronic medical record text into the classification model.
  • Because the classification model is trained based on a bidirectional recurrent neural network, it can extract contextual information; the classification of each sentence is then calculated through SOFTMAX, which effectively improves the structuring accuracy of sentences around the truncation point.
  • FIG. 1 is a schematic diagram of the steps of a method for structuring an electronic medical record in an embodiment of the present application.
  • FIG. 2 is a structural block diagram of an electronic medical record structuring device in an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • an embodiment of the present application provides a method for structuring an electronic medical record, including:
  • Step S1: obtaining the electronic medical record text and the number of sentences in the electronic medical record text;
  • Step S2: detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
  • Step S3: if it does, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;
  • Step S4: introducing context into each of the electronic medical record sub-texts according to preset rules to obtain the target medical record texts;
  • Step S5: mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;
  • Step S6: inputting the sentence vectors in each target medical record text into the classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain a first output;
  • the classification model is trained based on a bidirectional recurrent neural network model;
  • Step S7: obtaining the classification label of each sentence according to the first output.
  • After the electronic medical record text is acquired, it can be preprocessed, for example by text preprocessing and data cleaning with tools such as numpy, pandas, and jieba, including Chinese word segmentation, stop-word removal, and removal of useless symbols. Private information in the text is also desensitized to remove patient privacy.
  • Private information includes the name, bed number, hospital number, address, and other key information by which others could easily identify the patient. The number of sentences in the electronic medical record text is then counted after the above processing.
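The desensitization and sentence counting described above can be sketched with Python's standard `re` module. This is a minimal sketch under stated assumptions: the field labels (`Name:`, `Bed number:`, `Hospital number:`), the `[MASKED]` placeholder, and the set of sentence terminators are hypothetical, and jieba word segmentation and stop-word removal are left out.

```python
import re
from typing import List, Tuple

# Assumed sentence terminators: Chinese and ASCII full stops, exclamation and
# question marks, and semicolons. The lookbehind keeps each terminator with its sentence.
SENTENCE_END = re.compile(r"(?<=[。！？；.!?;])")

# Hypothetical desensitization rules: each one masks the value that follows a
# privacy-related field label. Real records need rules tuned to their format.
PRIVACY_RULES = [
    re.compile(r"(Name:\s*)[^,;.]+"),
    re.compile(r"(Bed number:\s*)\d+"),
    re.compile(r"(Hospital number:\s*)\d+"),
]


def desensitize(text: str) -> str:
    """Replace each matched private value with a fixed placeholder."""
    for rule in PRIVACY_RULES:
        text = rule.sub(r"\1[MASKED]", text)
    return text


def split_sentences(text: str) -> List[str]:
    """Split after every terminator and drop empty fragments."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def preprocess(text: str) -> Tuple[List[str], int]:
    """Desensitize the record, split it into sentences, and count them (step S1)."""
    sentences = split_sentences(desensitize(text))
    return sentences, len(sentences)


record = ("Name: Zhang San, Bed number: 12. Chief complaint: cough for 3 days. "
          "Hospital number: 884213; admitted for observation.")
sentences, count = preprocess(record)
print(count)  # 4
```

Masking runs before splitting so that a removed value can never leak into a sentence boundary decision; splitting on zero-width matches requires Python 3.7 or later.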
  • Context is introduced into each electronic medical record sub-text according to preset rules. For example, if an electronic medical record text is truncated into three sub-texts (the first, second, and third electronic medical record sub-texts), part of the sentences of the second sub-text is introduced at the truncation point at the end of the first sub-text; part of the sentences of the first sub-text is introduced at the truncation point at the beginning of the second sub-text, and part of the sentences of the third sub-text at the end of the second sub-text; and part of the sentences of the second sub-text is introduced at the truncation point at the beginning of the third sub-text.
  • each sentence in the target medical record text is mapped to a sentence vector of a fixed dimension.
  • Each sentence can be mapped to a vector of fixed dimension by the encoder of a neural network (a convolutional neural network, a recurrent neural network, a Transformer, etc.), which gives a vector representation of each individual sentence.
  • the sentences in the medical record are not independent of each other but context-related.
  • For example, the part describing the treatment process usually consists of multiple sentences, so a sentence next to one that describes treatment is more likely to describe the treatment process as well than to describe the patient's past medical history. Classifying each sentence of the text in isolation therefore does not achieve good results; all the context information must be included by inputting the sentence vectors into the classification model in order.
  • The classification model is trained based on a bidirectional recurrent neural network model. Through the forward and backward computations of the classification model, each sentence obtains better contextual information, which effectively improves classification accuracy.
  • The classification model can classify each sentence into one of the following categories: basic information, personal history, family history, past history, history of present illness, chief complaint, examination, diagnosis, treatment, summary, and other.
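To make the forward and backward computations concrete, here is a minimal pure-Python sketch of a bidirectional recurrent network that scores every sentence vector in a sequence. It assumes a plain Elman cell with tanh and randomly initialized, untrained weights; the class name `BiRNNTagger`, the sizes, and the cell type are assumptions for illustration (an LSTM or GRU cell would fit the same description).

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# The sentence categories listed in the text above.
LABELS = ["basic information", "personal history", "family history", "past history",
          "history of present illness", "chief complaint", "examination",
          "diagnosis", "treatment", "summary", "other"]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class BiRNNTagger:
    """Untrained bidirectional Elman RNN that scores every sentence in a sequence."""

    def __init__(self, input_dim: int, hidden_dim: int, num_labels: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Each direction has its own input-to-hidden and hidden-to-hidden weights.
        self.w_in = {d: random_matrix(hidden_dim, input_dim, rng) for d in ("fwd", "bwd")}
        self.w_rec = {d: random_matrix(hidden_dim, hidden_dim, rng) for d in ("fwd", "bwd")}
        # The output layer maps the concatenated states [h_fwd; h_bwd] to label scores.
        self.w_out = random_matrix(num_labels, 2 * hidden_dim, rng)

    def _run(self, direction: str, seq: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_dim
        states = []
        for x in seq:
            pre = [a + b for a, b in zip(matvec(self.w_in[direction], x),
                                         matvec(self.w_rec[direction], h))]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        return states

    def forward(self, sentence_vectors: List[Vector]) -> List[Vector]:
        """Return one score vector per sentence (the "first output")."""
        fwd = self._run("fwd", sentence_vectors)
        # Run over the reversed sequence, then flip back so positions line up.
        bwd = self._run("bwd", sentence_vectors[::-1])[::-1]
        return [matvec(self.w_out, f + b) for f, b in zip(fwd, bwd)]


model = BiRNNTagger(input_dim=4, hidden_dim=8, num_labels=len(LABELS))
first_output = model.forward([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
```

With untrained weights the scores carry no meaning; the point of the sketch is that the scores for the first sentence already depend on the last one, because the backward pass reads the sequence from the end.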
  • the classification label of each sentence is obtained according to the first output.
  • The first output of each sentence vector is passed through SOFTMAX, which maps an arbitrary K-dimensional real vector to another K-dimensional real vector in which every element lies in (0, 1) and the elements sum to 1.
  • The function expression of SOFTMAX is $\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$, where K is the number of categories, j denotes one of the K categories with $j \in \{1, \dots, K\}$, and $z_j$ is the value (score) of category j.
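The SOFTMAX formula can be checked numerically with a minimal Python sketch. Subtracting the maximum score is a standard numerical-stability step not mentioned in the text; it leaves the result unchanged because the common factor cancels between numerator and denominator.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """Map K real scores to K probabilities in (0, 1) that sum to 1."""
    # Subtracting max(z) cancels in the ratio but keeps math.exp from overflowing.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


print([round(p, 4) for p in softmax([1.0, 2.0, 3.0])])  # [0.09, 0.2447, 0.6652]
```

The largest score always receives the largest probability, so taking the arg max of the SOFTMAX output gives the predicted category.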
  • Part of the context is introduced at each truncation point according to the preset rules, and the introduced context and the truncated electronic medical record text are input together into the classification model to obtain the first output.
  • The classification model is trained based on a bidirectional recurrent neural network.
  • It can therefore extract context information, and obtaining the classification label of each sentence according to the first output effectively improves the structuring accuracy of sentences at the truncation points of the electronic medical record.
  • Step S7 of obtaining the classification label of each sentence according to the first output includes:
  • Step S71: inputting the first output of each sentence vector into a CRF (conditional random field) network and/or a self-attention network to obtain a second output;
  • Step S72: passing the second output of each sentence vector through SOFTMAX to obtain the classification label of each sentence.
  • Inputting the first output into the CRF network and/or the self-attention network further strengthens the influence of context information in the classification model and the contextual connections between sentences.
  • SOFTMAX is then applied to the second output to obtain the classification label of each sentence.
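Steps S71 and S72 can be sketched as follows for the self-attention option. This is a minimal sketch under assumptions: it uses single-head scaled dot-product self-attention with no learned projection matrices, so the queries, keys, and values are the first outputs themselves; a trained network would learn those projections, and the CRF alternative is not shown. The function names and example labels are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries = keys = values = seq."""
    d = len(seq[0])
    out = []
    for q in seq:
        # How strongly this sentence attends to every sentence in the sequence.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq])
        # The second output is the attention-weighted average of all first outputs.
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def classify(first_output: List[Vector], labels: List[str]) -> List[str]:
    """Step S71 (self-attention) followed by step S72 (SOFTMAX and arg max)."""
    result = []
    for scores in self_attention(first_output):
        probs = softmax(scores)
        result.append(labels[max(range(len(probs)), key=probs.__getitem__)])
    return result


first = [[2.0, 0.1, 0.0], [1.5, 0.2, 0.1], [0.0, 0.3, 3.0]]
print(classify(first, ["diagnosis", "treatment", "summary"]))
```

Each second output mixes in the first outputs of the sentences it attends to most, which is how neighbouring sentences reinforce one another before the final labels are chosen.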
  • Step S5 of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension includes:
  • Step S51: inputting each sentence in the target medical record text into the neural network;
  • Step S52: mapping each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.
  • Mapping each sentence to a vector of fixed dimension gives a vector representation of each individual sentence through the neural network, which may be a convolutional neural network, a recurrent neural network, a Transformer, etc.
  • Take the Transformer model as an example: each encoder layer has two sub-layers.
  • The first sub-layer is a multi-head attention layer.
  • The second is a simple fully connected feed-forward layer, and a residual connection is used around each sub-layer.
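The Transformer sub-layer structure can be illustrated with a small pure-Python sketch that turns a sentence of any length into a vector of fixed dimension. Everything here is an assumption made for illustration: the character embeddings come from a seeded pseudo-random generator instead of training, attention uses a single head instead of several, mean pooling is the chosen way to reduce token vectors to one sentence vector, and positional encodings and layer normalization (which the original Transformer applies after each residual connection) are omitted. `DIM` and the function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
DIM = 8  # hypothetical sentence-vector dimension


def embed_char(ch: str) -> Vector:
    """Hypothetical untrained embedding: a PRNG seeded with the character's code point."""
    rng = random.Random(ord(ch))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def attention_sublayer(xs: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention (a simplification of multi-head)."""
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in xs])
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(DIM)])
    return out


_rng = random.Random(42)
W1 = [[_rng.uniform(-0.3, 0.3) for _ in range(DIM)] for _ in range(2 * DIM)]
W2 = [[_rng.uniform(-0.3, 0.3) for _ in range(2 * DIM)] for _ in range(DIM)]


def feed_forward_sublayer(xs: List[Vector]) -> List[Vector]:
    """Position-wise fully connected layer: ReLU(W1 x), then W2, for each token."""
    out = []
    for x in xs:
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W1]
        out.append([sum(w * h for w, h in zip(row, hidden)) for row in W2])
    return out


def with_residual(xs: List[Vector], sublayer) -> List[Vector]:
    """Residual connection around a sub-layer: x + Sublayer(x)."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, sublayer(xs))]


def encode_sentence(sentence: str) -> Vector:
    """Map a sentence of any length to one DIM-dimensional vector."""
    if not sentence:
        raise ValueError("sentence must be non-empty")
    tokens = [embed_char(ch) for ch in sentence]
    hidden = with_residual(tokens, attention_sublayer)
    hidden = with_residual(hidden, feed_forward_sublayer)
    # Mean pooling over tokens yields a fixed-size vector regardless of length.
    return [sum(column) / len(hidden) for column in zip(*hidden)]
```

A 20-character sentence and a 5-character sentence both come out as `DIM`-dimensional vectors, which is the property step S5 needs before the sentence vectors are fed to the bidirectional classification model.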
  • Step S4 of introducing context into each of the electronic medical record sub-texts according to preset rules to obtain the target medical record texts includes:
  • Step S41: detecting the position of each electronic medical record sub-text within the electronic medical record text;
  • Step S42: when the sub-text is at the beginning of the electronic medical record text, introducing a preset number of sentences from the beginning of the next sub-text at the truncation point of this sub-text;
  • Step S43: when the sub-text is in the middle of the electronic medical record text, introducing a preset number of sentences from the end of the previous sub-text at the truncation point at the beginning of this sub-text,
  • and introducing a preset number of sentences from the beginning of the next sub-text at the truncation point at the end of this sub-text;
  • Step S44: when the sub-text is at the end of the electronic medical record text, introducing a preset number of sentences from the end of the previous sub-text at the truncation point of this sub-text.
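Steps S41 to S44 amount to cutting the sentence list into consecutive sub-texts and widening each one at its truncation points. A minimal sketch follows; the function name, the fixed sub-text size `core_size`, and the single `context_size` used at both ends are assumptions, since the text only speaks of a preset number of sentences. It also returns a mask marking each sub-text's own sentences, so that introduced context can later be kept out of the loss and the final labels.

```python
from typing import List, Tuple


def build_target_texts(sentences: List[str], core_size: int,
                       context_size: int) -> List[Tuple[List[str], List[bool]]]:
    """Split sentences into sub-texts and add context at each truncation point.

    Returns (target_text, is_core) pairs, where is_core[i] is True for the
    sub-text's own sentences and False for sentences introduced as context.
    """
    if core_size <= 0 or not 0 <= context_size <= core_size:
        raise ValueError("need core_size > 0 and 0 <= context_size <= core_size")
    targets = []
    n = len(sentences)
    for start in range(0, n, core_size):
        end = min(start + core_size, n)
        # S43/S44: every sub-text except the first gets the end of the previous one.
        left = max(0, start - context_size)
        # S42/S43: every sub-text except the last gets the beginning of the next one.
        right = min(n, end + context_size)
        is_core = [start <= i < end for i in range(left, right)]
        targets.append((sentences[left:right], is_core))
    return targets


for target, is_core in build_target_texts([f"s{i}" for i in range(7)], core_size=3, context_size=1):
    print(target, is_core)
# ['s0', 's1', 's2', 's3'] [True, True, True, False]
# ['s2', 's3', 's4', 's5', 's6'] [False, True, True, True, False]
# ['s5', 's6'] [False, True]
```

The first sub-text only gains sentences at its end and the last only at its beginning, matching steps S42 and S44; a record no longer than `core_size` yields a single target with no introduced context.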
  • In this way, part of the neighboring sentences is introduced into each electronic medical record sub-text.
  • For example, if an electronic medical record text contains 120 sentences and the classification model can only process 50 sentences at a time, the text can be divided evenly into electronic medical record sub-texts by number of sentences.
  • The first 10 sentences of the second part can be introduced at the end of the first part to form the first target medical record text; the last 10 sentences of the first part can be introduced at the beginning of the second part, and the first 10 sentences of the third part at the end of the second part, to form the second target medical record text.
  • The specific number of sentences introduced into each electronic medical record sub-text can be set in advance as needed.
  • Introducing context sentences into each electronic medical record sub-text before it is input into the classification model improves the classification accuracy of each sentence through the connections between contexts.
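With the figures in this example, and assuming the 120 sentences are divided evenly into three parts with 10 sentences introduced at every truncation point, the sizes of the target medical record texts $T_1, T_2, T_3$ (hypothetical names) work out as:

```latex
\begin{aligned}
\text{sentences per part} &= 120 / 3 = 40 \\
|T_1| &= 40 + 10 = 50 \\
|T_2| &= 10 + 40 + 10 = 60 \\
|T_3| &= 10 + 40 = 50
\end{aligned}
```

Under these assumptions the middle target text holds 60 sentences, more than the 50 the model accepts at once; keeping every target text within that limit would require each middle part to contribute at most 50 − 2 × 10 = 30 sentences of its own.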
  • the method includes:
  • Step S2A if it does not exceed, map each sentence in the electronic medical record text to a sentence vector of a fixed dimension;
  • Step S2B input the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;
  • step S2C the third output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.
  • That is, when no truncation is needed, the sentence vector of each sentence is input directly into the classification model in order, and the classification label of each sentence is then calculated by the SOFTMAX function.
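Putting the two branches together, the overall flow from step S2 onward can be sketched as below. This is a sketch under assumptions: `classify` stands in for steps S5 to S7 (sentence encoding, the bidirectional model, and SOFTMAX) and is passed in as a hypothetical callable, and `core_size` is a hypothetical parameter. Because introduced context only supplies information and does not take part in the final classification, only the predictions for each sub-text's own sentences are kept.

```python
from typing import Callable, List

# Hypothetical stand-in for steps S5-S7: maps a list of sentences to one label each.
Classifier = Callable[[List[str]], List[str]]


def structure_record(sentences: List[str], classify: Classifier, threshold: int,
                     core_size: int, context_size: int) -> List[str]:
    """Return one classification label per sentence of the record."""
    if len(sentences) <= threshold:
        # Steps S2A-S2C: the record is short enough to classify in one pass.
        return classify(sentences)
    labels: List[str] = []
    n = len(sentences)
    # Steps S3-S4: truncate into sub-texts and widen each with neighbouring sentences.
    for start in range(0, n, core_size):
        end = min(start + core_size, n)
        left = max(0, start - context_size)
        right = min(n, end + context_size)
        target_labels = classify(sentences[left:right])
        # Keep only the labels of the sub-text's own sentences; context labels are dropped.
        labels.extend(target_labels[start - left:end - left])
    return labels
```

For example, with seven sentences, `threshold=5`, `core_size=3`, and `context_size=1`, the classifier is called on targets of 4, 5, and 2 sentences, and the result still has exactly one label per original sentence.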
  • Before step S6 of inputting the sentence vectors in each target medical record text into the classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain the first output, the method includes:
  • Step S6a: obtaining medical record samples from the training data set, where each sentence in a medical record sample has a correct classification label;
  • Step S6b: truncating the medical record sample to obtain multiple medical record sub-samples;
  • Step S6c: introducing context into each of the medical record sub-samples according to preset rules to obtain target medical record samples;
  • Step S6d: mapping each sentence in the target medical record sample to a sentence vector of a fixed dimension;
  • Step S6e: inputting the sentence vectors in each target medical record sample into the bidirectional recurrent neural network model in order for calculation to obtain a training output;
  • Step S6f: calculating the training output through SOFTMAX to obtain the predicted output;
  • Step S6g: calculating the loss value of each sentence in the medical record sub-sample by using a loss function;
  • Step S6h: determining the classification model parameters according to the loss values to complete the training of the classification model.
  • The loss value is calculated for each sentence in the medical record sub-sample.
  • Context is introduced into each medical record sub-sample according to certain rules; the context is input into the bidirectional recurrent neural network together with the sentences of the sub-sample, which extracts the context information and yields an output for each sentence.
  • The output of each sentence is passed through SOFTMAX to obtain its predicted output; the loss function is then used to calculate the loss values of only the sentences in the medical record sub-sample, and the model parameters corresponding to the smallest loss value are used as the final model parameters to complete the training of the classification model.
  • Although context is introduced into each medical record sub-sample, the introduced context only provides context information and takes part in neither the loss calculation nor the final classification. Specifically, the loss value of each sentence in the medical record sub-sample is calculated through a cross-entropy function between y, the predicted output of that sentence, and its correct classification label.
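The training objective can be sketched as a masked cross-entropy. The sketch assumes the standard categorical cross-entropy with one-hot correct labels, for which each sentence's loss reduces to the negative log of the probability assigned to its correct class, and it assumes the per-sentence losses are averaged. The `is_core` mask (a hypothetical name) marks the sub-sample's own sentences, so introduced context contributes no loss.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(logits: List[List[float]], gold: List[int],
                         is_core: List[bool]) -> float:
    """Average cross-entropy over the sub-sample's own sentences only."""
    losses = []
    for scores, label, core in zip(logits, gold, is_core):
        if not core:
            continue  # introduced context: informs the model but adds no loss
        probs = softmax(scores)
        # With a one-hot label, -sum_k label_k * log(p_k) reduces to -log(p_label).
        losses.append(-math.log(probs[label]))
    if not losses:
        raise ValueError("no core sentences to score")
    return sum(losses) / len(losses)


loss = masked_cross_entropy([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0]],
                            gold=[0, 1, 1], is_core=[True, True, False])
print(round(loss, 4))  # 0.6931
```

The third sentence is confidently misclassified, but because it is introduced context its large loss is ignored and the result is ln 2 from the two uncertain core sentences.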
  • the electronic medical record structuring method provided in this application can be used in the blockchain field.
  • the trained classification model is stored in the blockchain network.
  • the electronic medical record text can also be stored in the blockchain network.
  • A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
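The idea of data blocks linked by cryptographic methods can be illustrated with a minimal hash chain built on Python's `hashlib`. This sketch covers only the hash linking; consensus, networking, and smart contracts are out of scope, and the block fields and the example data (digests of a trained model and a record) are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

Block = Dict[str, object]


def block_hash(block: Block) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], data: str) -> None:
    """Add a block that records the hash of the block before it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash, "data": data})


def is_valid(chain: List[Block]) -> bool:
    """A chain is valid if every block still matches the hash stored in its successor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain: List[Block] = []
append_block(chain, "sha256 of trained classification model weights")
append_block(chain, "sha256 of desensitized medical record text")
print(is_valid(chain))  # True
chain[0]["data"] = "tampered"
print(is_valid(chain))  # False
```

Changing any earlier block changes its hash, which no longer matches the `prev_hash` stored in the next block; this is the anti-counterfeiting property the text describes.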
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • A blockchain network is a collection of nodes that add new blocks to the blockchain through consensus.
  • the underlying platform of the blockchain can include processing modules such as user management, basic services, smart contracts, and operation monitoring.
  • the user management module is responsible for the identity information management of all blockchain participants, including the maintenance of public and private key generation (account management), key management, and maintenance of the correspondence between the user’s real identity and the blockchain address (authority management), etc.
  • When authorized, it supervises and audits the transactions of certain real identities and provides risk-control rule configuration (risk-control audit). The basic service module is deployed on all blockchain node devices to verify the validity of business requests and, after consensus is reached on a valid request, to record it in storage.
  • For a new business request, the basic service first performs interface adaptation parsing and authentication (interface adaptation), then encrypts the business information through the consensus algorithm (consensus management), and after encryption transmits it completely and consistently to the shared ledger (network communication), where it is recorded and stored.
  • the smart contract module is responsible for contract registration and issuance, contract triggering and contract execution.
  • The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, such as alarms, network condition monitoring, and node device health monitoring.
  • the structuring method, device, computer equipment, and storage medium of electronic medical records provided in this application can be applied in the field of smart medical care to accelerate the construction of digital medical care, thereby promoting the construction of smart cities.
  • an embodiment of the present application further provides an electronic medical record structuring device, including:
  • the first obtaining unit 10 is configured to obtain the electronic medical record text and the number of sentences in the electronic medical record text;
  • the detection unit 20 is configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;
  • the first truncation unit 30 is configured to truncate the electronic medical record text, if the number of sentences exceeds the threshold, to obtain a plurality of electronic medical record sub-texts;
  • the first introduction unit 40 is configured to introduce context into each of the electronic medical record sub-texts according to preset rules to obtain the target medical record texts;
  • the first mapping unit 50 is configured to map each sentence in the target medical record text to a sentence vector of a fixed dimension;
  • the first calculation unit 60 is configured to input the sentence vectors in each target medical record text into the classification model for calculation, in the order in which their sentences appear in the target medical record text, to obtain a first output, wherein the classification model is trained based on a bidirectional recurrent neural network model;
  • the second calculation unit 70 is configured to obtain the classification label of each sentence according to the first output.
  • the second calculation unit 70 includes:
  • a first input subunit configured to input the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output;
  • the calculation subunit is configured to perform SOFTMAX calculation on the second output of each sentence vector to obtain the classification label of each sentence.
  • the first mapping unit 50 includes:
  • the second input subunit is configured to input each sentence in the target medical record text into the neural network;
  • the mapping subunit is used to map each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.
  • the first introduction unit 40 includes:
  • the detection subunit is configured to detect the position of each of the electronic medical record sub-texts within the electronic medical record text;
  • the first introduction subunit is configured to, when the sub-text is at the beginning of the electronic medical record text, introduce a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation point of this sub-text;
  • the second introduction subunit is configured to, when the sub-text is in the middle of the electronic medical record text, introduce a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation point at the beginning of this sub-text, and a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation point at its end;
  • the third introduction subunit is configured to, when the sub-text is at the end of the electronic medical record text, introduce a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation point of this sub-text.
  • the electronic medical record structuring device further includes:
  • the second mapping unit is configured to map each sentence in the electronic medical record text to a sentence vector of a fixed dimension if the number of sentences does not exceed the threshold;
  • the third calculation unit is configured to input the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;
  • the fourth calculation unit is configured to perform SOFTMAX calculation on the third output of each sentence vector to obtain the classification label of each sentence.
  • the electronic medical record structuring device further includes:
  • the second acquiring unit is configured to acquire medical record samples from the training data set, where each sentence in a medical record sample has a correct classification label;
  • the second truncation unit is configured to truncate the medical record sample to obtain multiple medical record sub-samples;
  • the second introduction unit is configured to introduce context into each of the medical record sub-samples according to preset rules to obtain target medical record samples;
  • the third mapping unit is configured to map each sentence in the target medical record sample to a sentence vector of a fixed dimension;
  • the fifth calculation unit is configured to input the sentence vectors in each target medical record sample into the bidirectional recurrent neural network model in order for calculation to obtain a training output;
  • the sixth calculation unit is configured to calculate the training output through SOFTMAX to obtain the predicted output;
  • the seventh calculation unit is configured to calculate the loss value of each sentence in the medical record sub-sample by using a loss function;
  • the determining unit is used to determine the parameters of the classification model according to the loss value to complete the training of the classification model.
  • the seventh calculation unit includes:
  • the calculation subunit is configured to calculate the loss value of each sentence in the medical record sub-sample through a cross-entropy function between y, the predicted output, and the correct classification label.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store electronic medical record data and so on.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a structuring method of electronic medical records.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the above-mentioned storage medium may be a non-volatile storage medium or a volatile storage medium.
  • a computer program is stored thereon, and when the computer program is executed by a processor, a method for structuring an electronic medical record is realized.
  • the electronic medical record structuring method, apparatus, computer device, and storage medium provided in the embodiments of this application introduce part of the context at each truncation point according to preset rules, and input the introduced context together with the truncated electronic medical record text into the classification model.
  • the classification model is trained from a bidirectional recurrent neural network and can extract context information; the class of each sentence is then calculated through SOFTMAX, which effectively improves the accuracy of structuring the sentences around the truncation point.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A method and an apparatus for electronic medical record structuring, a computer device, and a storage medium, relating to the field of artificial intelligence and applicable to smart medicine. The method comprises: acquiring an electronic medical record text and the number of sentences in the electronic medical record text (S1); detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold (S2); if it does, truncating the electronic medical record text to obtain a plurality of electronic medical record sub-texts (S3); introducing context from the preceding and following text into each electronic medical record sub-text by means of a preset rule to obtain a target medical record text (S4); mapping each sentence in the target medical record text to a fixed-dimensional sentence vector (S5); inputting the sentence vectors of the target medical record text sequentially into a classification model for calculation to obtain a first output, the classification model being trained from a bidirectional recurrent neural network (S6); and obtaining a classification label for each sentence on the basis of the first output (S7). The method improves the accuracy of sentence structuring at truncation points.

Description

Electronic medical record structuring method, apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202010922768.X, filed with the Chinese Patent Office on September 4, 2020 and entitled "Electronic medical record structuring method, apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of intelligent decision-making, and in particular to an electronic medical record structuring method, apparatus, computer device, and storage medium.
Background
A medical record is the original record of a patient's entire diagnosis and treatment process in a hospital. It contains the front page, progress notes, examination and test results, physician orders, operation records, nursing records, and so on. An electronic medical record refers not only to static medical record information but also to the related services provided. Electronic medical records are electronically managed information about an individual's lifelong health status and healthcare activities, covering all process information involved in the collection, storage, transmission, processing, and use of patient information. Electronic medical record structuring uses a neural network to extract entities such as diseases, drugs, and body parts from the electronic medical record, so that the key information in the record can be extracted efficiently, effectively assisting doctors with core data analysis and data retrieval. The inventor realized that existing electronic medical records vary in length. When a record is too long it must be truncated, but because the truncation is somewhat arbitrary, the data at the truncation point may lose part of its context, which reduces the accuracy of structuring the sentences around the truncation point.
Technical problem
The main purpose of this application is to provide an electronic medical record structuring method, apparatus, computer device, and storage medium that solve the problem of truncation of an electronic medical record reducing the accuracy of structuring the sentences around the truncation point.
Technical solution
To achieve the above objective, this application provides an electronic medical record structuring method, including the following steps:
acquiring an electronic medical record text and the number of sentences in the electronic medical record text;
detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
if it does, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;
introducing context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;
inputting the sentence vectors of each target medical record text into a classification model in the order in which their corresponding sentences appear in the target medical record text, and performing calculation to obtain a first output, where the classification model is trained from a bidirectional recurrent neural network model; and
obtaining a classification label for each sentence according to the first output.
This application also provides an electronic medical record structuring apparatus, including:
a first acquisition unit, configured to acquire an electronic medical record text and the number of sentences in the electronic medical record text;
a detection unit, configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;
a first truncation unit, configured to truncate the electronic medical record text if the threshold is exceeded, to obtain multiple electronic medical record sub-texts;
a first introduction unit, configured to introduce context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
a first mapping unit, configured to map each sentence in the target medical record text to a sentence vector of a fixed dimension;
a first calculation unit, configured to input the sentence vectors of each target medical record text into a classification model in the order in which their corresponding sentences appear in the target medical record text, and perform calculation to obtain a first output, where the classification model is trained from a bidirectional recurrent neural network model; and
a second calculation unit, configured to obtain a classification label for each sentence according to the first output.
This application also provides a computer device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, the following steps of an electronic medical record structuring method are implemented:
acquiring an electronic medical record text and the number of sentences in the electronic medical record text;
detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
if it does, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;
introducing context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;
inputting the sentence vectors of each target medical record text into a classification model in the order in which their corresponding sentences appear in the target medical record text, and performing calculation to obtain a first output, where the classification model is trained from a bidirectional recurrent neural network model; and
obtaining a classification label for each sentence according to the first output.
This application also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the following steps of an electronic medical record structuring method are implemented:
acquiring an electronic medical record text and the number of sentences in the electronic medical record text;
detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
if it does, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;
introducing context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;
inputting the sentence vectors of each target medical record text into a classification model in the order in which their corresponding sentences appear in the target medical record text, and performing calculation to obtain a first output, where the classification model is trained from a bidirectional recurrent neural network model; and
obtaining a classification label for each sentence according to the first output.
Beneficial effects
In the electronic medical record structuring method, apparatus, computer device, and storage medium provided by this application, part of the context is introduced at each truncation point according to a preset rule, and the introduced context is input into the classification model together with the truncated electronic medical record text. The classification model, trained from a bidirectional recurrent neural network, can extract context information, and SOFTMAX is then applied to compute the class of each sentence, which effectively improves the accuracy of structuring the sentences around the truncation point.
Description of the drawings
FIG. 1 is a schematic diagram of the steps of an electronic medical record structuring method in an embodiment of this application;
FIG. 2 is a structural block diagram of an electronic medical record structuring apparatus in an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device in an embodiment of this application.
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Best mode for carrying out the invention
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Referring to FIG. 1, an embodiment of this application provides an electronic medical record structuring method, including:
Step S1: acquire an electronic medical record text and the number of sentences in the electronic medical record text;
Step S2: detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;
Step S3: if it does, truncate the electronic medical record text to obtain multiple electronic medical record sub-texts;
Step S4: introduce context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
Step S5: map each sentence in the target medical record text to a sentence vector of a fixed dimension;
Step S6: input the sentence vectors of each target medical record text into a classification model in the order in which their corresponding sentences appear in the target medical record text, and perform calculation to obtain a first output, where the classification model is trained from a bidirectional recurrent neural network model;
Step S7: obtain a classification label for each sentence according to the first output.
In this embodiment, as described in step S1, the electronic medical record text is acquired. Some preprocessing can be applied to it, such as text preprocessing and data cleaning with tools like numpy, pandas, and jieba, including Chinese word segmentation, stop-word removal, and removal of useless symbols. Private information in the text can also be desensitized to remove patient privacy, namely key private information through which others could easily identify the patient, such as name, bed number, admission number, and address. The number of sentences in the processed electronic medical record text is then obtained.
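A minimal standard-library sketch of the desensitization and sentence counting in step S1 follows. The regular expressions in `PRIVACY_PATTERNS` and the set of sentence-ending punctuation are hypothetical examples, not rules taken from the application; word segmentation and stop-word removal (for example with jieba) are omitted.

```python
import re

# Hypothetical de-identification rules: (pattern, placeholder token).
PRIVACY_PATTERNS = [
    (re.compile(r"姓名[:：]\s*\S+?(?=[，,。；;]|$)"), "[NAME]"),
    (re.compile(r"住院号[:：]?\s*\d+"), "[ADMISSION_NO]"),
    (re.compile(r"\d+\s*床"), "[BED]"),
]

# Assumed sentence terminators: Chinese and ASCII question/exclamation marks,
# semicolons, and the Chinese full stop (ASCII "." is excluded so "37.5" survives).
SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")


def desensitize(text: str) -> str:
    """Replace private fields with placeholder tokens."""
    for pattern, token in PRIVACY_PATTERNS:
        text = pattern.sub(token, text)
    return text


def split_sentences(text: str) -> list[str]:
    """Split after each terminator and drop empty pieces."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def preprocess(text: str) -> tuple[list[str], int]:
    """Step S1: desensitize, split into sentences, and count them."""
    sentences = split_sentences(desensitize(text))
    return sentences, len(sentences)


sentences, n = preprocess("姓名：张三，男，45岁。住院号：123456。3床患者主诉头痛三天！")
print(n)  # 3
```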
As described in steps S2-S3, the classification model supports only a limited input length, so when the number of sentences in the electronic medical record text exceeds the preset threshold, the text must be truncated so that the sentences of each resulting electronic medical record sub-text can be input into the classification model.
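Steps S2-S3 reduce to a length check and a split. In the sketch below the chunk size is passed in explicitly; choosing it as the model limit minus the context added on both sides in step S4 (50 - 2 * 10 = 30) reproduces the 120-sentence example given later in this description, but that sizing rule is an assumption, as are the function names.

```python
import math


def needs_truncation(sentences: list, threshold: int) -> bool:
    """Step S2: does the record have more sentences than the preset threshold?"""
    return len(sentences) > threshold


def truncate_evenly(sentences: list, max_len: int) -> list:
    """Step S3: split into the fewest chunks of at most max_len sentences,
    with chunk sizes differing by at most one."""
    n = len(sentences)
    if n == 0:
        return []
    k = math.ceil(n / max_len)   # number of chunks needed
    base, extra = divmod(n, k)   # the first `extra` chunks get one more sentence
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(sentences[start:start + size])
        start += size
    return chunks


record = list(range(120))
if needs_truncation(record, threshold=50):
    print([len(c) for c in truncate_evenly(record, max_len=30)])  # [30, 30, 30, 30]
```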
As described in step S4, because the electronic medical record text has been truncated, context is introduced into each electronic medical record sub-text according to a preset rule. For example, if an electronic medical record text is truncated into three sub-texts, which in order are the first, second, and third electronic medical record sub-texts, then some sentences of the second sub-text are introduced at the truncation point of the first sub-text; some sentences of the first sub-text are introduced at the starting truncation point of the second sub-text, and some sentences of the third sub-text at its ending truncation point; and some sentences of the second sub-text are introduced at the truncation point of the third sub-text.
As described in step S5, each sentence in the target medical record text is mapped to a sentence vector of a fixed dimension. Specifically, the encoder of a neural network (which may be a convolutional neural network, a recurrent neural network, a Transformer, etc.) maps a sentence to a fixed-dimensional vector, giving the vector representation of that single sentence. By feeding every sentence of the electronic medical record text into the neural network in this way, a vector representation of each sentence is obtained, so that a complete electronic medical record text can be represented by the sentence vectors of all its sentences.
As described in step S6, the sentences in a medical record are not independent of one another but context-dependent. For example, the part describing the treatment process usually consists of multiple sentences, so a sentence describing treatment is more likely to be surrounded by other sentences describing treatment than by sentences describing the patient's past medical history. Classifying each sentence of the text in isolation therefore does not work well; the context must be taken into account by inputting the sentence vectors into the classification model in order. Because the classification model is trained from a bidirectional recurrent neural network model, its forward and backward passes let each sentence obtain context information from both directions, effectively improving classification accuracy. Specifically, the classification model can classify each sentence into one of: basic information, personal history, family history, past history, history of present illness, chief complaint, examination, diagnosis, treatment, summary, and other.
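Step S6 can be pictured with the following NumPy sketch: a bidirectional RNN reads the sentence vectors once left to right and once right to left, and each sentence's two hidden states are concatenated and projected to one score per category. The weights are random rather than trained, and the vanilla tanh cell, the sizes, and the class name `BiRNNTagger` are illustrative assumptions; the application does not specify the cell type or dimensions.

```python
import numpy as np

# The eleven sentence categories listed in step S6.
LABELS = ["basic information", "personal history", "family history",
          "past history", "history of present illness", "chief complaint",
          "examination", "diagnosis", "treatment", "summary", "other"]


class BiRNNTagger:
    """Untrained bidirectional tanh RNN emitting one logit vector per sentence."""

    def __init__(self, d_in: int, d_hidden: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_hidden)
        # Independent input, recurrent, and bias parameters for each direction.
        self.params = {
            direction: (rng.normal(0, s, (d_in, d_hidden)),
                        rng.normal(0, s, (d_hidden, d_hidden)),
                        np.zeros(d_hidden))
            for direction in ("forward", "backward")
        }
        self.W_out = rng.normal(0, s, (2 * d_hidden, n_classes))
        self.b_out = np.zeros(n_classes)

    def _run(self, X: np.ndarray, direction: str) -> np.ndarray:
        W_x, W_h, b = self.params[direction]
        h = np.zeros(W_h.shape[0])
        states = []
        for x in X:  # one recurrent step per sentence vector
            h = np.tanh(x @ W_x + h @ W_h + b)
            states.append(h)
        return np.stack(states)

    def forward(self, X: np.ndarray) -> np.ndarray:
        """X: (n_sentences, d_in) sentence vectors in document order."""
        h_fwd = self._run(X, "forward")
        h_bwd = self._run(X[::-1], "backward")[::-1]  # re-align to document order
        H = np.concatenate([h_fwd, h_bwd], axis=1)
        return H @ self.W_out + self.b_out  # (n_sentences, n_classes): first output


model = BiRNNTagger(d_in=8, d_hidden=16, n_classes=len(LABELS))
X = np.random.default_rng(1).normal(size=(5, 8))
first_output = model.forward(X)
print(first_output.shape)  # (5, 11)
```

Because the backward pass runs from the last sentence to the first, changing a sentence at the end of a window also changes the scores of earlier sentences, which is what lets borrowed context influence the sentences next to a truncation point.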
As described in step S7, the classification label of each sentence is obtained according to the first output. Specifically, the first output of each sentence vector is passed through SOFTMAX. SOFTMAX maps an arbitrary K-dimensional real vector to another K-dimensional real vector in which every element lies in (0, 1) and the elements sum to 1. The SOFTMAX function is:
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \ldots, K$$
where K is the number of classes, j denotes one of the K classes, and $z_j$ is the value for that class. This gives each sentence a value for every class, and the class with the largest value is selected as the classification label of the sentence.
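The SOFTMAX and label-selection step can be checked with a few lines of plain Python. Subtracting the largest score before exponentiating is a standard numerical-stability measure and does not change the result; `predict_label` is an illustrative helper name.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """SOFTMAX over K class scores; outputs are non-negative and sum to 1."""
    m = max(z)  # shift by the max so exp() cannot overflow; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(z: list[float], labels: list[str]) -> tuple[str, float]:
    """Pick the class with the largest SOFTMAX value as the sentence's label."""
    probs = softmax(z)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


print([round(p, 4) for p in softmax([1.0, 2.0, 3.0])])  # [0.09, 0.2447, 0.6652]
```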
In this embodiment, part of the context is introduced at each truncation point according to a preset rule, and the introduced context and the truncated electronic medical record text are input together into the classification model to obtain the first output. Because the classification model is trained from a bidirectional recurrent neural network, it can extract context information; the classification label of each sentence is then obtained from the first output, which effectively improves the accuracy of structuring the sentences at the truncation points of the electronic medical record.
In an embodiment, step S7 of obtaining the classification label of each sentence according to the first output includes:
Step S71: input the first output of each sentence vector into a CRF (conditional random field) network and/or a self-attention network to obtain a second output;
Step S72: pass the second output of each sentence vector through SOFTMAX to obtain the classification label of each sentence.
In this embodiment, as described in steps S71-S72, feeding the first output into a CRF network and/or a self-attention network further strengthens the influence of context information in the classification model and the contextual links between sentences. In other embodiments, the first output may be passed directly through SOFTMAX to obtain the classification label of each sentence.
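For step S71, a minimal NumPy sketch of the self-attention option follows: each sentence's first output attends to the first outputs of all sentences in the same window, a residual connection adds the result back, and a row-wise SOFTMAX gives class probabilities (step S72). The single head, the residual connection, and the random projection matrices are assumptions for illustration. A CRF layer, the other option named above, would instead score label transitions between neighboring sentences and decode the best label sequence with the Viterbi algorithm; that variant is not shown.

```python
import numpy as np


def softmax_rows(S: np.ndarray) -> np.ndarray:
    """Row-wise SOFTMAX with max-subtraction for numerical stability."""
    S = S - S.max(axis=-1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)


def self_attention_refine(first_output, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention across sentences.

    first_output: (n_sentences, K) scores from the bidirectional RNN.
    Returns the second output with the same shape.
    """
    Q, K_, V = first_output @ W_q, first_output @ W_k, first_output @ W_v
    weights = softmax_rows(Q @ K_.T / np.sqrt(Q.shape[-1]))  # (n, n) attention weights
    return first_output + weights @ V  # residual connection keeps the original scores


# Example: 6 sentences, 11 categories, random (untrained) projections.
rng = np.random.default_rng(0)
first = rng.normal(size=(6, 11))
W_q, W_k, W_v = (rng.normal(0, 0.1, (11, 11)) for _ in range(3))
second_output = self_attention_refine(first, W_q, W_k, W_v)
labels = softmax_rows(second_output).argmax(axis=1)  # step S72: one class index per sentence
```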
In an embodiment, step S5 of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension includes:
Step S51: input each sentence in the target medical record text into a neural network;
Step S52: map each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.
In this embodiment, the encoder of a neural network (which may be a convolutional neural network, a recurrent neural network, a Transformer, etc.) maps each sentence to a fixed-dimensional vector, giving the vector representation of that sentence. Taking the Transformer model as an example, its encoder consists of N = 6 layers, each containing two sub-layers: the first is a multi-head attention layer and the second is a simple fully connected layer. A residual connection is used around each sub-layer. As in ResNet, a residual connection computes H(x) = F(x) + x, so the output of each sub-layer is LayerNorm(x + Sublayer(x)), where LayerNorm normalizes each sample with its own mean and variance. The input and output of every layer have the same dimension. Feeding each sentence of the medical record into the Transformer model in this way yields a vector representation for each sentence, so that a complete electronic medical record text can be represented by the sentence vectors of all its sentences.
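The encoder layer described above can be sketched in NumPy as follows. It implements LayerNorm(x + Sublayer(x)) for a multi-head attention sub-layer and a ReLU feed-forward sub-layer, stacks six such layers, and mean-pools the token outputs into one fixed-dimensional sentence vector. Untrained random weights, the omission of positional encodings, biases, and LayerNorm gain/bias, and the mean-pooling step are simplifying assumptions; the application does not say how token outputs are pooled into a sentence vector.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each position (row) with its own mean and variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(S: np.ndarray) -> np.ndarray:
    S = S - S.max(axis=-1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)


def multi_head_attention(x, p, n_heads):
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(m):  # (seq, d_model) -> (n_heads, seq, d_head)
        return m.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(x @ p["W_q"]), split(x @ p["W_k"]), split(x @ p["W_v"])
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))   # (n_heads, seq, seq)
    heads = (A @ V).transpose(1, 0, 2).reshape(seq, d_model)  # concatenate heads
    return heads @ p["W_o"]


def encoder_layer(x, p, n_heads):
    # Sub-layer 1: multi-head self-attention, output LayerNorm(x + Sublayer(x)).
    x = layer_norm(x + multi_head_attention(x, p, n_heads))
    # Sub-layer 2: fully connected feed-forward network (ReLU), same residual form.
    return layer_norm(x + np.maximum(0.0, x @ p["W_1"]) @ p["W_2"])


def init_layer(rng, d_model, d_ff):
    s = 1.0 / np.sqrt(d_model)
    p = {k: rng.normal(0, s, (d_model, d_model)) for k in ("W_q", "W_k", "W_v", "W_o")}
    p["W_1"] = rng.normal(0, s, (d_model, d_ff))
    p["W_2"] = rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d_model))
    return p


def sentence_vector(token_embeddings, layers, n_heads):
    """Encode one sentence (seq_len, d_model) and mean-pool to a d_model vector."""
    x = token_embeddings
    for p in layers:  # input and output of every layer have the same dimension
        x = encoder_layer(x, p, n_heads)
    return x.mean(axis=0)


rng = np.random.default_rng(0)
layers = [init_layer(rng, d_model=16, d_ff=32) for _ in range(6)]  # N = 6 layers
v_short = sentence_vector(rng.normal(size=(7, 16)), layers, n_heads=4)
v_long = sentence_vector(rng.normal(size=(12, 16)), layers, n_heads=4)
print(v_short.shape, v_long.shape)  # (16,) (16,)
```

Sentences of different lengths (7 and 12 tokens here) come out as vectors of the same size, which is the property step S5 needs before the sentence vectors are fed to the bidirectional RNN.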
In an embodiment, step S4 of introducing context into each electronic medical record sub-text according to a preset rule to obtain the target medical record text includes:
Step S41: detect the position of each electronic medical record sub-text within the electronic medical record text;
Step S42: when the sub-text is at the beginning of the electronic medical record text, introduce a preset number of sentences from the beginning of the next sub-text at the truncation point of this sub-text;
Step S43: when the sub-text is in the middle of the electronic medical record text, introduce a preset number of sentences from the end of the previous sub-text at its starting truncation point, and a preset number of sentences from the beginning of the next sub-text at its ending truncation point;
Step S44: when the sub-text is at the end of the electronic medical record text, introduce a preset number of sentences from the end of the previous sub-text at the truncation point of this sub-text.
In this embodiment, some sentences are introduced for each electronic medical record sub-text. For example, suppose an electronic medical record text has 120 sentences and the classification model supports only 50 sentences at a time. The text can be divided evenly by sentence count into 4 parts of 30 sentences each. The first 10 sentences of the second part are appended to the end of the first part to form the first target medical record text; the last 10 sentences of the first part are prepended to the second part, and the first 10 sentences of the third part are appended to its end, to form the second target medical record text. The number of sentences introduced for each sub-text can be set in advance as required. By introducing sentences from the surrounding context into each sub-text before it is input into the classification model, this embodiment uses the connections between neighboring sentences to improve the classification accuracy of each sentence.
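The worked example above (120 sentences, 4 parts of 30, 10 context sentences) can be reproduced with the short sketch below. The helper `add_context` is hypothetical; besides each target text it returns a mask marking which sentences belong to the sub-text itself, which is convenient because borrowed context sentences are excluded from the loss during training.

```python
def add_context(chunks: list[list[str]], n_context: int) -> list[tuple[list[str], list[bool]]]:
    """Step S4 (hypothetical helper): attach up to n_context neighbouring sentences.

    Returns one (target_text, core_mask) pair per chunk; core_mask is True for
    the chunk's own sentences and False for borrowed context.
    """
    targets = []
    for i, chunk in enumerate(chunks):
        # S43/S44: borrow the tail of the previous chunk ([-0:] would copy it all, so guard).
        before = chunks[i - 1][-n_context:] if i > 0 and n_context > 0 else []
        # S42/S43: borrow the head of the next chunk.
        after = chunks[i + 1][:n_context] if i + 1 < len(chunks) else []
        sentences = before + chunk + after
        mask = [False] * len(before) + [True] * len(chunk) + [False] * len(after)
        targets.append((sentences, mask))
    return targets


# Worked example from the text: 120 sentences, 4 parts of 30, 10 context sentences.
record = [f"s{i}" for i in range(1, 121)]
chunks = [record[i:i + 30] for i in range(0, 120, 30)]
targets = add_context(chunks, 10)
print([len(t) for t, _ in targets])  # [40, 50, 50, 40]
```

Each target text stays within the model's 50-sentence limit: 30 sentences of its own plus at most 10 borrowed on each side.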
In an embodiment, after step S2 of detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold, the method includes:
Step S2A: if it does not, map each sentence in the electronic medical record text to a sentence vector of a fixed dimension;
Step S2B: input the sentence vectors of the electronic medical record text into the classification model in order, and perform calculation to obtain a third output;
Step S2C: pass the third output of each sentence vector through SOFTMAX to obtain the classification label of each sentence.
In this embodiment, when the number of sentences in the electronic medical record text does not exceed the preset threshold, the sentence vectors of all sentences are input directly into the classification model in order, and the SOFTMAX function is then applied to obtain the classification label of each sentence.
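The two branches (no truncation, steps S2A-S2C; truncation with context, steps S3-S7) can be combined as in the sketch below. `structure_record` and the `classify` callable are hypothetical stand-ins for the encoder, bidirectional RNN, and SOFTMAX chain. For brevity this sketch cuts fixed-size chunks by index rather than dividing evenly, and it keeps only the labels predicted for each chunk's own sentences, discarding those for borrowed context, consistent with the statement in this description that introduced context does not take part in the final classification.

```python
from typing import Callable, Sequence


def structure_record(sentences: Sequence[str],
                     classify: Callable[[list], list],
                     threshold: int, core_len: int, n_context: int) -> list:
    """Return one label per sentence of the record (hypothetical helper)."""
    n = len(sentences)
    if n <= threshold:
        # Steps S2A-S2C: short record, classify all sentences in one pass.
        return list(classify(list(sentences)))
    labels = []
    for start in range(0, n, core_len):
        end = min(start + core_len, n)                               # chunk's own sentences
        lo, hi = max(0, start - n_context), min(n, end + n_context)  # plus borrowed context
        window_labels = classify(list(sentences[lo:hi]))
        # Keep labels for the chunk's own sentences only; context labels are dropped.
        labels.extend(window_labels[start - lo:end - lo])
    return labels
```

With `threshold=50`, `core_len=30`, and `n_context=10`, every window passed to `classify` holds at most 50 sentences, and the stitched result has exactly one label per original sentence.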
In an embodiment, before step S6 of inputting the sentence vectors in each target medical record text into the classification model, in the order in which their corresponding sentences appear in the target medical record text, for calculation to obtain the first output, the method includes:
Step S6a: obtain medical record samples from a training data set, each sentence in a medical record sample having a correct classification label;
Step S6b: truncate the medical record sample to obtain multiple medical record sub-samples;
Step S6c: introduce context into each medical record sub-sample according to a preset rule to obtain a target medical record sample;
Step S6d: map each sentence in the target medical record sample to a sentence vector of a fixed dimension;
Step S6e: input the sentence vectors in each target medical record sample into the bidirectional recurrent neural network model in order for calculation to obtain a training output;
Step S6f: apply SOFTMAX to the training output to obtain a predicted output;
Step S6g: calculate the loss value of each sentence in the medical record sub-sample using a loss function;
Step S6h: determine the classification model parameters according to the loss values, completing the training of the classification model.
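To illustrate why the bidirectional recurrent model in step S6e can use context on both sides of a sentence, here is a toy bidirectional pass. It is a sketch only: it uses a plain tanh (Elman) recurrent cell with small random, untrained weights, whereas a practical model would more likely use LSTM or GRU cells in a deep-learning framework; all names (`birnn`, `random_params`, and so on) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Matrix]  # (input weights W_in, recurrent weights W_h)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_pass(inputs: Sequence[Sequence[float]], params: Params) -> List[Vector]:
    """Plain Elman RNN: h_t = tanh(W_in x_t + W_h h_{t-1}), with h_0 = 0."""
    w_in, w_h = params
    hidden = [0.0] * len(w_h)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_h, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def birnn(inputs: Sequence[Sequence[float]], fwd: Params, bwd: Params) -> List[Vector]:
    """Per-sentence output = forward state concatenated with backward state."""
    forward = rnn_pass(inputs, fwd)
    # Run right-to-left, then reverse so index t lines up with sentence t again.
    backward = rnn_pass(list(reversed(inputs)), bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def random_params(hidden: int, dim: int, rng: random.Random) -> Params:
    """Small random weights, for illustration only (no training is performed)."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return mat(hidden, dim), mat(hidden, hidden)
```

The forward half of a sentence's output depends only on the sentences up to and including it, and the backward half only on the sentences from it onward. This is why sentences near a truncation point benefit from the context introduced on the far side of the cut.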
In this embodiment, as described in step S6g above, the loss value of each sentence in the medical record sub-sample is calculated. Context is introduced into each medical record sub-sample according to certain rules and is input into the bidirectional recurrent neural network together with the sentences of the sub-sample, so that context information is extracted and an output is obtained for every sentence. The output of each sentence is passed through SOFTMAX to obtain its predicted output, and the loss function is then applied only to the sentences of the medical record sub-sample itself. The model parameters corresponding to the smallest loss value are taken as the final model parameters, completing the training of the classification model. In this embodiment, context is introduced into every medical record sub-sample, but the introduced context only supplies context information and takes no part in the loss calculation or the final classification. Specifically, the loss value of each sentence in the medical record sub-sample is calculated with the cross-entropy function

$$L = -\sum_{i} \hat{y}_i \log y_i$$

where $y$ is the predicted output of each sentence in the medical record sub-sample and $\hat{y}$ is its correct classification label.
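The rule that introduced context informs the network but is excluded from the loss can be expressed as a masked cross-entropy. A minimal sketch, assuming the correct label is one-hot (so the sum over classes reduces to the negative log-probability of the correct class) and that per-sentence losses are averaged over the core sentences; the averaging and the function name `masked_cross_entropy` are assumptions, not taken from the source:

```python
import math
from typing import Sequence


def masked_cross_entropy(
    probs: Sequence[Sequence[float]],
    gold: Sequence[int],
    core_start: int,
    core_end: int,
    eps: float = 1e-12,
) -> float:
    """Mean cross-entropy over the core sentences of one target sample.

    probs[t] is the SOFTMAX output (predicted distribution) for sentence t;
    gold[t] is the index of its correct label. Sentences outside
    [core_start, core_end) are introduced context and are skipped.
    """
    losses = []
    for t in range(core_start, core_end):
        # With a one-hot correct label, -sum_i yhat_i * log(y_i) reduces to
        # -log of the probability assigned to the correct class.
        losses.append(-math.log(max(probs[t][gold[t]], eps)))
    if not losses:
        raise ValueError("the core span contains no sentences")
    return sum(losses) / len(losses)
```

In a framework such as PyTorch, the same effect is commonly obtained by giving context positions the label value passed as `ignore_index` to `torch.nn.CrossEntropyLoss`, so that they contribute nothing to the loss or its gradient.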
The electronic medical record structuring method provided in this application can also be applied in the blockchain field: the trained classification model can be stored in a blockchain network, and the electronic medical record text can likewise be stored there. Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer. A blockchain network is the set of nodes that add new blocks to the blockchain through consensus.
The underlying blockchain platform can include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module manages the identity information of all blockchain participants, including public and private key generation (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (permission management); where authorized, it also supervises and audits the transactions of certain real identities and provides configurable risk-control rules (risk-control auditing). The basic service module is deployed on all blockchain node devices to verify the validity of service requests and to record valid requests to storage once consensus has been reached. For a new service request, the basic service first performs interface adaptation parsing and authentication (interface adaptation), then encrypts the service information through the consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and records and stores it. The smart contract module handles contract registration and issuance, contract triggering, and contract execution: developers can define contract logic in a programming language and publish it to the blockchain (contract registration), after which keys or other events trigger execution according to the contract terms to complete the contract logic; the module also supports contract upgrade and cancellation. The operation monitoring module is mainly responsible for deployment, configuration changes, contract settings, and cloud adaptation during product release, as well as visual output of real-time status while the product is running, such as alarms, network condition monitoring, and node device health monitoring.
The electronic medical record structuring method, apparatus, computer device, and storage medium provided in this application can be applied in the field of smart healthcare to accelerate the development of digital healthcare and thereby promote the construction of smart cities.
Referring to FIG. 2, an embodiment of this application further provides an electronic medical record structuring apparatus, including:
a first obtaining unit 10, configured to obtain an electronic medical record text and the number of sentences in the electronic medical record text;
a detection unit 20, configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;
a first truncation unit 30, configured to truncate the electronic medical record text, if the threshold is exceeded, to obtain a plurality of electronic medical record sub-texts;
a first introduction unit 40, configured to introduce context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
a first mapping unit 50, configured to map each sentence in the target medical record text to a sentence vector of a fixed dimension;
a first calculation unit 60, configured to input the sentence vectors in each target medical record text into a classification model, in the order in which their corresponding sentences appear in the target medical record text, for calculation to obtain a first output, wherein the classification model is trained on the basis of a bidirectional recurrent neural network model; and
a second calculation unit 70, configured to obtain the classification label of each sentence according to the first output.
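Read together, units 10 to 70 form a single pipeline. The sketch below wires the steps together under stated assumptions: `classify` is a hypothetical callable standing in for units 50 to 70 (sentence mapping, classification model and labelling) that returns one label per input sentence, and the number of sub-texts is chosen, as an assumption, as the smallest count for which an interior sub-text plus `context` sentences on each side still fits within `threshold`. Only the labels of each sub-text's own sentences are kept, since the introduced context serves only to inform the model.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for units 50-70: one label per input sentence.
Classifier = Callable[[Sequence[str]], List[str]]


def structure_record(
    sentences: Sequence[str], threshold: int, context: int, classify: Classifier
) -> List[str]:
    total = len(sentences)
    if total <= threshold:
        # Short path: classify the whole text directly (steps S2A-S2C).
        return classify(sentences)
    core_len = threshold - 2 * context  # room left for a sub-text's own sentences
    if core_len <= 0:
        raise ValueError("threshold must exceed 2 * context")
    n_parts = -(-total // core_len)  # ceiling division: fewest parts that fit
    labels: List[str] = []
    for i in range(n_parts):
        start, end = i * total // n_parts, (i + 1) * total // n_parts
        left, right = max(0, start - context), min(total, end + context)
        window_labels = classify(sentences[left:right])
        # Keep only the labels of the sub-text's own sentences; drop context labels.
        offset = start - left
        labels.extend(window_labels[offset:offset + (end - start)])
    return labels
```

For 120 sentences, a threshold of 50 and 10 context sentences, this yields 4 sub-texts of 30 sentences and windows of 40, 50, 50 and 40 sentences, as in the earlier example.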
In an embodiment, the second calculation unit 70 includes:
a first input sub-unit, configured to input the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output; and
a calculation sub-unit, configured to apply SOFTMAX to the second output of each sentence vector to obtain the classification label of each sentence.
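As one possible reading of the self-attention option in the first input sub-unit, the sketch below applies scaled dot-product self-attention to the per-sentence first outputs. It is a simplified illustration that uses the first outputs directly as queries, keys and values, with no learned projection matrices or multiple heads; the CRF alternative is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def self_attention(h: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with Q = K = V = h (no learned projections).

    h[t] is the first output for sentence t; the result row t (a second output)
    is a weighted average of all rows of h, weighted by similarity to h[t].
    """
    d = len(h[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in h:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in h]
        weights = _softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, h)) for j in range(d)])
    return out
```

Each row of the result mixes information from every sentence in the target text before the SOFTMAX step of the calculation sub-unit; because the weights sum to 1, each output row is a convex combination of the input rows.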
In an embodiment, the first mapping unit 50 includes:
a second input sub-unit, configured to input each sentence in the target medical record text into a neural network; and
a mapping sub-unit, configured to map each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.
In an embodiment, the first introduction unit 40 includes:
a detection sub-unit, configured to detect the position of each electronic medical record sub-text within the electronic medical record text;
a first introduction sub-unit, configured to, when the electronic medical record sub-text is at the beginning of the electronic medical record text, introduce a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation point of the sub-text;
a second introduction sub-unit, configured to, when the electronic medical record sub-text is in the middle of the electronic medical record text, introduce a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation point at its beginning, and a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation point at its end; and
a third introduction sub-unit, configured to, when the electronic medical record sub-text is at the end of the electronic medical record text, introduce a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation point of the sub-text.
In an embodiment, the electronic medical record structuring apparatus further includes:
a second mapping unit, configured to map each sentence in the electronic medical record text to a sentence vector of a fixed dimension if the threshold is not exceeded;
a third calculation unit, configured to input the sentence vectors of the electronic medical record text into the classification model in order for calculation to obtain a third output; and
a fourth calculation unit, configured to apply SOFTMAX to the third output of each sentence vector to obtain the classification label of each sentence.
In an embodiment, the electronic medical record structuring apparatus further includes:
a second obtaining unit, configured to obtain medical record samples from a training data set, each sentence in a medical record sample having a correct classification label;
a second truncation unit, configured to truncate the medical record sample to obtain multiple medical record sub-samples;
a second introduction unit, configured to introduce context into each medical record sub-sample according to a preset rule to obtain a target medical record sample;
a third mapping unit, configured to map each sentence in the target medical record sample to a sentence vector of a fixed dimension;
a fifth calculation unit, configured to input the sentence vectors in each target medical record sample into the bidirectional recurrent neural network model in order for calculation to obtain a training output;
a sixth calculation unit, configured to apply SOFTMAX to the training output to obtain a predicted output;
a seventh calculation unit, configured to calculate the loss value of each sentence in the medical record sub-sample using a loss function; and
a determining unit, configured to determine the classification model parameters according to the loss values, completing the training of the classification model.
In an embodiment, the seventh calculation unit includes:
a calculation sub-unit, configured to calculate the loss value of each sentence in the medical record sub-sample through a cross-entropy function, the formula of which is:

$$L = -\sum_{i} \hat{y}_i \log y_i$$

where $y$ is the predicted output and $\hat{y}$ is the correct classification label.
In this embodiment, for the specific implementation of the above units and sub-units, refer to the method embodiments above; details are not repeated here.
Referring to FIG. 3, an embodiment of this application further provides a computer device, which may be a server whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store electronic medical record data and the like. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements an electronic medical record structuring method.
Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied.
An embodiment of this application further provides a computer-readable storage medium, which may be a non-volatile or a volatile storage medium. A computer program is stored on it, and when executed by a processor, the computer program implements an electronic medical record structuring method.
In summary, the electronic medical record structuring method, apparatus, computer device, and storage medium provided in the embodiments of this application introduce part of the context at each truncation point according to preset rules and input the introduced context together with the truncated electronic medical record text into the classification model. Because the classification model is trained on the basis of a bidirectional recurrent neural network, it can extract context information; SOFTMAX is then used to compute the classification of each sentence, which effectively improves the structuring accuracy for sentences near the truncation points.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, herein, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes that element.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A method for structuring electronic medical records, comprising the following steps:
    obtaining an electronic medical record text and the number of sentences in the electronic medical record text;
    detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
    if it does, truncating the electronic medical record text to obtain a plurality of electronic medical record sub-texts;
    introducing context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
    mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;
    inputting the sentence vectors in each target medical record text into a classification model, in the order in which their corresponding sentences appear in the target medical record text, for calculation to obtain a first output, wherein the classification model is trained on the basis of a bidirectional recurrent neural network model; and
    obtaining the classification label of each sentence according to the first output.
  2. The method for structuring electronic medical records according to claim 1, wherein the step of obtaining the classification label of each sentence according to the first output comprises:
    inputting the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output; and
    applying SOFTMAX to the second output of each sentence vector to obtain the classification label of each sentence.
  3. The method for structuring electronic medical records according to claim 1, wherein the step of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension comprises:
    inputting each sentence in the target medical record text into a neural network; and
    mapping each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.
  4. The method for structuring electronic medical records according to claim 1, wherein the step of introducing context into each electronic medical record sub-text according to a preset rule to obtain the target medical record text comprises:
    detecting the position of each electronic medical record sub-text within the electronic medical record text;
    when the electronic medical record sub-text is at the beginning of the electronic medical record text, introducing a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation point of the sub-text;
    when the electronic medical record sub-text is in the middle of the electronic medical record text, introducing a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation point at its beginning, and a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation point at its end; and
    when the electronic medical record sub-text is at the end of the electronic medical record text, introducing a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation point of the sub-text.
  5. The method for structuring electronic medical records according to claim 1, wherein after the step of detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold, the method comprises:
    if it does not, mapping each sentence in the electronic medical record text to a sentence vector of a fixed dimension;
    inputting the sentence vectors of the electronic medical record text into the classification model in order for calculation to obtain a third output; and
    applying SOFTMAX to the third output of each sentence vector to obtain the classification label of each sentence.
  6. The method for structuring electronic medical records according to claim 1, wherein before the step of inputting the sentence vectors in each target medical record text into the classification model, in the order in which their corresponding sentences appear in the target medical record text, for calculation to obtain the first output, the method comprises:
    obtaining medical record samples from a training data set, each sentence in a medical record sample having a correct classification label;
    truncating the medical record sample to obtain a plurality of medical record sub-samples;
    introducing context into each medical record sub-sample according to a preset rule to obtain a target medical record sample; mapping each sentence in the target medical record sample to a sentence vector of a fixed dimension;
    inputting the sentence vectors in each target medical record sample into a bidirectional recurrent neural network model in order for calculation to obtain a training output;
    applying SOFTMAX to the training output to obtain a predicted output;
    calculating the loss value of each sentence in the medical record sub-sample using a loss function; and
    determining the classification model parameters according to the loss values, completing the training of the classification model.
  7. The method for structuring electronic medical records according to claim 6, wherein the step of calculating the loss value of each sentence in the medical record sub-sample using a loss function comprises:
    calculating the loss value of each sentence in the medical record sub-sample through a cross-entropy function, the formula of which is:

    $$L = -\sum_{i} \hat{y}_i \log y_i$$

    where $y$ is the predicted output and $\hat{y}$ is the correct classification label.
  8. An electronic medical record structuring apparatus, comprising:
    a first obtaining unit, configured to obtain an electronic medical record text and the number of sentences in the electronic medical record text;
    a detection unit, configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;
    a first truncation unit, configured to truncate the electronic medical record text, if the threshold is exceeded, to obtain a plurality of electronic medical record sub-texts;
    a first introduction unit, configured to introduce context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
    a first mapping unit, configured to map each sentence in the target medical record text to a sentence vector of a fixed dimension;
    a first calculation unit, configured to input the sentence vectors in each target medical record text into a classification model, in the order in which their corresponding sentences appear in the target medical record text, for calculation to obtain a first output, wherein the classification model is trained on the basis of a bidirectional recurrent neural network model; and
    a second calculation unit, configured to obtain the classification label of each sentence according to the first output.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps of an electronic medical record structuring method:
    obtaining an electronic medical record text and the number of sentences in the electronic medical record text;
    detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
    if it does, truncating the electronic medical record text to obtain a plurality of electronic medical record sub-texts;
    introducing context into each electronic medical record sub-text according to a preset rule to obtain a target medical record text;
    mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;
    inputting the sentence vectors in each target medical record text into a classification model, in the order in which their corresponding sentences appear in the target medical record text, for calculation to obtain a first output, wherein the classification model is trained on the basis of a bidirectional recurrent neural network model; and
    obtaining the classification label of each sentence according to the first output.
  10. The computer device according to claim 9, wherein the step of obtaining the classification label of each sentence according to the first output comprises:
    inputting the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output; and
    applying SOFTMAX to the second output of each sentence vector to obtain the classification label of each sentence.
  11. The computer device according to claim 9, wherein the step of mapping each sentence in the target medical record text to a fixed-dimension sentence vector comprises:
    Inputting each sentence in the target medical record text into a neural network;
    Mapping each sentence to a fixed-dimension sentence vector through the encoder of the neural network.
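Claim 11 only requires that an encoder map sentences of any length to vectors of one fixed dimension. As a dependency-free stand-in for that neural encoder (it is not a neural network and not the patent's encoder), the sketch below hashes character bigrams into `dim` buckets and L2-normalises the counts; the function name and the default `dim=64` are assumptions. Character bigrams are used because Chinese clinical text is not whitespace-delimited.

```python
import math
import zlib
from typing import List


def encode_sentence(sentence: str, dim: int = 64) -> List[float]:
    """Map a sentence of any length to a fixed `dim`-dimensional vector.

    Stand-in for a neural encoder: character bigrams are hashed into `dim`
    buckets (feature hashing) and the count vector is L2-normalised.
    """
    text = sentence.strip()
    vec = [0.0] * dim
    # A one-character sentence yields a single unigram instead of a bigram.
    grams = [text[i:i + 2] for i in range(max(len(text) - 1, 1))]
    for g in grams:
        if g:  # skip the empty string produced by an empty sentence
            vec[zlib.crc32(g.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# "Patient complains of headache for three days."
v = encode_sentence("患者主诉头痛三天。")
print(len(v), round(sum(x * x for x in v), 6))  # 64 1.0
```

A trained encoder would place semantically similar sentences close together, which feature hashing cannot do; the sketch only demonstrates the fixed-dimension contract the claim relies on.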
  12. The computer device according to claim 9, wherein the step of introducing context into each electronic medical record sub-text according to preset rules to obtain the target medical record text comprises:
    Detecting the position of each electronic medical record sub-text within the electronic medical record text;
    When the electronic medical record sub-text is at the beginning of the electronic medical record text, appending, at its truncation point, a preset number of sentences from the beginning of the next electronic medical record sub-text;
    When the electronic medical record sub-text is in the middle of the electronic medical record text, prepending, at its starting truncation point, a preset number of sentences from the end of the previous electronic medical record sub-text, and appending, at its ending truncation point, a preset number of sentences from the beginning of the next electronic medical record sub-text;
    When the electronic medical record sub-text is at the end of the electronic medical record text, prepending, at its truncation point, a preset number of sentences from the end of the previous electronic medical record sub-text.
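The position-dependent context rule of claim 12 amounts to overlapping windows. A minimal sketch, assuming the record is truncated into consecutive sub-texts of at most `max_len` sentences and that the same preset number `ctx` of context sentences is borrowed on each side (both parameter names are hypothetical): the first sub-text gets sentences from the start of the next one, a middle sub-text gets sentences on both sides, and the last sub-text gets sentences from the end of the previous one. Each window also records where the original sub-text sits inside it, so predictions for borrowed sentences can be discarded later.

```python
from typing import List, Tuple

Window = Tuple[List[str], int, int]  # (sentences, core_start, core_end)


def split_with_context(sentences: List[str], max_len: int, ctx: int) -> List[Window]:
    """Truncate into sub-texts of at most max_len sentences and add ctx context sentences.

    window[core_start:core_end] is always the original sub-text.
    """
    chunks = [sentences[i:i + max_len] for i in range(0, len(sentences), max_len)]
    windows: List[Window] = []
    for idx, chunk in enumerate(chunks):
        is_first, is_last = idx == 0, idx == len(chunks) - 1
        # Middle and last sub-texts: borrow the end of the previous sub-text.
        before = chunks[idx - 1][-ctx:] if not is_first and ctx > 0 else []
        # First and middle sub-texts: borrow the start of the next sub-text.
        after = chunks[idx + 1][:ctx] if not is_last else []
        windows.append((before + chunk + after, len(before), len(before) + len(chunk)))
    return windows


sents = [f"s{i}" for i in range(7)]
for window in split_with_context(sents, max_len=3, ctx=1):
    print(window)
# (['s0', 's1', 's2', 's3'], 0, 3)
# (['s2', 's3', 's4', 's5', 's6'], 1, 4)
# (['s5', 's6'], 1, 2)
```

The `ctx > 0` guard matters because a Python slice like `[-0:]` returns the whole list rather than an empty one.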
  13. The computer device according to claim 9, wherein, after the step of detecting whether the number of sentences in the electronic medical record text exceeds the preset threshold, the steps comprise:
    If it does not, mapping each sentence in the electronic medical record text to a fixed-dimension sentence vector;
    Inputting the sentence vectors of each electronic medical record text into the classification model in order to obtain a third output;
    Applying a SOFTMAX computation to the third output of each sentence vector to obtain the classification label of each sentence.
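Claims 9 and 13 together describe a dispatch on record length. The following sketch assumes that the sub-text size equals the threshold and that the trained model is available as a callable `classify` mapping a list of sentences to one label per sentence; both choices, and the function name `structure_record`, are illustrative assumptions. Short records are classified in a single pass; long records are classified window by window, keeping only the labels for each sub-text's own sentences.

```python
from typing import Callable, List

Classifier = Callable[[List[str]], List[str]]  # sentences in, one label per sentence out


def structure_record(sentences: List[str], classify: Classifier,
                     threshold: int, ctx: int = 2) -> List[str]:
    """Return one label per sentence, truncating only when the record is too long."""
    if len(sentences) <= threshold:
        # Short record: classify the whole text in one pass (the "third output" path).
        return classify(sentences)
    labels: List[str] = []
    for start in range(0, len(sentences), threshold):
        end = min(start + threshold, len(sentences))
        lo = max(start - ctx, 0)               # context borrowed from the previous sub-text
        hi = min(end + ctx, len(sentences))    # context borrowed from the next sub-text
        window_labels = classify(sentences[lo:hi])
        # Keep only predictions for the sub-text itself, not for the borrowed context.
        labels.extend(window_labels[start - lo:end - lo])
    return labels
```

Because the context sentences are dropped after prediction, every sentence receives exactly one label, and the model never sees more than `threshold + 2 * ctx` sentences at once.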
  14. The computer device according to claim 9, wherein, before the step of inputting the sentence vectors of each target medical record text into the classification model, in the order in which their corresponding sentences appear in that target medical record text, to obtain the first output, the steps comprise:
    Obtaining medical record samples from a training data set, each sentence in a medical record sample having a correct classification label;
    Truncating the medical record sample to obtain multiple medical record sub-samples;
    Introducing context into each medical record sub-sample according to preset rules to obtain a target medical record sample; mapping each sentence in the target medical record sample to a fixed-dimension sentence vector;
    Inputting the sentence vectors of each target medical record sample into a bidirectional recurrent neural network model in order to obtain a training output;
    Applying a SOFTMAX computation to the training output to obtain a predicted output;
    Calculating a loss value for each sentence in the medical record sub-sample using a loss function;
    Determining the classification model parameters according to the loss values, thereby completing training of the classification model.
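Claim 14's training procedure can be sketched for the output layer alone. This is a simplification under stated assumptions: sentence features are treated as fixed (the bidirectional RNN is omitted), the classifier is a single linear layer followed by SOFTMAX, parameters are updated with plain gradient descent, and, reading the claim's wording that the loss is computed for "each sentence in the medical record sub-sample", context sentences borrowed from neighbouring sub-samples are excluded from the loss. `masked_loss_and_step` and `lr` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [x / s for x in e]


def masked_loss_and_step(W: List[Vector], feats: List[Vector], gold: List[int],
                         core_start: int, core_end: int, lr: float = 0.5) -> float:
    """One gradient-descent step on a linear SOFTMAX layer W (labels x features).

    Only sentences in [core_start, core_end), i.e. the sub-sample's own
    sentences, contribute to the loss; borrowed context sentences are ignored.
    Returns the mean cross-entropy loss measured before the update.
    """
    n = core_end - core_start
    total = 0.0
    grads = [[0.0] * len(W[0]) for _ in W]
    for i in range(core_start, core_end):
        p = softmax([sum(w * x for w, x in zip(row, feats[i])) for row in W])
        total -= math.log(p[gold[i]])
        for c in range(len(W)):
            err = p[c] - (1.0 if c == gold[i] else 0.0)  # d(cross-entropy)/d(logit_c)
            for j, x in enumerate(feats[i]):
                grads[c][j] += err * x / n
    for c in range(len(W)):
        for j in range(len(W[0])):
            W[c][j] -= lr * grads[c][j]
    return total / n
```

The gradient `p - one_hot` is the standard derivative of cross-entropy composed with SOFTMAX, which is why the two are usually paired as in the claim.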
  15. The computer device according to claim 14, wherein the step of calculating the loss value of each sentence in the medical record sub-sample using a loss function comprises:
    Calculating the loss value of each sentence in the medical record sub-sample with a cross-entropy function, the formula of which is:
    [Figure PCTCN2020125146-appb-100003]
    where y is the predicted output and [Figure PCTCN2020125146-appb-100004] is the correct classification label.
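The cross-entropy formula itself appears only as an image (Figure PCTCN2020125146-appb-100003). Assuming it is the standard categorical cross-entropy, the per-sentence loss is L = -sum over classes c of y*_c * log(y_c), where y is the predicted SOFTMAX output and y* is the correct (one-hot) label. The notation y* is ours; the patent's symbol for the label is in Figure PCTCN2020125146-appb-100004. A minimal Python version, with a hypothetical function name and an `eps` guard that the claim does not mention:

```python
import math
from typing import Sequence


def cross_entropy(y_pred: Sequence[float], y_true: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Categorical cross-entropy for one sentence: -sum_c y_true[c] * log(y_pred[c]).

    y_pred: SOFTMAX probabilities (the predicted output y).
    y_true: one-hot (or soft) correct label distribution.
    eps:    lower bound on probabilities to avoid log(0).
    """
    if len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must have the same length")
    return -sum(t * math.log(max(p, eps)) for p, t in zip(y_pred, y_true))


print(round(cross_entropy([0.5, 0.5], [1.0, 0.0]), 4))  # 0.6931
```

With a one-hot label, only the predicted probability of the correct class contributes, so the loss reduces to -log of that probability.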
  16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the following steps of an electronic medical record structuring method:
    Acquiring an electronic medical record text and the number of sentences in the electronic medical record text;
    Detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;
    If it does, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;
    Introducing context into each electronic medical record sub-text according to preset rules to obtain target medical record texts;
    Mapping each sentence in the target medical record text to a fixed-dimension sentence vector;
    Inputting the sentence vectors of each target medical record text into a classification model, in the order in which their corresponding sentences appear in that target medical record text, to obtain a first output, wherein the classification model is trained from a bidirectional recurrent neural network model;
    Obtaining the classification label of each sentence according to the first output.
  17. The computer-readable storage medium according to claim 16, wherein the step of obtaining the classification label of each sentence according to the first output comprises:
    Inputting the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output;
    Applying a SOFTMAX computation to the second output of each sentence vector to obtain the classification label of each sentence.
  18. The computer-readable storage medium according to claim 16, wherein the step of mapping each sentence in the target medical record text to a fixed-dimension sentence vector comprises:
    Inputting each sentence in the target medical record text into a neural network;
    Mapping each sentence to a fixed-dimension sentence vector through the encoder of the neural network.
  19. The computer-readable storage medium according to claim 16, wherein the step of introducing context into each electronic medical record sub-text according to preset rules to obtain the target medical record text comprises:
    Detecting the position of each electronic medical record sub-text within the electronic medical record text;
    When the electronic medical record sub-text is at the beginning of the electronic medical record text, appending, at its truncation point, a preset number of sentences from the beginning of the next electronic medical record sub-text;
    When the electronic medical record sub-text is in the middle of the electronic medical record text, prepending, at its starting truncation point, a preset number of sentences from the end of the previous electronic medical record sub-text, and appending, at its ending truncation point, a preset number of sentences from the beginning of the next electronic medical record sub-text;
    When the electronic medical record sub-text is at the end of the electronic medical record text, prepending, at its truncation point, a preset number of sentences from the end of the previous electronic medical record sub-text.
  20. The computer-readable storage medium according to claim 16, wherein, after the step of detecting whether the number of sentences in the electronic medical record text exceeds the preset threshold, the steps comprise:
    If it does not, mapping each sentence in the electronic medical record text to a fixed-dimension sentence vector;
    Inputting the sentence vectors of each electronic medical record text into the classification model in order to obtain a third output;
    Applying a SOFTMAX computation to the third output of each sentence vector to obtain the classification label of each sentence.
PCT/CN2020/125146 2020-09-04 2020-10-30 Method and apparatus for electronic medical record structuring, computer device and storage medium WO2021159759A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010922768.X 2020-09-04
CN202010922768.XA CN112016279B (en) 2020-09-04 2020-09-04 Method, device, computer equipment and storage medium for structuring electronic medical record

Publications (1)

Publication Number Publication Date
WO2021159759A1 true WO2021159759A1 (en) 2021-08-19

Family

ID=73517190

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125146 WO2021159759A1 (en) 2020-09-04 2020-10-30 Method and apparatus for electronic medical record structuring, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112016279B (en)
WO (1) WO2021159759A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627564A (en) * 2021-08-23 2021-11-09 李永鑫 Deep learning-based CT medical image processing model training method and diagnosis and treatment system
CN114861630A (en) * 2022-05-10 2022-08-05 马上消费金融股份有限公司 Information acquisition and related model training method and device, electronic equipment and medium
CN116525125A (en) * 2023-07-04 2023-08-01 之江实验室 Virtual electronic medical record generation method and device
CN117854713A (en) * 2024-03-06 2024-04-09 之江实验室 Method for training a traditional Chinese medicine syndrome diagnosis model and method for recommending information

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112562809A (en) * 2020-12-15 2021-03-26 贵州小宝健康科技有限公司 Method and system for auxiliary diagnosis based on electronic medical record text
CN112820367B (en) * 2021-01-11 2023-06-30 平安科技(深圳)有限公司 Medical record information verification method and device, computer equipment and storage medium
CN112883712B (en) * 2021-02-05 2023-05-02 中国人民解放军南部战区总医院 Intelligent input method and device for electronic medical record
CN113836292B (en) * 2021-09-15 2024-01-09 灵犀量子(北京)医疗科技有限公司 Structuring method, system, device and medium for biomedical literature abstract
CN115359867B (en) * 2022-09-06 2024-02-02 中国电信股份有限公司 Electronic medical record classification method, device, electronic equipment and storage medium
CN116013503B (en) * 2022-12-27 2024-02-20 北京大学长沙计算与数字经济研究院 Dental treatment plan determining method, electronic equipment and storage medium
CN116386800B (en) * 2023-06-06 2023-08-18 神州医疗科技股份有限公司 Medical record data segmentation method and system based on pre-training language model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140278547A1 (en) * 2013-03-14 2014-09-18 Opera Solutions, Llc System and Method For Healthcare Outcome Predictions Using Medical History Categorical Data
CN106980608A (en) * 2017-03-16 2017-07-25 四川大学 Chinese electronic health record word segmentation and named entity recognition method and system
CN107578798A (en) * 2017-10-26 2018-01-12 北京康夫子科技有限公司 Electronic health record processing method and system
CN110046252A (en) * 2019-03-29 2019-07-23 北京工业大学 Medical text grading method based on an attention-mechanism neural network and a knowledge graph
CN111177309A (en) * 2019-12-05 2020-05-19 宁波紫冬认知信息科技有限公司 Medical record data processing method and device
CN111191668A (en) * 2018-11-15 2020-05-22 零氪科技(北京)有限公司 Method for identifying disease content in medical record text

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7058887B2 (en) * 2002-03-07 2006-06-06 International Business Machines Corporation Audio clutter reduction and content identification for web-based screen-readers
CN108536754A (en) * 2018-03-14 2018-09-14 四川大学 Electronic health record entity relation extraction method based on BLSTM and attention mechanism
CN111563399B (en) * 2019-02-14 2023-04-28 阿里巴巴集团控股有限公司 Method and device for obtaining structured information of electronic medical record
CN110032648B (en) * 2019-03-19 2021-05-07 微医云(杭州)控股有限公司 Medical record structured analysis method based on medical field entity
CN110277149A (en) * 2019-06-28 2019-09-24 北京百度网讯科技有限公司 Electronic health record processing method, apparatus and device
CN111540468B (en) * 2020-04-21 2023-05-16 重庆大学 ICD automatic coding method and system for visualizing diagnostic reasons

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627564A (en) * 2021-08-23 2021-11-09 李永鑫 Deep learning-based CT medical image processing model training method and diagnosis and treatment system
CN114861630A (en) * 2022-05-10 2022-08-05 马上消费金融股份有限公司 Information acquisition and related model training method and device, electronic equipment and medium
CN116525125A (en) * 2023-07-04 2023-08-01 之江实验室 Virtual electronic medical record generation method and device
CN116525125B (en) * 2023-07-04 2023-09-19 之江实验室 Virtual electronic medical record generation method and device
CN117854713A (en) * 2024-03-06 2024-04-09 之江实验室 Method for training a traditional Chinese medicine syndrome diagnosis model and method for recommending information
CN117854713B (en) * 2024-03-06 2024-06-04 之江实验室 Method for training a traditional Chinese medicine syndrome diagnosis model and method for recommending information

Also Published As

Publication number Publication date
CN112016279B (en) 2023-11-14
CN112016279A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
WO2021159759A1 (en) Method and apparatus for electronic medical record structuring, computer device and storage medium
WO2020220545A1 (en) Long short-term memory model-based disease prediction method and apparatus, and computer device
Połap et al. Blockchain technology and neural networks for the internet of medical things
CN109670727B (en) Crowd-sourcing-based word segmentation annotation quality evaluation system and evaluation method
WO2021159761A1 (en) Pathological data analysis method and apparatus, and computer device and storage medium
CN108899064A (en) Electronic health record generation method, device, computer equipment and storage medium
WO2022041722A1 (en) Hospital guidance data acquisition method and apparatus, and computer device and storage medium
WO2022095434A1 (en) Auto-encoder-based data anomaly identification method and apparatus and computer device
CN112036749B (en) Method, device and computer equipment for identifying risk users based on medical data
CN110875093A (en) Treatment scheme processing method, device, equipment and storage medium
CN112287068B (en) Artificial intelligence-based inquiry dialogue data processing method and device
CN112035611B (en) Target user recommendation method, device, computer equipment and storage medium
CN111710383A (en) Medical record quality control method and device, computer equipment and storage medium
WO2021139282A1 (en) Medical field knowledge graph construction method and apparatus, device and storage medium
KR102311398B1 (en) Mobile based self-oral examination device
WO2022057309A1 (en) Lung feature recognition method and apparatus, computer device, and storage medium
CN110570916A (en) Diagnosis assistance method, system, device and storage medium
WO2021155684A1 (en) Gene-disease relationship knowledge base construction method and apparatus, and computer device
WO2021211964A1 (en) Medical screening entry
CN112530550A (en) Image report generation method and device, computer equipment and storage medium
Cole Splitting hairs? Evaluating ‘split testimony’ as an approach to the problem of forensic expert evidence.
CN115794958A (en) Medical data sharing method, device and system based on block chain
WO2021139271A1 (en) Fm model based method and apparatus for predicting medical hot spot, and computer device
CN110648754A (en) Department recommendation method, device and equipment
CN116612879A (en) Diagnostic result prediction method, diagnostic result prediction device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20919191

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20919191

Country of ref document: EP

Kind code of ref document: A1