WO2021159759A1

WO2021159759A1 - Method and apparatus for electronic medical record structuring, computer device and storage medium

Info

Publication number: WO2021159759A1
Application number: PCT/CN2020/125146
Authority: WO
Inventors: 周晓峰
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-09-04
Filing date: 2020-10-30
Publication date: 2021-08-19
Also published as: CN112016279A; CN112016279B

Abstract

A method and an apparatus for electronic medical record structuring, a computer device, and a storage medium, relating to the field of artificial intelligence, and for use in the field of smart medicine. The method comprises: acquiring an electronic medical record text and the number of sentences in the electronic medical record text (S1); detecting whether the number of sentences in the electronic medical record text surpasses a pre-set threshold (S2); if the number of sentences surpasses the threshold, then truncating the electronic medical record text to obtain a plurality of electronic medical record sub-texts (S3); incorporating each electronic medical record sub-text into preceding and following texts by means of a pre-set rule to obtain a target medical record text (S4); mapping each sentence in the target medical record text as a fixed-dimensional sentence vector (S5); inputting each sentence vector of the target medical record text sequentially into a classification model for calculation to obtain a first output; the classification model being constructed on the basis of bidirectional recurrent neural network training (S6); on the basis of the first output, obtaining a classification tag for each sentence (S7). The present method is able to improve the accuracy of sentence structuring at truncation sites.

Description

Electronic medical record structuring method, device, computer equipment and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 4, 2020, with application number 202010922768.X and invention title "Electronic Medical Record Structuring Method, Device, Computer Equipment and Storage Medium", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the technical field of intelligent decision-making, and in particular to a method, device, computer equipment, and storage medium for structuring electronic medical records.

Background technique

The medical record is the original record of a patient's diagnosis and treatment in the hospital. It contains the home page, the course record, the examination results, the doctor's orders, the operation record, the nursing record, and so on. Electronic medical records refer not only to static medical record information but also to the related services provided. They are electronically managed information about an individual's lifelong health status and medical care, and involve all process information in the collection, storage, transmission, processing, and utilization of patient information. Structuring electronic medical records means extracting disease entities, drug entities, body part entities, and the like from the records through a neural network, so that the key information in the records can be extracted efficiently, effectively assisting doctors in core data analysis and data search. The inventor realized that existing electronic medical records vary in length, and that records that are too long need to be truncated. However, because the truncation process is relatively arbitrary, the data at the truncation site may lose some of its context information, which reduces the structuring accuracy of the sentences around the truncation site.

Technical problem

The main purpose of this application is to provide an electronic medical record structuring method, device, computer equipment, and storage medium to solve the problem that truncating an electronic medical record reduces the structuring accuracy of the sentences around the truncation.

Technical solutions

In order to achieve the above objective, this application provides a method for structuring an electronic medical record, which includes the following steps:

Acquiring the electronic medical record text and the number of sentences in the electronic medical record text;

Detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;

If it exceeds, the electronic medical record text is truncated to obtain multiple electronic medical record sub-texts;

Introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

Mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;

The sentence vectors in each target medical record text are input into a classification model for calculation, in the order of the corresponding sentences in the target medical record text, to obtain a first output, wherein the classification model is obtained by training based on a bidirectional recurrent neural network model;

According to the first output, the classification label of each sentence is obtained.

This application also provides an electronic medical record structuring device, including:

The first obtaining unit is used to obtain the electronic medical record text and the number of sentences in the electronic medical record text;

The detection unit is configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;

The first truncation unit is used for truncating the electronic medical record text if it exceeds, to obtain multiple electronic medical record sub-texts;

The first introduction unit is used to introduce each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

The first mapping unit is used to map each sentence in the target medical record text to a sentence vector of a fixed dimension;

The first calculation unit is configured to input the sentence vectors in each target medical record text into the classification model for calculation, in the order of the corresponding sentences in the target medical record text, to obtain the first output, wherein the classification model is obtained by training based on a bidirectional recurrent neural network model;

The second calculation unit is configured to obtain the classification label of each sentence according to the first output.

The present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and when the processor executes the computer program, the steps of the electronic medical record structuring method described above are implemented.

This application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the electronic medical record structuring method described above are implemented.

Beneficial effect

The electronic medical record structuring method, device, computer equipment, and storage medium provided in this application introduce a part of the context at the truncation point according to preset rules, and input the introduced context together with the truncated electronic medical record text into the classification model. Because the classification model is trained based on a bidirectional recurrent neural network, it can extract contextual information; the classification of each sentence is then calculated through SOFTMAX, which can effectively improve the structuring accuracy of the sentences around the truncation.

Description of the drawings

FIG. 1 is a schematic diagram of the steps of a method for structuring an electronic medical record in an embodiment of the present application;

FIG. 2 is a structural block diagram of an electronic medical record structuring device in an embodiment of the present application;

FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.

The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

The best mode of the present invention

In order to make the purpose, technical solutions, and advantages of this application clearer, the following further describes the application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.

Referring to FIG. 1, an embodiment of the present application provides a method for structuring an electronic medical record, including:

Step S1, obtaining the electronic medical record text and the number of sentences in the electronic medical record text;

Step S2, detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;

Step S3, if it exceeds, the electronic medical record text is truncated to obtain multiple electronic medical record sub-texts;

Step S4, introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

Step S5: Map each sentence in the target medical record text to a sentence vector of a fixed dimension;

Step S6: Input the sentence vectors in each target medical record text into the classification model for calculation, in the order of the corresponding sentences in the target medical record text, to obtain a first output, wherein the classification model is obtained by training based on a bidirectional recurrent neural network model;

Step S7: Obtain the classification label of each sentence according to the first output.

In this embodiment, as described in step S1 above, the electronic medical record text is acquired, and preprocessing can be performed on it, such as text preprocessing and data cleaning with tools such as numpy, pandas, and jieba, including Chinese word segmentation, stop-word removal, and removal of useless symbols. The private information in the electronic medical record text is also desensitized so that patient privacy is removed; private information includes the name, bed number, hospital number, address, and other key information by which the patient could easily be identified. The number of sentences in the electronic medical record text is obtained after the above processing.
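
The preprocessing in step S1 could be sketched as follows. This is only an illustration using the Python standard library: the field labels in `PRIVATE_FIELDS`, the `[REDACTED]` marker, and the sentence-boundary pattern are assumptions made for the example, and Chinese word segmentation with jieba and stop-word removal are left out.

```python
import re
from typing import List, Tuple

# Hypothetical field labels whose values are treated as private.
PRIVATE_FIELDS = ("Name", "Bed No", "Hospital No")
_PRIVATE_RE = re.compile(
    r"\b(" + "|".join(re.escape(f) for f in PRIVATE_FIELDS) + r")\s*:\s*\w+"
)
# Sentence boundaries: Chinese/English terminators, or a period followed by
# whitespace or the end of the text (so decimals such as "37.5" stay intact).
_BOUNDARY_RE = re.compile(r"[。！？!?；;]+|\.(?=\s|$)")


def desensitize(text: str) -> str:
    """Mask the single token that follows each private field label."""
    return _PRIVATE_RE.sub(lambda m: m.group(1) + ": [REDACTED]", text)


def split_sentences(text: str) -> List[str]:
    """Split on sentence boundaries and drop empty fragments."""
    return [part.strip() for part in _BOUNDARY_RE.split(text) if part.strip()]


def preprocess(text: str) -> Tuple[List[str], int]:
    """Return the desensitized sentences and their count (step S1)."""
    sentences = split_sentences(desensitize(text))
    return sentences, len(sentences)


record = ("Name: Zhang. Bed No: 12. Chief complaint: cough for 3 days. "
          "Temperature 37.5 C. Diagnosis: bronchitis.")
sentences, count = preprocess(record)
```

The sentence count returned here is the quantity that step S2 compares against the preset threshold.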

As described in steps S2-S3 above, because the length supported by the classification model is limited, when the number of sentences in the electronic medical record text exceeds the preset threshold, the text needs to be truncated so that the sentences in each truncated electronic medical record sub-text can be input into the classification model.
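
A minimal sketch of steps S2-S3, assuming the record is already a list of sentences. The application does not state how the number of sub-texts is chosen, so the `part_size` parameter (the largest allowed sub-text, chosen to leave room for context sentences) and the even split are assumptions of this example.

```python
import math
from typing import List


def truncate_record(sentences: List[str], threshold: int,
                    part_size: int) -> List[List[str]]:
    """Return the record unchanged if it fits, else split it into even parts.

    threshold: preset maximum number of sentences the model accepts (S2).
    part_size: hypothetical upper bound on the size of each sub-text (S3).
    """
    n = len(sentences)
    if n <= threshold:
        return [list(sentences)]          # no truncation needed
    k = math.ceil(n / part_size)          # number of sub-texts
    base, extra = divmod(n, k)            # spread the remainder evenly
    parts, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(sentences[start:start + size])
        start += size
    return parts


parts = truncate_record([f"s{i}" for i in range(120)], threshold=50, part_size=30)
```

With 120 sentences, a threshold of 50, and a part size of 30, this yields four sub-texts of 30 sentences each.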

As described in step S4 above, because the electronic medical record text has been truncated, context is introduced for each electronic medical record sub-text according to preset rules. For example, suppose one electronic medical record text is truncated into three sub-texts: the first, the second, and the third electronic medical record sub-text. Some sentences from the second sub-text are introduced at the truncation of the first sub-text; some sentences from the first sub-text are introduced at the starting truncation of the second sub-text, and some sentences from the third sub-text at its ending truncation; and some sentences from the second sub-text are introduced at the truncation of the third sub-text.

As described in step S5 above, each sentence in the target medical record text is mapped to a sentence vector of a fixed dimension. Specifically, the encoder of a neural network (a convolutional neural network, recurrent neural network, transformer, etc.) maps a sentence to a vector of fixed dimension, which gives the vector representation of a single sentence. In this way, each sentence in the electronic medical record text is input into the neural network and its vector representation is obtained, so that a complete electronic medical record text can be represented by the sentence vectors of all its sentences.
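
The property that matters in step S5 is that sentences of any length map to vectors of the same dimension. The sketch below illustrates only that interface. It swaps the neural encoder for a hashed character-bigram count vector, which is a stand-in chosen for this example and not the encoder described in the application; `DIM` and `encode_sentence` are hypothetical names.

```python
import hashlib
import math
from typing import List

DIM = 16  # hypothetical fixed dimension of every sentence vector


def encode_sentence(sentence: str, dim: int = DIM) -> List[float]:
    """Map a sentence of any length to a unit-length vector of size `dim`.

    Stand-in for a neural encoder: each character bigram is hashed into one
    of `dim` buckets and the bucket counts are L2-normalized.
    """
    vec = [0.0] * dim
    chars = sentence.strip()
    if not chars:
        return vec
    grams = [chars[i:i + 2] for i in range(max(len(chars) - 1, 1))]
    for gram in grams:
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


short_vec = encode_sentence("Cough for three days")
long_vec = encode_sentence(
    "Patient denies any history of hypertension or diabetes mellitus")
```

Both vectors have length `DIM`, so a record with N sentences becomes an N x `DIM` sequence that can be fed to the classification model in order.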

As described in step S6 above, the sentences in a medical record are not independent of each other but context-related. For example, the part describing the treatment process usually consists of multiple sentences, so the sentences surrounding a sentence that describes treatment are more likely to describe the treatment process than the patient's past medical history. Therefore, classifying each sentence in isolation does not achieve good results; the context information must be included, and the sentence vectors are input into the classification model in order. The classification model is trained based on a bidirectional recurrent neural network model. After its forward and backward calculations, each sentence obtains its contextual information, which effectively improves classification accuracy. Specifically, the classification model can classify each sentence into one of: basic information, personal history, family history, past history, current medical history, chief complaint, examination, diagnosis, treatment, summary, and others.
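
To illustrate why a bidirectional recurrent network gives every sentence access to both earlier and later sentences, here is a small untrained sketch in plain Python. The application does not specify the cell type or sizes, so the simple tanh recurrent cell, the random weights, and the class name `TinyBiRNN` are assumptions of this example.

```python
import math
import random
from typing import List

Vector = List[float]


def _rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def _rnn_pass(inputs: List[Vector], w_in, w_rec, hidden: int) -> List[Vector]:
    """Run a simple tanh recurrent cell over the inputs, one state per step."""
    h = [0.0] * hidden
    states = []
    for x in inputs:
        a, b = _matvec(w_in, x), _matvec(w_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        states.append(h)
    return states


class TinyBiRNN:
    """Untrained bidirectional RNN: forward and backward states concatenated."""

    def __init__(self, input_dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.fw_in = _rand_matrix(hidden, input_dim, rng)
        self.fw_rec = _rand_matrix(hidden, hidden, rng)
        self.bw_in = _rand_matrix(hidden, input_dim, rng)
        self.bw_rec = _rand_matrix(hidden, hidden, rng)

    def __call__(self, sentence_vectors: List[Vector]) -> List[Vector]:
        fw = _rnn_pass(sentence_vectors, self.fw_in, self.fw_rec, self.hidden)
        # The backward pass reads the sentences last-to-first, then the
        # states are flipped back so index i lines up with sentence i.
        bw = _rnn_pass(sentence_vectors[::-1], self.bw_in, self.bw_rec,
                       self.hidden)[::-1]
        return [f + b for f, b in zip(fw, bw)]


model = TinyBiRNN(input_dim=3, hidden=4)
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doc_b = doc_a[:2] + [[1.0, 1.0, 1.0]]   # only the last sentence differs
out_a, out_b = model(doc_a), model(doc_b)
```

Changing only the last sentence leaves the forward half of the first sentence's output untouched but alters its backward half, which is the mechanism by which context introduced after a truncation point can influence the sentences just before it.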

As described in step S7 above, the classification label of each sentence is obtained according to the first output. Specifically, the first output of each sentence vector is calculated by SOFTMAX, which maps an arbitrary K-dimensional real vector to another K-dimensional real vector in which each element has a value in (0, 1). The function expression of SOFTMAX is:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

where K is the number of categories, j is one of the K categories, j ∈ (0, K], and z_j is the value for that category. After the above calculation, the value of each sentence for each category is obtained, and the category with the largest value is selected as the classification label of the sentence.
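
The SOFTMAX step and the selection of the largest value can be sketched as below. The label list follows the categories named in this description; the score vector is made-up example data, and subtracting the maximum before exponentiating is a standard numerical-stability measure added here, not something stated in the application.

```python
import math
from typing import List, Tuple

LABELS = [
    "basic information", "personal history", "family history", "past history",
    "current medical history", "chief complaint", "examination", "diagnosis",
    "treatment", "summary", "others",
]


def softmax(z: List[float]) -> List[float]:
    """Map K real scores to K values in (0, 1) that sum to 1."""
    m = max(z)                             # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(z: List[float], labels: List[str] = LABELS) -> Tuple[str, float]:
    """Return the label with the largest SOFTMAX value and that value."""
    probs = softmax(z)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up first output for one sentence (one score per category).
scores = [0.1, -1.2, 0.3, 0.0, 2.5, 1.1, -0.4, 0.9, 0.2, -2.0, 0.5]
label, prob = classify(scores)
```

Here the fifth score is the largest, so the sentence is labeled "current medical history".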

In this embodiment, a part of the context is introduced at the truncation according to the preset rules, and the introduced context is input into the classification model together with the truncated electronic medical record text to obtain the first output. Because the classification model is trained based on a bidirectional recurrent neural network, it can extract context information; the classification label of each sentence is then obtained according to the first output, which can effectively improve the structuring accuracy of the sentences at the truncation of the electronic medical record.

In an embodiment, step S7 of obtaining the classification label of each sentence according to the first output includes:

Step S71: Input the first output of each sentence vector into a CRF (conditional random field) network and/or a self-attention network to obtain a second output;

Step S72: Calculate the second output of each sentence vector by SOFTMAX to obtain the classification label of each sentence.

In this embodiment, as described in steps S71-S72 above, inputting the first output into the CRF network and/or the self-attention network further strengthens the influence of context information in the classification model and the contextual connection between sentences. In other embodiments, the SOFTMAX calculation can be performed directly on the first output to obtain the classification label of each sentence.
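
As a rough illustration of the self-attention option in step S71, the sketch below applies scaled dot-product self-attention to the first outputs of all sentences in a sub-text. Using the first outputs directly as queries, keys, and values (no learned projections) is a simplification assumed for this example; a trained network would learn those projections, and the CRF option is not shown.

```python
import math
from typing import List

Vector = List[float]


def _softmax(z: Vector) -> Vector:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(first_outputs: List[Vector]) -> List[Vector]:
    """Mix each sentence's first output with those of all other sentences.

    Queries, keys, and values are the first outputs themselves (assumption).
    """
    dim = len(first_outputs[0])
    scale = math.sqrt(dim)
    second_outputs = []
    for query in first_outputs:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in first_outputs]
        weights = _softmax(scores)          # attention over all sentences
        second_outputs.append([
            sum(w * value[i] for w, value in zip(weights, first_outputs))
            for i in range(dim)
        ])
    return second_outputs


first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
second = self_attention(first)
```

Each second output is a weighted average of all first outputs, so every sentence's representation is informed by the rest of the sub-text before SOFTMAX is applied in step S72.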

In an embodiment, the step S5 of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension includes:

Step S51, input each sentence in the target medical record text into the neural network;

Step S52: Map each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.

In this embodiment, the encoder of a neural network (which may be a convolutional neural network, a recurrent neural network, a transformer, etc.) maps a sentence to a vector of fixed dimension, which gives the vector representation of a single sentence. Take the transformer model as an example. The encoder of the transformer model is composed of N=6 layers, and each layer contains two sub-layers: the first is a multi-head attention layer and the second is a simple fully connected layer. A residual connection is used around each sub-layer. From ResNet, the residual connection is H(x)=F(x)+x; therefore, the output of each sub-layer is LayerNorm(x+Sublayer(x)), where LayerNorm normalizes each sample with its own mean and variance. The input and output dimensions of each layer are the same. In this way, each sentence in the medical record data is input into the transformer model and its vector representation is obtained, so that a complete electronic medical record text can be represented by the sentence vectors of all its sentences.
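
The sub-layer output LayerNorm(x+Sublayer(x)) can be written out in a few lines. This sketch omits the learnable gain and bias that layer normalization usually carries, and the `toy_sublayer` function is a placeholder standing in for multi-head attention or the fully connected layer; both are assumptions of the example.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one sample with its own mean and variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_sublayer(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Compute LayerNorm(x + Sublayer(x)); input and output sizes match."""
    fx = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, fx)])


def toy_sublayer(x: Vector) -> Vector:
    # Placeholder transformation standing in for attention or feed-forward.
    return [2.0 * v for v in x]


out = residual_sublayer([1.0, 2.0, 3.0, 4.0], toy_sublayer)
```

The output has the same dimension as the input, with mean close to 0 and variance close to 1 for this sample.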

In an embodiment, the step S4 of introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text includes:

Step S41, detecting the position of each sub-text of the electronic medical record in the text of the electronic medical record;

Step S42: when the electronic medical record sub-text is at the beginning of the electronic medical record text, introduce, at the truncation of the sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

Step S43: when the electronic medical record sub-text is in the middle of the electronic medical record text, introduce, at the starting truncation of the sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text, and introduce, at the ending truncation of the sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

Step S44: when the electronic medical record sub-text is at the end of the electronic medical record text, introduce, at the truncation of the sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text.

In this embodiment, some context sentences are introduced for each electronic medical record sub-text. For example, suppose an electronic medical record text contains 120 sentences and the classification model can support only 50 sentences at a time. The electronic medical record text can be divided evenly by sentence count into 4 parts of 30 sentences each. The first 10 sentences of the second part are then introduced at the end of the first part to form the first target medical record text; sentences from the end of the first part are introduced at the beginning of the second part, and the first 10 sentences of the third part are introduced at the end of the second part, to form the second target medical record text. The specific number of sentences introduced for each electronic medical record sub-text can be set in advance as needed. In this embodiment, context sentences are introduced for each electronic medical record sub-text before it is input into the classification model, and the connection between contexts improves the classification accuracy of each sentence.
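
Steps S41-S44 and the 120-sentence example can be sketched as follows. The function name `add_context`, the use of the same number of context sentences on both sides, and the returned `core` index range (marking which sentences belong to the sub-text itself rather than to the introduced context) are assumptions made for this illustration.

```python
from typing import List, Tuple

Target = Tuple[List[str], Tuple[int, int]]


def add_context(parts: List[List[str]], n_context: int) -> List[Target]:
    """Attach neighbouring sentences to each sub-text (steps S41-S44).

    Returns, for each sub-text, the target text and the (start, end) index
    range of the original sub-text inside it.
    """
    targets = []
    for i, part in enumerate(parts):
        before: List[str] = []
        after: List[str] = []
        if i > 0:                                   # middle or last sub-text
            prev = parts[i - 1]
            before = prev[max(len(prev) - n_context, 0):]
        if i < len(parts) - 1:                      # first or middle sub-text
            after = parts[i + 1][:n_context]
        core = (len(before), len(before) + len(part))
        targets.append((before + part + after, core))
    return targets


sentences = [f"s{i}" for i in range(120)]
parts = [sentences[i:i + 30] for i in range(0, 120, 30)]   # 4 parts of 30
targets = add_context(parts, n_context=10)
```

With 10 context sentences per side, the four target texts contain 40, 50, 50, and 40 sentences, so each still fits a model limited to 50 sentences. The `core` range makes it possible to keep only the labels of the original sub-text sentences after classification.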

In an embodiment, after the step S2 of detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold, the method includes:

Step S2A, if it does not exceed, map each sentence in the electronic medical record text to a sentence vector of a fixed dimension;

Step S2B, input the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;

In step S2C, the third output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.

In this embodiment, when the number of sentences in the electronic medical record text does not exceed the preset threshold, the sentence vector of each sentence is directly input into the classification model in order for calculation, and then the classification label of each sentence is calculated by the SOFTMAX function.

In an embodiment, before step S6 of inputting the sentence vectors in each target medical record text into the classification model for calculation, in the order of the corresponding sentences in the target medical record text, to obtain the first output, the method includes:

Step S6a: Obtain medical record samples from the training data set, where each sentence in a medical record sample has a correct classification label;

Step S6b, truncating the medical record sample to obtain multiple medical record sub-samples;

In step S6c, each of the medical record sub-samples is introduced into the context through a preset rule to obtain a target medical record sample;

Step S6d, mapping each sentence in the target medical record sample to a sentence vector of a fixed dimension;

Step S6e, input the sentence vectors in each target medical record sample into the bidirectional recurrent neural network model in order for calculation to obtain training output;

Step S6f, calculating the training output through SOFTMAX to obtain the predicted output;

Step S6g: Calculate the loss value of each sentence in the medical record sub-sample by using a loss function;

In step S6h, the classification model parameters are determined according to the loss value, and the training of the classification model is completed.

In this embodiment, as described in step S6g above, the loss value of each sentence in the medical record sub-sample is calculated. Context is introduced for the medical record sub-sample according to the preset rules, and the context is input into the bidirectional recurrent neural network together with the sentences of the medical record sub-sample so that context information is extracted and the output of each sentence is obtained. The output of each sentence is calculated through SOFTMAX to obtain the predicted output of each sentence; the loss function is then applied only to the sentences in the medical record sub-sample, and the model parameters corresponding to the smallest loss value are selected as the final model parameters to complete the training of the classification model. In this embodiment, context is introduced for each medical record sub-sample, but the introduced context only provides context information and does not participate in the calculation of the loss value or in the final classification. Specifically, the loss value of each sentence in the medical record sub-sample is calculated through the cross-entropy function, which in its standard multi-class form is

$$L = -\sum_{j=1}^{K} \hat{y}_j \log y_j$$

where y is the predicted output of each sentence in the medical record sub-sample and $\hat{y}$ is its correct classification label.
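
The idea that introduced context sentences inform the model but do not contribute to the loss can be sketched as below. The example assumes one-hot correct labels given as class indices, for which the multi-class cross-entropy reduces to the negative log of the predicted value for the correct class; the `core` range marking the sub-sample's own sentences, the averaging over sentences, and the small clamp against log(0) are also assumptions of this sketch.

```python
import math
from typing import List, Tuple


def cross_entropy(predicted: List[float], true_index: int) -> float:
    """Cross-entropy for one sentence with a one-hot correct label."""
    return -math.log(max(predicted[true_index], 1e-12))


def sub_sample_loss(predictions: List[List[float]], labels: List[int],
                    core: Tuple[int, int]) -> float:
    """Average loss over the sub-sample's own sentences only.

    Sentences outside `core` are introduced context and are skipped.
    """
    start, end = core
    if end <= start:
        raise ValueError("core range must contain at least one sentence")
    losses = [cross_entropy(p, y)
              for p, y in zip(predictions[start:end], labels[start:end])]
    return sum(losses) / len(losses)


# Four sentences: index 0 and 3 are context, 1 and 2 form the sub-sample.
predictions = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.99, 0.01]]
labels = [0, 0, 1, 1]
loss = sub_sample_loss(predictions, labels, core=(1, 3))
```

Even though the predictions for the two context sentences are poor, they leave the loss unchanged, because only the sentences inside the `core` range are scored.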

The electronic medical record structuring method provided in this application can be used in the blockchain field: the trained classification model is stored in a blockchain network, and the electronic medical record text can also be stored in the blockchain network. Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer. A blockchain network is the collection of nodes that incorporate new blocks into the blockchain through consensus.

The underlying platform of the blockchain can include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for the identity information management of all blockchain participants, including maintaining public and private key generation (account management), key management, and maintaining the correspondence between users' real identities and blockchain addresses (authority management); where authorized, it supervises and audits the transactions of certain real identities and provides risk control rule configuration (risk control audit). The basic service module is deployed on all blockchain node devices to verify the validity of business requests and, after consensus on a valid request is reached, record it to storage. For a new business request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the business information through the consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution. Developers can define the contract logic in a programming language and publish it to the blockchain (contract registration); execution is triggered by calling keys or other events according to the logic of the contract terms to complete the contract logic, and functions for upgrading and cancelling contracts are also provided. The operation monitoring module is mainly responsible for deployment during the product release process, configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, such as alarms, monitoring network conditions, and monitoring the health status of node devices.

The structuring method, device, computer equipment, and storage medium of electronic medical records provided in this application can be applied in the field of smart medical care to accelerate the construction of digital medical care, thereby promoting the construction of smart cities.

Referring to FIG. 2, an embodiment of the present application further provides an electronic medical record structuring device, including:

The first obtaining unit 10 is configured to obtain the electronic medical record text and the number of sentences in the electronic medical record text;

The detection unit 20 is configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;

The first truncation unit 30 is used for truncating the electronic medical record text if it exceeds, to obtain a plurality of electronic medical record sub-texts;

The first introduction unit 40 is configured to introduce each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

The first mapping unit 50 is configured to map each sentence in the target medical record text to a sentence vector of a fixed dimension;

The first calculation unit 60 is configured to input the sentence vectors in each target medical record text into the classification model for calculation, in the order of the corresponding sentences in the target medical record text, to obtain the first output, wherein the classification model is obtained by training based on a bidirectional recurrent neural network model;

The second calculation unit 70 is configured to obtain the classification label of each sentence according to the first output.

In an embodiment, the second calculation unit 70 includes:

A first input subunit, configured to input the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output;

The calculation subunit is configured to perform SOFTMAX calculation on the second output of each sentence vector to obtain the classification label of each sentence.

In an embodiment, the first mapping unit 50 includes:

The second input subunit is used to input each sentence in the target medical record text into the neural network;

The mapping subunit is used to map each sentence to a sentence vector of a fixed dimension through the encoder of the neural network.

In an embodiment, the first introduction unit 40 includes:

The detection subunit is used to detect the position of each of the electronic medical record sub-texts in the electronic medical record text;

The first introduction sub-unit is used to introduce, when the electronic medical record sub-text is at the beginning of the electronic medical record text, a preset number of sentences from the beginning of the next electronic medical record sub-text at the truncation of the electronic medical record sub-text;

The second introduction sub-unit is used to introduce, when the electronic medical record sub-text is in the middle of the electronic medical record text, a preset number of sentences from the end of the previous electronic medical record sub-text at the starting truncation of the electronic medical record sub-text, and a preset number of sentences from the beginning of the next electronic medical record sub-text at the ending truncation of the electronic medical record sub-text;

The third introduction sub-unit is used to introduce, when the electronic medical record sub-text is at the end of the electronic medical record text, a preset number of sentences from the end of the previous electronic medical record sub-text at the truncation of the electronic medical record sub-text.

In an embodiment, the electronic medical record structuring device further includes:

The second mapping unit is used to map each sentence in the electronic medical record text to a sentence vector of a fixed dimension if it is not exceeded;

The third calculation unit is configured to input the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;

The fourth calculation unit is configured to perform SOFTMAX calculation on the third output of each sentence vector to obtain the classification label of each sentence;

The second acquiring unit is used to acquire medical record samples from the training data set, where each sentence in a medical record sample has a correct classification label;

The second truncation unit is used for truncating the medical record sample to obtain multiple medical record sub-samples;

The second introduction unit is used to introduce each of the medical record sub-samples into the context through preset rules to obtain the target medical record sample;

The third mapping unit is used to map each sentence in the target medical record sample to a sentence vector of a fixed dimension;

The fifth calculation unit is configured to sequentially input the sentence vectors of each target medical record sample into the bidirectional recurrent neural network model for calculation to obtain a training output;

The sixth calculation unit is used to calculate the training output through SOFTMAX to obtain the predicted output;

The seventh calculation unit is used to calculate the loss value of each sentence in the medical record sub-sample by using a loss function;

The determining unit is used to determine the parameters of the classification model according to the loss value to complete the training of the classification model.
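The loss step of the training units can be sketched as follows. The text states that the loss is calculated for each sentence in the medical record sub-sample; one possible reading, assumed here and not confirmed by the application, is that the borrowed context sentences only supply context and are excluded from the loss. The sketch also assumes a cross-entropy loss, and the names `sentence_losses` and `is_context` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def sentence_losses(
    probs: Sequence[Sequence[float]],
    gold: Sequence[int],
    is_context: Sequence[bool],
) -> List[Optional[float]]:
    """Cross-entropy loss per sentence of a target medical record sample.

    probs: softmax output per sentence (one probability per class).
    gold: index of the correct classification label per sentence.
    is_context: True for sentences borrowed from a neighbouring sub-sample
        (assumption: these are skipped and reported as None).
    """
    losses: List[Optional[float]] = []
    for p, g, ctx in zip(probs, gold, is_context):
        if ctx:
            losses.append(None)
        else:
            # Clamp to avoid log(0) when the model assigns zero probability.
            losses.append(-math.log(max(p[g], 1e-12)))
    return losses
```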

In an embodiment, the seventh calculation unit includes:

The calculation subunit is used to calculate the loss value of each sentence in the medical record sub-sample through a cross entropy function of the predicted output y and the correct classification label.
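For reference, a common form of the cross-entropy loss for a single sentence is sketched below. Here $y$ is the predicted output after SOFTMAX, as in the text; the symbol $\hat{y}$ for the one-hot correct classification label, the class index $i$, and the number of classes $C$ are notational assumptions and may differ from the notation of the original formula.

```latex
L = -\sum_{i=1}^{C} \hat{y}_i \log y_i
```

Because $\hat{y}$ is one-hot, this reduces to $L = -\log y_c$, where $c$ is the index of the correct classification label.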

In this embodiment, please refer to the above method embodiment for the specific implementation of the above-mentioned units, sub-units, and modules, which will not be repeated here.

Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as electronic medical record data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a method for structuring electronic medical records.

Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.

An embodiment of the present application also provides a computer-readable storage medium. The above-mentioned storage medium may be a non-volatile storage medium or a volatile storage medium. A computer program is stored thereon, and when the computer program is executed by a processor, a method for structuring an electronic medical record is realized.

In summary, the electronic medical record structuring method, device, computer equipment, and storage medium provided in the embodiments of this application introduce part of the context at each truncation according to preset rules, and input the introduced context together with the truncated electronic medical record text into the classification model. The classification model is obtained by training a bidirectional recurrent neural network, which can extract context information; the classification of each sentence is then calculated through SOFTMAX, which can effectively improve the accuracy of structuring the sentences around a truncation.
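The overall flow summarized above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of this application: `classify` is a hypothetical stand-in for the sentence encoder, the bidirectional recurrent classification model, and SOFTMAX combined; `chunk_size` and `n_context` are hypothetical parameter names; and discarding the labels predicted for the borrowed context sentences is an assumed design choice.

```python
from typing import Callable, List

# Hypothetical stand-in: takes sentences in order, returns one label per sentence.
Classifier = Callable[[List[str]], List[str]]


def structure_record(
    sentences: List[str],
    threshold: int,
    chunk_size: int,
    n_context: int,
    classify: Classifier,
) -> List[str]:
    """Label every sentence, truncating long records and borrowing context."""
    if len(sentences) <= threshold:
        # Short record: classify the whole text in one pass.
        return classify(sentences)

    labels: List[str] = []
    for start in range(0, len(sentences), chunk_size):
        end = min(start + chunk_size, len(sentences))
        # Extend the sub-text with neighbouring sentences at each truncation.
        lo = max(0, start - n_context)
        hi = min(len(sentences), end + n_context)
        window_labels = classify(sentences[lo:hi])
        # Keep only the labels of the sub-text's own sentences (assumption).
        labels.extend(window_labels[start - lo:end - lo])
    return labels
```

Working with index windows keeps the output aligned with the original sentence order, so each sentence receives exactly one label even though sentences near a truncation are seen by the classifier more than once.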

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned method embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when the computer program is executed, it may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprising", "including" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, device, article or method including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, device, article, or method. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.

The above are only the preferred embodiments of this application and do not therefore limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

A method for structuring electronic medical records, which includes the following steps:

Acquiring the electronic medical record text and the number of sentences in the electronic medical record text;

Detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;

If the number of sentences exceeds the preset threshold, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;

Introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

Mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;

Inputting the sentence vectors of each target medical record text into a classification model for calculation, in the order in which the corresponding sentences appear in the target medical record text, to obtain a first output; wherein the classification model is obtained by training a bidirectional recurrent neural network model;

According to the first output, the classification label of each sentence is obtained.
The method for structuring electronic medical records according to claim 1, wherein the step of obtaining the classification label of each sentence according to the first output comprises:

Inputting the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output;

The second output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.
The method for structuring electronic medical records according to claim 1, wherein the step of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension comprises:

Input each sentence in the target medical record text into the neural network;

The encoder of the neural network maps each sentence to a sentence vector of a fixed dimension.
The method for structuring an electronic medical record according to claim 1, wherein the step of introducing each sub-text of the electronic medical record into the context through a preset rule to obtain the target medical record text comprises:

Detecting the position of each sub-text of the electronic medical record in the electronic medical record text;

When the electronic medical record sub-text is at the beginning of the electronic medical record text, introducing, at the truncation of the electronic medical record sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

When the electronic medical record sub-text is in the middle of the electronic medical record text, introducing, at the beginning truncation of the electronic medical record sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text, and introducing, at the end truncation of the electronic medical record sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

When the electronic medical record sub-text is at the end of the electronic medical record text, introducing, at the truncation of the electronic medical record sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text.
The method for structuring electronic medical records according to claim 1, wherein after the step of detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold, the method comprises:

If the number of sentences does not exceed the preset threshold, mapping each sentence in the electronic medical record text to a sentence vector of a fixed dimension;

Inputting the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;

The third output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.
The method for structuring an electronic medical record according to claim 1, wherein before the step of inputting the sentence vectors of each target medical record text into the classification model for calculation, in the order in which the corresponding sentences appear in the target medical record text, to obtain the first output, the method comprises:

Obtaining medical record samples in the training data set, where each sentence in the medical record sample has a correct classification label;

Truncating the medical record sample to obtain multiple medical record sub-samples;

Introducing each of the medical record sub-samples into the context through preset rules to obtain a target medical record sample; mapping each sentence in the target medical record sample to a sentence vector of a fixed dimension;

Inputting the sentence vectors of each target medical record sample into a bidirectional recurrent neural network model in order for calculation to obtain a training output;

Calculating the training output through SOFTMAX to obtain a prediction output;

Calculate the loss value of each sentence in the medical record sub-sample by using a loss function;

The classification model parameters are determined according to the loss value, and the training of the classification model is completed.
The method for structuring an electronic medical record according to claim 6, wherein the step of calculating the loss value of each sentence in the medical record sub-sample by using a loss function comprises:

The loss value of each sentence in the medical record sub-sample is calculated by a cross entropy function of the predicted output y and the correct classification label.
An electronic medical record structured device, which includes:

The first obtaining unit is used to obtain the electronic medical record text and the number of sentences in the electronic medical record text;

The detection unit is configured to detect whether the number of sentences in the electronic medical record text exceeds a preset threshold;

The first truncation unit is used to truncate the electronic medical record text, if the number of sentences exceeds the preset threshold, to obtain multiple electronic medical record sub-texts;

The first introduction unit is used to introduce each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

The first mapping unit is used to map each sentence in the target medical record text to a sentence vector of a fixed dimension;

The first calculation unit is configured to input the sentence vectors of each target medical record text into the classification model for calculation, in the order in which the corresponding sentences appear in the target medical record text, to obtain the first output; wherein the classification model is obtained by training a bidirectional recurrent neural network model;

The second calculation unit is configured to obtain the classification label of each sentence according to the first output.
A computer device includes a memory and a processor, and a computer program is stored in the memory, wherein the steps of a method for structuring an electronic medical record are realized when the processor executes the computer program:

Acquiring the electronic medical record text and the number of sentences in the electronic medical record text;

Detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;

If the number of sentences exceeds the preset threshold, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;

Introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

Mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;

Inputting the sentence vectors of each target medical record text into a classification model for calculation, in the order in which the corresponding sentences appear in the target medical record text, to obtain a first output; wherein the classification model is obtained by training a bidirectional recurrent neural network model;

According to the first output, the classification label of each sentence is obtained.
The computer device according to claim 9, wherein the step of obtaining the classification label of each sentence according to the first output comprises:

Inputting the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output;

The second output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.
The computer device according to claim 9, wherein the step of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension comprises:

Input each sentence in the target medical record text into the neural network;

The encoder of the neural network maps each sentence to a sentence vector of a fixed dimension.
The computer device according to claim 9, wherein the step of introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text comprises:

Detecting the position of each sub-text of the electronic medical record in the electronic medical record text;

When the electronic medical record sub-text is at the beginning of the electronic medical record text, introducing, at the truncation of the electronic medical record sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

When the electronic medical record sub-text is in the middle of the electronic medical record text, introducing, at the beginning truncation of the electronic medical record sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text, and introducing, at the end truncation of the electronic medical record sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

When the electronic medical record sub-text is at the end of the electronic medical record text, introducing, at the truncation of the electronic medical record sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text.
The computer device according to claim 9, wherein after the step of detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold, the method comprises:

If the number of sentences does not exceed the preset threshold, mapping each sentence in the electronic medical record text to a sentence vector of a fixed dimension;

Inputting the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;

The third output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.
The computer device according to claim 9, wherein before the step of inputting the sentence vectors of each target medical record text into the classification model for calculation, in the order in which the corresponding sentences appear in the target medical record text, to obtain the first output, the method comprises:

Obtaining medical record samples in the training data set, where each sentence in the medical record sample has a correct classification label;

Truncating the medical record sample to obtain multiple medical record sub-samples;

Introducing each of the medical record sub-samples into the context through preset rules to obtain a target medical record sample; mapping each sentence in the target medical record sample to a sentence vector of a fixed dimension;

Inputting the sentence vectors of each target medical record sample into a bidirectional recurrent neural network model in order for calculation to obtain a training output;

Calculating the training output through SOFTMAX to obtain a prediction output;

Calculate the loss value of each sentence in the medical record sub-sample by using a loss function;

The classification model parameters are determined according to the loss value, and the training of the classification model is completed.
The computer device according to claim 14, wherein the step of calculating the loss value of each sentence in the medical record sub-sample by using a loss function comprises:

The loss value of each sentence in the medical record sub-sample is calculated by a cross entropy function of the predicted output y and the correct classification label.
A computer-readable storage medium with a computer program stored thereon, wherein the steps of a method for structuring an electronic medical record are realized when the computer program is executed by a processor:

Acquiring the electronic medical record text and the number of sentences in the electronic medical record text;

Detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold;

If the number of sentences exceeds the preset threshold, truncating the electronic medical record text to obtain multiple electronic medical record sub-texts;

Introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text;

Mapping each sentence in the target medical record text to a sentence vector of a fixed dimension;

Inputting the sentence vectors of each target medical record text into a classification model for calculation, in the order in which the corresponding sentences appear in the target medical record text, to obtain a first output; wherein the classification model is obtained by training a bidirectional recurrent neural network model;

According to the first output, the classification label of each sentence is obtained.
The computer-readable storage medium according to claim 16, wherein the step of obtaining the classification label of each sentence according to the first output comprises:

Inputting the first output of each sentence vector into a CRF network and/or a self-attention network to obtain a second output;

The second output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.
The computer-readable storage medium according to claim 16, wherein the step of mapping each sentence in the target medical record text to a sentence vector of a fixed dimension comprises:

Input each sentence in the target medical record text into the neural network;

The encoder of the neural network maps each sentence to a sentence vector of a fixed dimension.
The computer-readable storage medium according to claim 16, wherein the step of introducing each of the electronic medical record sub-texts into the context through preset rules to obtain the target medical record text comprises:

Detecting the position of each electronic medical record sub-text in the electronic medical record text;

When the electronic medical record sub-text is at the beginning of the electronic medical record text, introducing, at the truncation of the electronic medical record sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

When the electronic medical record sub-text is in the middle of the electronic medical record text, introducing, at the beginning truncation of the electronic medical record sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text, and introducing, at the end truncation of the electronic medical record sub-text, a preset number of sentences from the beginning of the next electronic medical record sub-text;

When the electronic medical record sub-text is at the end of the electronic medical record text, introducing, at the truncation of the electronic medical record sub-text, a preset number of sentences from the end of the previous electronic medical record sub-text.
The computer-readable storage medium according to claim 16, wherein after the step of detecting whether the number of sentences in the electronic medical record text exceeds a preset threshold, the method comprises:

If the number of sentences does not exceed the preset threshold, mapping each sentence in the electronic medical record text to a sentence vector of a fixed dimension;

Input the sentence vectors in each electronic medical record text into the classification model in order for calculation to obtain a third output;

The third output of each sentence vector is calculated by SOFTMAX to obtain the classification label of each sentence.