CN112434756A

CN112434756A - Training method, processing method, device and storage medium of medical data

Info

Publication number: CN112434756A
Application number: CN202011477617.4A
Authority: CN
Inventors: 郑永升; 周世正; 梁平
Original assignee: Hangzhou Yitu Medical Technology Co ltd
Current assignee: Hangzhou Yitu Medical Technology Co ltd
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2021-03-02

Abstract

The present disclosure relates to a training method, a processing method, an apparatus, and a storage medium for medical data. The training method includes: acquiring medical data defined as a positive case; generating a negative case in response to the acquisition of the positive case; and training on the medical data based at least on the negative case. The processing method includes: acquiring original medical data; inputting the original medical data into a model obtained by the training method; and outputting target medical data. The apparatus includes a positive case generation module, a negative case generation module, and a training module. Embodiments of the present disclosure can avoid algorithm outputs that do not conform to medical logic and improve the performance of medical data processing.

Description

Training method, processing method, device and storage medium of medical data

Technical Field

The present disclosure relates to the field of medical data intelligent processing technology, and in particular, to a medical data training method, a medical data processing method, a processing apparatus for medical data training, and a computer-readable storage medium.

Background

Some concepts in medical data appear literally close but must be distinguished medically, such as "pericardiotomy" and "exploratory pericardiotomy", or "(aorto)coronary artery bypass graft of one coronary artery" and "(aorto)coronary artery bypass graft of two coronary arteries". Keyword similarity matching rarely codes such data well, and even a deep learning algorithm may code the original medical data to a similar but wrong standard name, or output several similar names. Codes in the ICD standard table generally consist of English letters and Arabic numerals. The table records standard names (in Chinese or English) and also shows the relationships between different standard names: related contents are clustered throughout the table and given similar codes.

Disclosure of Invention

The present disclosure is intended to provide a medical data training method, a medical data processing method, a processing apparatus for medical data training, and a computer-readable storage medium, which can avoid the result that the algorithm output does not conform to the medical logic, and improve the performance of medical data processing.

According to one aspect of the present disclosure, there is provided a method for training medical data, including:

acquiring medical data defined as a positive case;

generating a negative case in response to the acquisition of the positive case;

training on the medical data based at least on the negative case.

In some embodiments, acquiring the medical data defined as a positive case includes:

annotating the medical data with reference to a standard information table, wherein the annotation result exists in the standard information table.

In some embodiments, generating the negative case in response to the acquisition of the positive case includes:

selecting, according to the annotation result, standard information in the standard information table that is related to the annotation result; and

generating the negative case based on the selected standard information and the original information in the positive case.

In some embodiments, the annotation result includes an annotation code representing the original information in the positive case;

selecting, according to the annotation result, the standard information in the standard information table that is related to the annotation result includes:

determining, according to the annotation code, a standard code in the standard information table that is related to the annotation code; and

selecting the standard code and the standard information represented by the standard code.

In some embodiments, determining, according to the annotation code, the standard code in the standard information table that is related to the annotation code includes:

comparing the annotation code with the standard codes; and

taking a standard code that is identical to the annotation code on a preset number of digits as the standard code related to the annotation code.

In some embodiments, determining, according to the annotation code, the standard code in the standard information table that is related to the annotation code further includes:

extracting a preset number of standard codes.

In some embodiments, the standard information table includes an ICD-9-CM-3 standard table.

According to one aspect of the present disclosure, there is provided a method of processing medical data, including:

acquiring original medical data;

inputting the original medical data into a model obtained by the training method described above; and

outputting target medical data.

According to one aspect of the present disclosure, there is provided a processing apparatus for medical data training, including:

a positive case generation module configured to acquire medical data defined as a positive case;

a negative case generation module configured to generate a negative case in response to the acquisition of the positive case; and

a training module configured to train on the medical data based at least on the negative case.

According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement:

the method of training medical data described above; or

the method of processing medical data described above.

With the training method of medical data, the processing apparatus for medical data training, and the computer-readable storage medium of the various embodiments of the present disclosure, medical data defined as a positive case is acquired, a negative case is generated in response to the acquisition of the positive case, and the medical data is trained on based at least on the negative case. Negative cases can therefore be generated automatically whenever positive case samples are generated, and the negative training data is fully used. The algorithm model learns not only the positive data, in which the original medical data contains a certain ICD standard term, but also the knowledge that the original medical data does not contain standard terms that are close to the interpretation result yet are not actually referred to, which matches the design logic of the ICD standard table. The method mines the internal coding logic through deep learning, avoids algorithm outputs that do not conform to medical logic, and improves the coding performance of the algorithm model. Performance can improve by 1% compared with training without negative data, which improves the accuracy and efficiency of medical research and of medical diagnosis and treatment.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure, as claimed.

Drawings

In the drawings, which are not necessarily drawn to scale, like reference numerals may designate like components in different views. Like reference numerals with letter suffixes or like reference numerals with different letter suffixes may represent different instances of like components. The drawings illustrate various embodiments generally, by way of example and not by way of limitation, and together with the description and claims, serve to explain the disclosed embodiments.

Fig. 1 shows a flow chart of a method of training medical data according to an embodiment of the present disclosure;

Fig. 2 shows a flow chart of a method of processing medical data according to an embodiment of the present disclosure;

Fig. 3 shows an architecture diagram of a processing apparatus for medical data training according to an embodiment of the present disclosure;

Fig. 4 shows an example of a portion of an ICD-9-CM-3 standard table according to an embodiment of the present disclosure;

Fig. 5 shows an architecture diagram of a medical data processing apparatus according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.

Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.

To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components are omitted.

The present disclosure relates to training and processing of medical data with deep learning models, so that medical information and medical term concepts are characterized accurately. Some concepts in medical data appear literally close but must be distinguished medically, such as "pericardiotomy" and "exploratory pericardiotomy", or "(aorto)coronary artery bypass graft of one coronary artery" and "(aorto)coronary artery bypass graft of two coronary arteries". Keyword similarity matching rarely codes such data well, and even a deep learning algorithm may code the original medical data to a similar but wrong standard name, or output several similar names. Codes in the ICD standard table generally consist of English letters and Arabic numerals; for surgical information, examples include "36.1100 (aorto)coronary artery bypass graft of one coronary artery" and "37.1200x010 pericardial blood clot removal". The table records standard names (in Chinese or English) and also shows the relationships between different standard names: related contents are clustered throughout the table and given similar codes.

As one solution, as shown in fig. 1 in combination with fig. 4, an embodiment of the present disclosure provides a training method of medical data, including:

S101: acquiring medical data defined as a positive case;

S102: generating a negative case in response to the acquisition of the positive case;

S103: training on the medical data based at least on the negative case.

One inventive concept of the present disclosure is to generate negative cases automatically whenever positive case samples are generated, and to make full use of the negative training data. The algorithm model then learns not only the positive data, in which the original medical data contains a certain ICD standard term, but also the knowledge that the original medical data does not contain standard terms that are close to the interpretation result yet are not actually referred to, which matches the design logic of the ICD standard table itself.

The source of the original medical data in the embodiments of the present disclosure is not particularly limited; it may be historical data or current real-time data. In terms of format, it may be medical record text, video, audio, and so on, provided that the medical information it contains can be identified by some recognition means. For example, medical information such as diagnosis content and operation content may be identified by text recognition (e.g., NLP or OCR), by speech recognition or video image recognition, or by character or word segmentation. The original medical data of the embodiments includes content that characterizes medical information. With reference to the ICD-9-CM-3 standard table, the original medical data may be annotated to generate a positive case, which may include the original medical information in the medical data and the annotation of that information. In response to the generation of the positive case, negative cases are generated automatically. The present disclosure focuses primarily on training with negative cases, but does not preclude training with the positive cases as well. In a specific application scenario, the original medical data may also be contained in medical records and diagnostic documents that include one or more items of diagnostic information and surgical information, and the related medical information may be interpreted manually or by machine through annotation or parsing.

In some embodiments, acquiring the medical data defined as a positive case includes:

annotating the medical data with reference to a standard information table, wherein the annotation result exists in the standard information table.

Specifically, medical data defined as a positive case may be obtained by annotating the medical data manually, or by machine recognition with a corresponding interpretation capability. For example, the annotation may be performed by professionals such as medical experts, and the positive cases of the training data are generated from the expert annotation results. Each positive case can be treated as a pair in the format "original text, ICD code|ICD standard name". For example, for original data containing the record "thoracoscope-assisted small-incision radical resection of right upper lung cancer", the expert annotation yields the pair "thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4100|thoracoscopic lobectomy".
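The pair format described above can be illustrated with a minimal Python sketch. The `Example` class, the `parse_example` function, and the choice to split on the last comma are assumptions made for illustration only; the disclosure does not specify a data structure or parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One training pair: original text plus an ICD code and its standard name."""
    original_text: str
    icd_code: str
    standard_name: str
    is_positive: bool = True


def parse_example(line: str, is_positive: bool = True) -> Example:
    """Parse 'original text, ICD code|ICD standard name' into an Example."""
    # Assumption: the last comma separates the original text from the annotation,
    # so commas inside the original text are preserved.
    original_text, _, annotation = line.rpartition(",")
    code, _, name = annotation.partition("|")
    if not original_text.strip() or not code.strip() or not name.strip():
        raise ValueError(f"not in 'text, code|name' format: {line!r}")
    return Example(original_text.strip(), code.strip(), name.strip(), is_positive)


positive = parse_example(
    "thoracoscope-assisted small-incision radical resection of right upper lung cancer, "
    "32.4100|thoracoscopic lobectomy"
)
print(positive.icd_code, "->", positive.standard_name)
```

Keeping the code and the standard name as separate fields makes it straightforward to use the code as an index into the standard table later.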

In some embodiments, generating the negative case in response to the acquisition of the positive case includes:

selecting, according to the annotation result, standard information in the standard information table that is related to the annotation result; and

generating the negative case based on the selected standard information and the original information in the positive case.

Specifically, the present disclosure aims at least to respond intelligently to the generation of a positive case by generating negative cases, so that a model trained with the method of the embodiments learns not only the positive data, in which the original medical data contains a certain standard term, but also the knowledge that the original medical data does not contain standard terms that are close to the positive result yet are not actually referred to. The basic condition for a negative case is therefore that it contains information that differs from the positive case. How large the difference is can be adjusted during training according to the precision required of the model, and the object in which the difference is introduced can be chosen according to the metric required by the actual data processing. Embodiments of the present disclosure may automatically generate, from a positive case and by comparison with an ICD standard information table, a negative case or a hard-to-distinguish sample containing multiple negative cases.

Continuing with the above example, the positive case in this embodiment may be standardized as:

"thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4100|thoracoscopic lobectomy"

Here, "thoracoscope-assisted small-incision radical resection of right upper lung cancer" serves as the original information of this embodiment, and "32.4100|thoracoscopic lobectomy" serves as the annotation result obtained by comparison with the ICD-9-CM-3 standard table.

In response to the generation of the positive case described above, this embodiment may automatically generate negative cases from the positive case in conjunction with the ICD-9-CM-3 standard table.

Further, the annotation result of the present disclosure includes an annotation code representing the original information in the positive case.

Specifically, the positive case above includes the annotation code "32.4100". In this embodiment, the annotation code may be used as an index to select standard information from the ICD-9-CM-3 standard table.

For example, indexed by "32.4100", the following can be selected from the ICD-9-CM-3 standard table:

"32.4100x002|thoracoscopic compound lobectomy"

"32.4101|thoracoscopic lobectomy with segmental resection of adjacent lobe"

"32.4901|lobectomy with segmental resection of adjacent lobe"

"32.4902|lobectomy"

"32.4903|sleeve lobectomy"

Such standard information satisfies the condition that its code is close to the code of the positive case.

In the embodiment of the present disclosure, negative cases may be generated specifically for the positive case in a separate step. Each negative case includes the above original information together with a standard code and the standard information it represents, as follows:

"thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4100x002|thoracoscopic compound lobectomy"

"thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4101|thoracoscopic lobectomy with segmental resection of adjacent lobe"

"thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4901|lobectomy with segmental resection of adjacent lobe"

"thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4902|lobectomy"

"thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4903|sleeve lobectomy"

From these automatically generated negative cases, the embodiments of the present disclosure can construct hard-to-distinguish samples, which serve as training samples for training on the medical data.
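The pairing step above can be sketched in a few lines of Python. The function name `make_negatives` and the list-of-tuples representation of the selected standard entries are illustrative assumptions, not part of the disclosure.

```python
def make_negatives(original_text, annotation_code, related_entries):
    """Pair the original text with every related standard entry except the annotated one."""
    negatives = []
    for code, name in related_entries:
        if code == annotation_code:
            continue  # the annotated code belongs to the positive case, never to a negative
        negatives.append(f"{original_text}, {code}|{name}")
    return negatives


original_text = (
    "thoracoscope-assisted small-incision radical resection of right upper lung cancer"
)
# Entries selected from the standard table for the annotation code 32.4100.
related = [
    ("32.4100", "thoracoscopic lobectomy"),
    ("32.4100x002", "thoracoscopic compound lobectomy"),
    ("32.4101", "thoracoscopic lobectomy with segmental resection of adjacent lobe"),
    ("32.4901", "lobectomy with segmental resection of adjacent lobe"),
    ("32.4902", "lobectomy"),
    ("32.4903", "sleeve lobectomy"),
]

negatives = make_negatives(original_text, "32.4100", related)
for line in negatives:
    print(line)
```

Each negative keeps the original text unchanged and differs from the positive case only in the code and standard name, which is what makes the samples hard to distinguish.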

In some embodiments, determining, according to the annotation code, the standard code in the standard information table that is related to the annotation code includes:

comparing the annotation code with the standard codes; and

taking a standard code that is identical to the annotation code on a preset number of digits as the standard code related to the annotation code.

Specifically, when constructing a hard-to-distinguish sample, the ICD-9-CM-3 standard information table can be searched in turn for standard information whose code shares its first six, five, four, or three digits with the annotation code of the positive case. For example, when the annotation code of the positive case is 32.4900, the table is first searched for other codes whose first six digits are 32.4900 (the decimal point is not counted among the six digits). If such codes exist, for example 32.4900x002, negative cases are generated from them. Standard information matching on five, four, or three digits can be selected in the same way. Depending on the requirements of the actual medical data processing and training scenario, the number of matched digits may be preset within a range, for example with an upper limit of six digits and a lower limit of three.
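The digit comparison can be sketched as follows. The helper names `digits` and `shares_leading_digits` are hypothetical, and treating a code as unrelated to itself is an assumption made so that the positive case is never returned as its own negative.

```python
def digits(code: str) -> str:
    """Drop the decimal point, which is not counted as a digit position."""
    return code.replace(".", "")


def shares_leading_digits(annotation_code: str, standard_code: str, n: int) -> bool:
    """True if the two codes differ but agree on their first n digit positions."""
    a, s = digits(annotation_code), digits(standard_code)
    return len(a) >= n and len(s) >= n and a[:n] == s[:n] and a != s


print(shares_leading_digits("32.4900", "32.4900x002", 6))  # agrees on 324900
print(shares_leading_digits("32.4900", "32.4901", 6))      # differs at the sixth digit
print(shares_leading_digits("32.4900", "32.4901", 5))      # agrees on 32490
```

Lowering `n` from six towards three widens the set of related codes, at the cost of negatives that are less similar to the positive case.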

In some embodiments, determining, according to the annotation code, the standard code in the standard information table that is related to the annotation code further includes:

extracting a preset number of standard codes.

Specifically, continuing with the previous example and taking a preset number of 10: if fewer than 10 codes match on the first six digits, the number of matched digits is reduced by one, and the ICD-9-CM-3 standard information table is searched for codes whose first five digits are 32.490, such as 32.4901 and 32.4902. This continues until a number of digits is reached at which at least 10 negative cases can be generated. Conversely, if more than 10 codes match at some number of digits, for example if the first five digits already yield enough standard codes, any 10 of them may be selected.
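The back-off from six digits down to three can be sketched as below. The function `related_codes`, its parameter names, the seeded random generator, and the toy table are all assumptions for illustration; in particular, treating exactly the preset number of matches as sufficient, and returning whatever matched at the lower limit when even that is too few, are choices the disclosure leaves open.

```python
import random


def related_codes(annotation_code, table_codes, preset_number=10,
                  max_digits=6, min_digits=3, rng=None):
    """Shorten the matched prefix one digit at a time until enough codes are found."""
    rng = rng or random.Random(0)  # seeded only to make the sketch reproducible
    target = annotation_code.replace(".", "")
    matches = []
    for n in range(max_digits, min_digits - 1, -1):
        matches = [
            code for code in table_codes
            if code != annotation_code and code.replace(".", "")[:n] == target[:n]
        ]
        if len(matches) >= preset_number:
            # More candidates than needed: pick the preset number arbitrarily.
            return rng.sample(matches, preset_number)
    # Assumption: if even the lower limit yields too few codes, use all of them.
    return matches


# Toy table; the last two codes are hypothetical fillers for the example.
table = ["32.4900x002", "32.4901", "32.4902", "32.4903",
         "32.4100", "32.4101", "32.3000", "33.0100"]

picked = related_codes("32.4900", table, preset_number=3)
print(sorted(picked))
```

With a preset number of 3, only one code matches on six digits, so the search falls back to five digits, where four codes match and three of them are sampled.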

Given the specific way the standard codes are used, the standard information table in the embodiments of the present disclosure includes an ICD-9-CM-3 standard table. The standard table can of course be extended to the uniformly issued Classification and Codes of Surgical Operations, National Clinical Edition 2.0, and Classification and Codes of Diseases, National Clinical Edition 2.0, whose coding systems are built as extensions of the international ICD-9-CM-3 and ICD-10.

In combination with the above example, the embodiments of the present disclosure train on the medical data using the above hard-to-distinguish samples, so that the expert annotation results and the information in the ICD standard table are fully used. The algorithm model learns the positive data, in which the original text contains a certain ICD standard term: from the pair "thoracoscope-assisted small-incision radical resection of right upper lung cancer, 32.4100|thoracoscopic lobectomy" it learns that the original medical data corresponds to the standard result "32.4100|thoracoscopic lobectomy". From the negative cases it learns at least that the original surgical data does not actually refer to "32.4100x002|thoracoscopic compound lobectomy", "32.4101|thoracoscopic lobectomy with segmental resection of adjacent lobe", "32.4901|lobectomy with segmental resection of adjacent lobe", "32.4902|lobectomy", "32.4903|sleeve lobectomy", and the like.

In some embodiments, the training method of the present disclosure may perform training based on the negative examples related to each embodiment, or may combine the negative examples with the positive examples, and perform data training through two learning logic dimensions of the positive examples and the negative examples.
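One way to picture the two training modes is as a labeled sample list, sketched below. The function `build_training_set`, the 1/0 labels, and the tuple layout are assumptions; the disclosure does not specify a label encoding, a model, or a loss.

```python
import random


def build_training_set(positives, negatives, use_positives=True, seed=0):
    """Return shuffled (original_text, icd_code, label) triples; label 1 = positive."""
    samples = [(text, code, 0) for text, code in negatives]
    if use_positives:
        samples += [(text, code, 1) for text, code in positives]
    random.Random(seed).shuffle(samples)  # seeded for reproducibility of the sketch
    return samples


text = "thoracoscope-assisted small-incision radical resection of right upper lung cancer"
positives = [(text, "32.4100")]
negatives = [(text, "32.4902"), (text, "32.4903")]

both = build_training_set(positives, negatives)
negatives_only = build_training_set(positives, negatives, use_positives=False)
print(len(both), len(negatives_only))
```

Setting `use_positives=False` corresponds to training on the negative cases alone, while the default combines both learning dimensions.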

As one aspect of the present disclosure, as shown in Fig. 2, the present disclosure also provides a method of processing medical data, including:

S201: acquiring original medical data;

S202: inputting the original medical data into a model obtained by the training method;

S203: outputting target medical data.
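The three steps can be sketched as a small scoring loop. The disclosure does not describe the model's architecture or output format, so the `score` callable below is a placeholder: `token_overlap` is a toy word-overlap measure standing in for a trained model, and `process` and the three-entry table are likewise hypothetical.

```python
def token_overlap(raw_text: str, standard_name: str) -> float:
    """Toy stand-in for a trained model's score: Jaccard overlap of word sets."""
    a, b = set(raw_text.lower().split()), set(standard_name.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def process(raw_text, standard_table, score=token_overlap):
    """Take raw text, score every table entry, and output the best code and name."""
    best = max(standard_table, key=lambda code: score(raw_text, standard_table[code]))
    return best, standard_table[best]


standard_table = {
    "32.4100": "thoracoscopic lobectomy",
    "32.4902": "lobectomy",
    "32.4903": "sleeve lobectomy",
}

result = process("thoracoscopic lobectomy of right upper lobe", standard_table)
print(result)
```

In practice the placeholder scorer would be replaced by the model trained on positive and negative cases, since plain keyword overlap is exactly what the disclosure identifies as insufficient for closely related codes.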

Specifically, the model of this embodiment, and the methods of constructing and training it, can be implemented by the medical data training method of the above embodiments. The training method may specifically include:

acquiring medical data defined as a positive case;

generating a negative case in response to the acquisition of the positive case; and

training on the medical data based at least on the negative case.

The target medical data in this embodiment serves as medical data for purposes such as building a standard table, producing the final output and record of medical information, and assisting medical research.

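As one aspect of the present disclosure, as shown in Fig. 3, the present disclosure also provides a processing apparatus for medical data training, including:

a positive case generation module configured to acquire medical data defined as a positive case;

a negative case generation module configured to generate a negative case in response to the acquisition of the positive case; and

a training module configured to train on the medical data based at least on the negative case.

In combination with the foregoing example, the negative case generation module of the present disclosure is further configured to:

select, according to the annotation result, standard information in the standard information table that is related to the annotation result; and

generate the negative case based on the selected standard information and the original information in the positive case.

The negative case generation module of the present disclosure is further configured to:

determine, according to the annotation code, a standard code in the standard information table that is related to the annotation code; and

select the standard code and the standard information represented by the standard code.

The negative case generation module may further be configured to:

compare the annotation code with the standard codes; and

take a standard code that is identical to the annotation code on a preset number of digits as the standard code related to the annotation code.

Further, it may be configured to extract a preset number of standard codes.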
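The module structure of the apparatus can be sketched as three small Python classes. All class, method, and field names here are hypothetical, the negative case module matches on a fixed number of digits for brevity, and the training module is a placeholder that only records the labeled samples rather than fitting a model.

```python
from dataclasses import dataclass, field


@dataclass
class PositiveCaseModule:
    """Holds annotated (original text, code) pairs, for example from expert labeling."""
    annotations: list

    def acquire(self):
        return list(self.annotations)


@dataclass
class NegativeCaseModule:
    """Pairs each original text with related standard codes it was not annotated with."""
    table_codes: list
    preset_digits: int = 5

    def generate(self, positives):
        negatives = []
        for text, code in positives:
            prefix = code.replace(".", "")[: self.preset_digits]
            for other in self.table_codes:
                if other != code and other.replace(".", "").startswith(prefix):
                    negatives.append((text, other))
        return negatives


@dataclass
class TrainingModule:
    """Placeholder trainer: records the labeled samples instead of fitting a model."""
    seen: list = field(default_factory=list)

    def train(self, positives, negatives):
        self.seen = [(t, c, 1) for t, c in positives] + [(t, c, 0) for t, c in negatives]
        return len(self.seen)


text = "thoracoscope-assisted small-incision radical resection of right upper lung cancer"
positive_module = PositiveCaseModule([(text, "32.4100")])
negative_module = NegativeCaseModule(["32.4100", "32.4100x002", "32.4101", "32.4901"])
trainer = TrainingModule()

positives = positive_module.acquire()
negatives = negative_module.generate(positives)
print(trainer.train(positives, negatives))
```

Keeping the modules separate mirrors the apparatus description: the negative case module depends only on the positive cases and the standard table, so it can be swapped or tuned without touching the trainer.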

As one of the aspects of the present disclosure, as shown in fig. 5, the present disclosure also provides a processing apparatus of medical data, including:

an acquisition unit configured for acquiring raw medical data;

a processing model, derived based on a training method, for processing the raw medical data.

In some embodiments, the acquisition unit of the present disclosure may be an input device, a screen capture device, a text recognition device, or the like. Its purpose is to acquire medical data containing medical information, which may include surgical information, diagnostic information, and the codes that characterize such information.

In some embodiments, the training method of medical data used to obtain the processing model includes:

acquiring medical data defined as a positive case;

generating a negative case in response to the acquisition of the positive case; and

training on the medical data based at least on the negative case.

Specifically, one inventive concept of the present disclosure is to acquire medical data defined as a positive case, generate a negative case in response to the acquisition of the positive case, and train on the medical data based at least on the negative case. Negative cases can therefore be generated automatically whenever positive case samples are generated, and the negative training data is fully used. The algorithm model learns not only the positive data, in which the original medical data contains a certain ICD standard term, but also the knowledge that the original medical data does not contain standard terms that are close to the interpretation result yet are not actually referred to, which matches the design logic of the ICD standard table. The method mines the internal coding logic through deep learning, avoids algorithm outputs that do not conform to medical logic, and improves the coding performance of the algorithm model. Performance can improve by 1% compared with training without negative data, which improves the accuracy and efficiency of medical research and of medical diagnosis and treatment.

As one aspect of the present disclosure, the present disclosure also provides a computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the training method of medical data described above, including at least:

acquiring medical data defined as a positive case;

generating a negative case in response to the acquisition of the positive case; and

training on the medical data based at least on the negative case.

As one aspect of the present disclosure, the present disclosure also provides a computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the processing method of medical data described above, including at least:

acquiring original medical data;

inputting the original medical data into a model obtained by the training method; and

outputting target medical data.

In some embodiments, a processor executing computer-executable instructions may be a processing device including one or more general-purpose processing devices, such as a microprocessor, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or the like. More specifically, the processor may be a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a Very Long Instruction Word (VLIW) microprocessor, a processor running other instruction sets, or a processor running a combination of instruction sets. The processor may also be one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a System on a Chip (SoC), or the like.

In some embodiments, the computer-readable storage medium may be a memory, such as a read-only memory (ROM), a random-access memory (RAM), a phase-change random-access memory (PRAM), a static random-access memory (SRAM), a dynamic random-access memory (DRAM), an electrically erasable programmable read-only memory (EEPROM), other types of random-access memory (RAM), a flash disk or other form of flash memory, a cache, a register, a static memory, a compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD) or other optical storage, a tape cartridge or other magnetic storage device, or any other potentially non-transitory medium that may be used to store information or instructions that may be accessed by a computer device, and so forth.

In some embodiments, the computer-executable instructions may be implemented as a plurality of program modules that collectively implement the training method or the processing method of medical data according to any embodiment of the present disclosure.

The present disclosure describes various operations or functions that may be implemented or defined as software code or instructions. The modules described above may be implemented as software code or instruction modules stored in a memory which, when executed by a processor, implement the respective steps and methods.

Such content may be directly executable ("object" or "executable" form), source code, or difference code ("delta" or "patch" code). A software implementation of the embodiments described herein may be provided through an article of manufacture having code or instructions stored thereon, or through a method of operating a communication interface to transmit data through the communication interface. A machine-readable or computer-readable storage medium may cause a machine to perform the functions or operations described, and includes any mechanism for storing information in a form accessible by a machine (e.g., a computing device, an electronic system, etc.), such as recordable/non-recordable media (e.g., Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.). The communication interface includes any mechanism that interfaces with a hardwired, wireless, optical, or other medium to communicate with other devices, such as a memory bus interface, a processor bus interface, an internet connection, a disk controller, etc. The communication interface may be configured by providing configuration parameters and/or transmitting signals to prepare the communication interface to provide data signals describing the software content. The communication interface may be accessed by sending one or more commands or signals to the communication interface.

The computer-executable instructions of embodiments of the present disclosure may be organized into one or more computer-executable components or modules. Aspects of the disclosure may be implemented with any number and combination of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other embodiments may include different computer-executable instructions or components having more or less functionality than illustrated and described herein.

The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more versions thereof) may be used in combination with each other. For example, other embodiments may be used by those of ordinary skill in the art upon reading the above description. In addition, in the foregoing detailed description, various features may be grouped together to streamline the disclosure. This should not be interpreted as an intention that a disclosed feature not claimed is essential to any claim. Rather, the subject matter of the present disclosure may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that these embodiments may be combined with each other in various combinations or permutations. The scope of the disclosure should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

The above embodiments are merely exemplary embodiments of the present disclosure, which is not intended to limit the present disclosure, and the scope of the present disclosure is defined by the claims. Various modifications and equivalents of the disclosure may occur to those skilled in the art within the spirit and scope of the disclosure, and such modifications and equivalents are considered to be within the scope of the disclosure.

Claims

1. A method of training medical data, comprising:

acquiring medical data defined as a positive case;

generating a negative case in response to the acquisition of the positive case;

2. The method of claim 1, wherein the acquiring medical data defined as positive examples comprises:

3. The method of claim 2, wherein generating negative examples in response to the obtaining of positive examples comprises:

4. The method of claim 3, wherein the annotation result comprises an annotation code characterizing the original information in the positive case;

5. The method of claim 4, wherein the determining the standard code associated with the label code in the standard information table according to the label code comprises:

comparing the label code with the standard code;

6. The method of claim 1, wherein the determining the standard code associated with the label code in the standard information table according to the label code further comprises:

a preset number of standard codes are extracted.

7. The method according to any one of claims 2 to 6, wherein the standard information table comprises an ICD-9-CM-3 standard information table.

8. A method of processing medical data, wherein,

acquiring original medical data;

inputting raw medical data into a model obtained based on the training method of any one of claims 1 to 7;

outputting the target medical data.

9. Processing apparatus for medical data training, comprising:

10. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement:

a method of training medical data according to any one of claims 1 to 7; or

the method of processing medical data according to claim 8.